Getting started with Snakemake
Objective 6
Create a new Snakefile named ex1_o6.smk in which fastqc runs on each input file individually.
Where to start?
Our culprit: the expand() function in the fastqc rule produces a list of files, which Snakemake then treats as the combined input of a single job rather than as individual files.
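To see why, here is a minimal pure-Python sketch of what expand() does in its default mode. It fills the pattern once per combination of the supplied values and returns a plain list of concrete file names. The function name expand_sketch is hypothetical. This is not Snakemake's implementation: it only mimics the default "all combinations" behaviour using itertools.product.

```python
from itertools import product


def expand_sketch(pattern, **values):
    """Hypothetical stand-in for Snakemake's expand(): fill `pattern` with
    every combination of the given values and return a flat list of strings."""
    keys = list(values)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(values[k] for k in keys))
    ]


SAMPLES = ["SRR3099585_chr18", "SRR3099586_chr18", "SRR3099587_chr18"]

# Used as the input of the fastqc rule, this whole list belongs to ONE job,
# so {input} in the shell command is replaced by all three files at once.
fastqc_inputs = expand_sketch("Data/{sample}.fastq.gz", sample=SAMPLES)
print(fastqc_inputs)
print("fastqc --outdir FastQC " + " ".join(fastqc_inputs))
```

The second print line shows the single combined command that a rule with an expand()-based input would run.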
- Change the fastqc rule: remove the expand() calls from its input and output, but keep the wildcard-containing strings.
Your code for ex1_o6.smk should look like this:
SAMPLES=["SRR3099585_chr18","SRR3099586_chr18","SRR3099587_chr18"]

rule all:
    input:
        expand("FastQC/{sample}_fastqc.html", sample=SAMPLES),
        expand("FastQC/{sample}_fastqc.zip", sample=SAMPLES),
        "multiqc_report.html",
        "multiqc_data",

rule fastqc:
    input:
        "Data/{sample}.fastq.gz"
    output:
        "FastQC/{sample}_fastqc.zip",
        "FastQC/{sample}_fastqc.html"
    shell: "fastqc --outdir FastQC {input}"

rule multiqc:
    input:
        expand("FastQC/{sample}_fastqc.zip", sample = SAMPLES)
    output:
        "multiqc_report.html",
        directory("multiqc_data")
    shell: "multiqc {input}"
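Explanation: we used wildcards to “generalise” the input and output of the fastqc rule. You can see Data/{sample}.fastq.gz, FastQC/{sample}_fastqc.zip and FastQC/{sample}_fastqc.html as “templates” for the fastqc rule's input and output file names, with {sample} being the only variable part. Snakemake fills in {sample} by matching the files requested by other rules (here all and multiqc) against the output templates, and creates one fastqc job per matching value.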
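The matching step can be modelled in a few lines of Python. Each requested file is matched against the output template to recover the wildcard value, and that value is then substituted into the input template, which gives one job per sample. The helpers template_to_regex and plan_jobs are hypothetical. The sketch assumes every wildcard may match any non-empty text (`.+`) and ignores what real Snakemake also handles (several outputs per rule, wildcard constraints, ambiguous rules).

```python
import re

SAMPLES = ["SRR3099585_chr18", "SRR3099586_chr18", "SRR3099587_chr18"]


def template_to_regex(template):
    """Compile a template such as 'FastQC/{sample}_fastqc.zip' into a regex
    with one named group per wildcard (each assumed to match '.+')."""
    # With a capturing group, re.split alternates literal text and wildcard names
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        f"(?P<{part}>.+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.compile(pattern)


def plan_jobs(requested, output_template, input_template):
    """Hypothetical helper: one (wildcards, input file) pair per requested
    file that the output template can produce."""
    regex = template_to_regex(output_template)
    jobs = []
    for path in requested:
        match = regex.fullmatch(path)
        if match:  # files this rule cannot produce are skipped
            wildcards = match.groupdict()
            jobs.append((wildcards, input_template.format(**wildcards)))
    return jobs


# Concrete files requested downstream (by rules multiqc and all)
requested = [f"FastQC/{s}_fastqc.zip" for s in SAMPLES] + ["multiqc_report.html"]

jobs = plan_jobs(requested, "FastQC/{sample}_fastqc.zip", "Data/{sample}.fastq.gz")
for wildcards, infile in jobs:
    print(wildcards, "->", "fastqc --outdir FastQC " + infile)
```

Each printed pair corresponds to one of the “wildcards: sample=…” lines that Snakemake reports for the fastqc jobs in the log of the test run.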
Of note, {sample} doesn’t have to match the wildcard name used in the expand() functions of the other rules. We could have used any other wildcard name (e.g. {mysample} instead of {sample}), as long as the input and output directives of the same rule use the same name.
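The rule behind this remark can be sketched in Python. Whatever the wildcard is called, every wildcard used in a rule's input must also appear in its output, because Snakemake can only learn its value from the output file names; a rule that breaks this is rejected with an error. The function names wildcard_names and input_wildcards_ok are hypothetical; they rely on the standard library's string.Formatter to list the {…} names in each template.

```python
from string import Formatter


def wildcard_names(template):
    """Return the set of {wildcard} names used in a file-name template.
    (This sketch does not handle inline constraints such as {sample,[^/]+}.)"""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def input_wildcards_ok(inputs, outputs):
    """True if every wildcard in the input templates also appears in an output template."""
    in_names = set().union(*(wildcard_names(t) for t in inputs))
    out_names = set().union(*(wildcard_names(t) for t in outputs))
    return in_names <= out_names


# Renaming the wildcard consistently within the rule is fine ...
print(input_wildcards_ok(["Data/{mysample}.fastq.gz"],
                         ["FastQC/{mysample}_fastqc.zip", "FastQC/{mysample}_fastqc.html"]))  # True
# ... but renaming it in the input only is not.
print(input_wildcards_ok(["Data/{mysample}.fastq.gz"],
                         ["FastQC/{sample}_fastqc.zip", "FastQC/{sample}_fastqc.html"]))  # False
```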
Test the script
Next, let’s check again if your pipeline works:
snakemake -s ex1_o6.smk --cores 1 -p
You should see something similar to the following output on your screen.
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job        count
-------  -------
all            1
fastqc         3
multiqc        1
total          5
Select jobs to execute...
Execute 1 jobs...
[Tue Feb 20 14:38:24 2024]
localrule fastqc:
    input: Data/SRR3099585_chr18.fastq.gz
    output: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html
    jobid: 1
    reason: Set of input files has changed since last execution
    wildcards: sample=SRR3099585_chr18
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver
fastqc --outdir FastQC Data/SRR3099585_chr18.fastq.gz
Started analysis of SRR3099585_chr18.fastq.gz
Approx 5% complete for SRR3099585_chr18.fastq.gz
Approx 10% complete for SRR3099585_chr18.fastq.gz
Approx 15% complete for SRR3099585_chr18.fastq.gz
Approx 20% complete for SRR3099585_chr18.fastq.gz
Approx 25% complete for SRR3099585_chr18.fastq.gz
Approx 30% complete for SRR3099585_chr18.fastq.gz
Approx 35% complete for SRR3099585_chr18.fastq.gz
Approx 40% complete for SRR3099585_chr18.fastq.gz
Approx 45% complete for SRR3099585_chr18.fastq.gz
Approx 50% complete for SRR3099585_chr18.fastq.gz
Approx 55% complete for SRR3099585_chr18.fastq.gz
Approx 60% complete for SRR3099585_chr18.fastq.gz
Approx 65% complete for SRR3099585_chr18.fastq.gz
Approx 70% complete for SRR3099585_chr18.fastq.gz
Approx 75% complete for SRR3099585_chr18.fastq.gz
Approx 80% complete for SRR3099585_chr18.fastq.gz
Approx 85% complete for SRR3099585_chr18.fastq.gz
Approx 90% complete for SRR3099585_chr18.fastq.gz
Approx 95% complete for SRR3099585_chr18.fastq.gz
Analysis complete for SRR3099585_chr18.fastq.gz
[Tue Feb 20 14:38:31 2024]
Finished job 1.
1 of 5 steps (20%) done
Select jobs to execute...
Execute 1 jobs...
[Tue Feb 20 14:38:31 2024]
localrule fastqc:
    input: Data/SRR3099587_chr18.fastq.gz
    output: FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.html
    jobid: 3
    reason: Set of input files has changed since last execution
    wildcards: sample=SRR3099587_chr18
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver
fastqc --outdir FastQC Data/SRR3099587_chr18.fastq.gz
Started analysis of SRR3099587_chr18.fastq.gz
Approx 5% complete for SRR3099587_chr18.fastq.gz
Approx 10% complete for SRR3099587_chr18.fastq.gz
Approx 15% complete for SRR3099587_chr18.fastq.gz
Approx 20% complete for SRR3099587_chr18.fastq.gz
Approx 25% complete for SRR3099587_chr18.fastq.gz
Approx 30% complete for SRR3099587_chr18.fastq.gz
Approx 35% complete for SRR3099587_chr18.fastq.gz
Approx 40% complete for SRR3099587_chr18.fastq.gz
Approx 45% complete for SRR3099587_chr18.fastq.gz
Approx 50% complete for SRR3099587_chr18.fastq.gz
Approx 55% complete for SRR3099587_chr18.fastq.gz
Approx 60% complete for SRR3099587_chr18.fastq.gz
Approx 65% complete for SRR3099587_chr18.fastq.gz
Approx 70% complete for SRR3099587_chr18.fastq.gz
Approx 75% complete for SRR3099587_chr18.fastq.gz
Approx 80% complete for SRR3099587_chr18.fastq.gz
Approx 85% complete for SRR3099587_chr18.fastq.gz
Approx 90% complete for SRR3099587_chr18.fastq.gz
Approx 95% complete for SRR3099587_chr18.fastq.gz
Analysis complete for SRR3099587_chr18.fastq.gz
[Tue Feb 20 14:38:39 2024]
Finished job 3.
2 of 5 steps (40%) done
Select jobs to execute...
Execute 1 jobs...
[Tue Feb 20 14:38:39 2024]
localrule fastqc:
    input: Data/SRR3099586_chr18.fastq.gz
    output: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
    jobid: 2
    reason: Set of input files has changed since last execution
    wildcards: sample=SRR3099586_chr18
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver
fastqc --outdir FastQC Data/SRR3099586_chr18.fastq.gz
Started analysis of SRR3099586_chr18.fastq.gz
Approx 5% complete for SRR3099586_chr18.fastq.gz
Approx 10% complete for SRR3099586_chr18.fastq.gz
Approx 15% complete for SRR3099586_chr18.fastq.gz
Approx 20% complete for SRR3099586_chr18.fastq.gz
Approx 25% complete for SRR3099586_chr18.fastq.gz
Approx 30% complete for SRR3099586_chr18.fastq.gz
Approx 35% complete for SRR3099586_chr18.fastq.gz
Approx 40% complete for SRR3099586_chr18.fastq.gz
Approx 45% complete for SRR3099586_chr18.fastq.gz
Approx 50% complete for SRR3099586_chr18.fastq.gz
Approx 55% complete for SRR3099586_chr18.fastq.gz
Approx 60% complete for SRR3099586_chr18.fastq.gz
Approx 65% complete for SRR3099586_chr18.fastq.gz
Approx 70% complete for SRR3099586_chr18.fastq.gz
Approx 75% complete for SRR3099586_chr18.fastq.gz
Approx 80% complete for SRR3099586_chr18.fastq.gz
Approx 85% complete for SRR3099586_chr18.fastq.gz
Approx 90% complete for SRR3099586_chr18.fastq.gz
Approx 95% complete for SRR3099586_chr18.fastq.gz
Analysis complete for SRR3099586_chr18.fastq.gz
[Tue Feb 20 14:38:46 2024]
Finished job 2.
3 of 5 steps (60%) done
Select jobs to execute...
Execute 1 jobs...
[Tue Feb 20 14:38:46 2024]
localrule multiqc:
    input: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip
    output: multiqc_report.html, multiqc_data
    jobid: 4
    reason: Input files updated by another job: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver
multiqc FastQC/SRR3099585_chr18_fastqc.zip FastQC/SRR3099586_chr18_fastqc.zip FastQC/SRR3099587_chr18_fastqc.zip
[WARNING] multiqc : MultiQC Version v1.20 now available!
[INFO ] multiqc : This is MultiQC v1.9
[INFO ] multiqc : Template : default
[INFO ] multiqc : Searching : /data/work/I2BC/chloe.quignot/snakemake_tutorial/FastQC/SRR3099585_chr18_fastqc.zip
[INFO ] multiqc : Searching : /data/work/I2BC/chloe.quignot/snakemake_tutorial/FastQC/SRR3099586_chr18_fastqc.zip
[INFO ] multiqc : Searching : /data/work/I2BC/chloe.quignot/snakemake_tutorial/FastQC/SRR3099587_chr18_fastqc.zip
[INFO ] fastqc : Found 3 reports
[INFO ] multiqc : Compressing plot data
[INFO ] multiqc : Report : multiqc_report.html
[INFO ] multiqc : Data : multiqc_data
[INFO ] multiqc : MultiQC complete
[Tue Feb 20 14:38:51 2024]
Finished job 4.
4 of 5 steps (80%) done
Select jobs to execute...
Execute 1 jobs...
[Tue Feb 20 14:38:51 2024]
localrule all:
    input: FastQC/SRR3099585_chr18_fastqc.html, FastQC/SRR3099586_chr18_fastqc.html, FastQC/SRR3099587_chr18_fastqc.html, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip, multiqc_report.html, multiqc_data
    jobid: 0
    reason: Input files updated by another job: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html, multiqc_data, FastQC/SRR3099586_chr18_fastqc.html, FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, multiqc_report.html, FastQC/SRR3099587_chr18_fastqc.html
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver
[Tue Feb 20 14:38:51 2024]
Finished job 0.
5 of 5 steps (100%) done
Complete log: .snakemake/log/2024-02-20T143823.589116.snakemake.log
Observe the output
We can see that each input file is now processed by FastQC individually. This shows in the “Job stats” table (3 fastqc jobs) and in the fastqc command lines that were run (there is now one command per file). Note that Snakemake does not run independent jobs in a fixed order: here, the job for SRR3099587_chr18 (jobid 3) ran before the one for SRR3099586_chr18 (jobid 2).
Job stats:
job        count
-------  -------
all            1
fastqc         3
multiqc        1
total          5
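Why the order can vary: in the job graph, the three fastqc jobs do not depend on each other, so all of them become runnable at the same time and Snakemake may start any of them first; multiqc has to wait for all three, and all waits for everything. The sketch below models this graph with Python's standard graphlib module (Python 3.9+). The job labels are our own, not Snakemake identifiers, and the sketch does not reproduce Snakemake's scheduler; it only shows which jobs are ready together.

```python
from graphlib import TopologicalSorter

SAMPLES = ["SRR3099585_chr18", "SRR3099586_chr18", "SRR3099587_chr18"]
fastqc_jobs = [f"fastqc:{s}" for s in SAMPLES]

# Map each job to the jobs it depends on (mirrors the file links in ex1_o6.smk)
dag = {job: set() for job in fastqc_jobs}
dag["multiqc"] = set(fastqc_jobs)
dag["all"] = set(fastqc_jobs) | {"multiqc"}

sorter = TopologicalSorter(dag)
sorter.prepare()
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # jobs whose dependencies are all finished
    batches.append(ready)
    sorter.done(*ready)

for batch in batches:
    print(batch)

# Matches the Job stats total: one "all" + one fastqc per sample + one multiqc
print("total jobs:", sum(len(b) for b in batches))
```

With --cores 1, only one job of the first (fastqc) batch runs at a time, and which one goes first is up to Snakemake.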