Getting started with Snakemake
Objective 6
Create a Snakefile named ex1_o6.smk in which fastqc runs on each input individually.
Where to start?
Our culprit: the expand() function in the fastqc rule provides a list of files, which is then interpreted as one combined input rather than as individual files.
- Change the fastqc rule: remove the expand() calls from the input and output of the fastqc rule, but keep the wildcard-containing strings.
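To see why expand() leads to one combined run, here is a minimal plain-Python sketch that imitates what expand() does for a single wildcard. The helper expand_one is hypothetical and only illustrates the behaviour; it is not Snakemake's own implementation.

```python
# Hypothetical stand-in for Snakemake's expand(), restricted to one wildcard.
SAMPLES = ["SRR3099585_chr18", "SRR3099586_chr18", "SRR3099587_chr18"]


def expand_one(pattern, sample):
    """Fill the {sample} placeholder once per value and return a list."""
    return [pattern.format(sample=s) for s in sample]


inputs = expand_one("Data/{sample}.fastq.gz", sample=SAMPLES)
print(inputs)

# When a rule's input is a list, {input} in the shell command is rendered as
# all file names separated by spaces: one fastqc call covering every file.
command = "fastqc --outdir FastQC " + " ".join(inputs)
print(command)
```

With the expand() removed, the rule's input is a single template string instead of a list, so each job receives exactly one file.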
Your code for ex1_o6.smk should look like this:
SAMPLES = ["SRR3099585_chr18", "SRR3099586_chr18", "SRR3099587_chr18"]

rule all:
    input:
        expand("FastQC/{sample}_fastqc.html", sample=SAMPLES),
        expand("FastQC/{sample}_fastqc.zip", sample=SAMPLES),
        "multiqc_report.html",
        "multiqc_data",

rule fastqc:
    input:
        "Data/{sample}.fastq.gz"
    output:
        "FastQC/{sample}_fastqc.zip",
        "FastQC/{sample}_fastqc.html"
    shell: "fastqc --outdir FastQC {input}"

rule multiqc:
    input:
        expand("FastQC/{sample}_fastqc.zip", sample=SAMPLES)
    output:
        "multiqc_report.html",
        directory("multiqc_data")
    shell: "multiqc {input}"
Explanation: we used wildcards to “generalise” the input and output of the fastqc rule. You can see Data/{sample}.fastq.gz, FastQC/{sample}_fastqc.zip and FastQC/{sample}_fastqc.html as “templates” (with “{sample}” being the only variable part) for the input and output file names of the fastqc rule.
Of note, {sample} doesn’t have to match the wildcard name given in the previous expand functions. In theory, we could have used any other wildcard name, as long as the input and output directives of the same rule use the same name (e.g. {mysample} instead of {sample}).
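The “template” idea can be sketched in plain Python: a requested output file is matched against the output template to find the wildcard value, which is then filled into the input template. The function infer_wildcards below is a hypothetical illustration, not Snakemake's actual code, and it assumes each wildcard appears only once per template.

```python
import re


def infer_wildcards(output_template, requested_file):
    """Match a requested file against an output template.

    Returns a dict of wildcard values, or None if the file does not match.
    Assumes each wildcard name occurs only once in the template.
    """
    # Splitting on {name} yields alternating literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", output_template)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2:  # odd positions hold wildcard names
            regex += f"(?P<{part}>.+)"  # ".+" mirrors Snakemake's default constraint
        else:  # even positions hold literal text
            regex += re.escape(part)
    match = re.fullmatch(regex, requested_file)
    return match.groupdict() if match else None


target = "FastQC/SRR3099585_chr18_fastqc.html"

wildcards = infer_wildcards("FastQC/{sample}_fastqc.html", target)
print(wildcards)
print("Data/{sample}.fastq.gz".format(**wildcards))

# Renaming the wildcard consistently within the rule gives the same input file.
renamed = infer_wildcards("FastQC/{mysample}_fastqc.html", target)
print("Data/{mysample}.fastq.gz".format(**renamed))
```

Both variants resolve to the same input file, which is why the wildcard name only needs to be consistent within one rule.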
Test the script
Next, let’s check again if your pipeline works:
You should see something similar to the following output on your screen.
Observe the output
We can see that FastQC now runs on each input individually. You can see this in the “Job stats” table (3 fastqc jobs), but also in the fastqc command lines that were run (there is now 1 command per file). Note that Snakemake does not guarantee any particular order of execution for independent jobs, so the three fastqc jobs may run in any order.
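As a cross-check of what to expect, the per-file commands and job counts can be reproduced with a few lines of plain Python. This is only a sketch: it assumes that the target rule all and the multiqc rule each contribute one job, alongside one fastqc job per sample.

```python
SAMPLES = ["SRR3099585_chr18", "SRR3099586_chr18", "SRR3099587_chr18"]

# One fastqc command per sample, rendered from the rule's shell template.
commands = [f"fastqc --outdir FastQC Data/{s}.fastq.gz" for s in SAMPLES]
for command in commands:
    print(command)

# Assumption: "all" and "multiqc" each run once; fastqc runs once per sample.
job_stats = {"all": 1, "fastqc": len(SAMPLES), "multiqc": 1}
job_stats["total"] = sum(job_stats.values())
print(job_stats)
```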