Getting started with Snakemake

Exercise 1A - create your first snakefile


Objective 6

Create a new snakefile named ex1_o6.smk in which fastqc runs on each input individually.
[Figure: ex1_o6 workflow with 2 rules and 3 inputs]
Where to start?

Our culprit: the expand() function in the fastqc rule returns a list of files, which Snakemake then treats as one combined input for a single job rather than as individual files.

  • change the fastqc rule: remove expand() from its input and output directives, but keep the wildcard-containing strings
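To see why the expanded list leads to a single combined job, here is a minimal plain-Python sketch. The helper expand_like() is a hypothetical stand-in written for this illustration, not Snakemake's own expand(), and it assumes the earlier fastqc rule expanded the pattern Data/{sample}.fastq.gz.

```python
from itertools import product

def expand_like(pattern, **wildcards):
    """Toy stand-in for expand(): fill the pattern with every combination of values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]

SAMPLES = ["SRR3099585_chr18", "SRR3099586_chr18", "SRR3099587_chr18"]

# With expand(), the rule's input is one list holding all three files ...
inputs = expand_like("Data/{sample}.fastq.gz", sample=SAMPLES)
print(inputs)

# ... so "{input}" in the shell directive is replaced by all three paths,
# separated by spaces, and a single fastqc command receives every file.
combined_command = "fastqc --outdir FastQC " + " ".join(inputs)
print(combined_command)
```

Dropping expand() leaves only the pattern string, so each matching file can be handled by its own job.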

Your code for ex1_o6.smk should look like this:

SAMPLES=["SRR3099585_chr18","SRR3099586_chr18","SRR3099587_chr18"]

rule all:
  input:
    expand("FastQC/{sample}_fastqc.html", sample=SAMPLES),
    expand("FastQC/{sample}_fastqc.zip", sample=SAMPLES),
    "multiqc_report.html",
    "multiqc_data",

rule fastqc:
  input:
    "Data/{sample}.fastq.gz"
  output:
    "FastQC/{sample}_fastqc.zip",
    "FastQC/{sample}_fastqc.html"
  shell: "fastqc --outdir FastQC {input}"

rule multiqc:
  input:
    expand("FastQC/{sample}_fastqc.zip", sample = SAMPLES)
  output:
    "multiqc_report.html",
    directory("multiqc_data")
  shell: "multiqc {input}"

Explanation: we used wildcards to “generalise” the input and output of the fastqc rule. You can think of Data/{sample}.fastq.gz, FastQC/{sample}_fastqc.zip and FastQC/{sample}_fastqc.html as “templates” for the input and output file names of the fastqc rule, with “{sample}” being the only variable part.

Of note, {sample} doesn’t have to match the wildcard name used in the expand() calls of the other rules. We could have used any other wildcard name (e.g. {mysample} instead of {sample}), as long as the input and output directives of the same rule use the same name.
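The “template” idea can be sketched in plain Python. This is a simplified illustration, not Snakemake's actual implementation: the helpers template_to_regex() and infer_wildcards() are hypothetical names, and the sketch assumes each wildcard may match any non-empty string. It matches a requested output file against the output template to find the wildcard value, then fills the input template with it.

```python
import re

def template_to_regex(template):
    """Turn a pattern like 'FastQC/{sample}_fastqc.html' into a compiled regex."""
    # Splitting on a capturing group alternates: literal, name, literal, name, ...
    parts = re.split(r"\{(\w+)\}", template)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)        # literal text must match exactly
        else:
            regex += f"(?P<{part}>.+)"      # wildcard: any non-empty string
    return re.compile(regex + "$")

def infer_wildcards(output_template, requested_file):
    """Return the wildcard values that make the template equal the file, or None."""
    match = template_to_regex(output_template).match(requested_file)
    return match.groupdict() if match else None

requested = "FastQC/SRR3099585_chr18_fastqc.html"

# Using {sample} in both the output and the input of the rule
wildcards = infer_wildcards("FastQC/{sample}_fastqc.html", requested)
input_file = "Data/{sample}.fastq.gz".format(**wildcards)
print(wildcards, input_file)

# Renaming the wildcard consistently within the rule gives the same input file
renamed = infer_wildcards("FastQC/{mysample}_fastqc.html", requested)
renamed_input = "Data/{mysample}.fastq.gz".format(**renamed)
print(renamed, renamed_input)
```

Both variants resolve to the same input file, which is why the wildcard name only needs to be consistent inside one rule.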

Test the script

Next, let’s check again whether your pipeline works by running Snakemake on ex1_o6.smk.

You should see something similar to the following output on your screen.

Observe the output

We can see that each input is now run with FastQC individually. You can see this when you look at the “Job stats” table (3 fastqc jobs), but also when you look at the fastqc command lines that were run (there is now 1 command per file). Note that Snakemake’s order of execution can be quite random for independent jobs.
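As a rough cross-check of what to expect, the sketch below builds the three per-file fastqc commands from the rule's shell template and tallies the jobs per rule. It is a toy calculation in plain Python under the assumption that every output still has to be produced; it does not reproduce the layout of Snakemake's “Job stats” table.

```python
SAMPLES = ["SRR3099585_chr18", "SRR3099586_chr18", "SRR3099587_chr18"]

# One fastqc job per sample: "{input}" is now a single file in each command.
fastqc_commands = [
    "fastqc --outdir FastQC " + "Data/{sample}.fastq.gz".format(sample=sample)
    for sample in SAMPLES
]
for command in fastqc_commands:
    print(command)

# Tally jobs per rule: fastqc runs once per sample, all and multiqc once each.
job_counts = {"all": 1, "fastqc": len(fastqc_commands), "multiqc": 1}
total_jobs = sum(job_counts.values())

for rule_name, count in job_counts.items():
    print(f"{rule_name:<8} {count}")
print(f"{'total':<8} {total_jobs}")
```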
