Getting started with Snakemake

Exercise 1A - create your first snakefile

objective > setup > o1 > o2 > o3 > o4 > o5 > o6 > recap

Objective 6

Create a new snakefile named ex1_o6.smk in which fastqc runs on each input individually.
ex1_o6_workflow_2rules_3inputs
Where to start?

Our culprit: the expand() function in the fastqc rule provides a list of files which are then interpreted as a combined input rather than individual files.

  • change the fastqc rule: remove the expand() in the input and output of the fastqc rule but keep the wildcard-containing strings

Your code for ex1_o6.smk should look like this:

SAMPLES=["SRR3099585_chr18","SRR3099586_chr18","SRR3099587_chr18"]

rule all:
  input:
    expand("FastQC/{sample}_fastqc.html", sample=SAMPLES),
expand("FastQC/{sample}_fastqc.zip", sample=SAMPLES), "multiqc_report.html",
"multiqc_data",
rule fastqc:
input:
"Data/{sample}.fastq.gz"
output:
"FastQC/{sample}_fastqc.zip",
"FastQC/{sample}_fastqc.html"

shell: "fastqc --outdir FastQC {input}"
rule multiqc: input: expand("FastQC/{sample}_fastqc.zip", sample = SAMPLES) output: "multiqc_report.html", directory("multiqc_data") shell: "multiqc {input}"

Explanation: we used wildcards to “generalise” the input and output of the fastqc rule. You can see Data/{sample}.fastq.gz, FastQC/{sample}_fastqc.zip and FastQC/{sample}_fastqc.html as “templates” (with “{sample}” being the only variable part) for input and output file names for the fastqc rule.

 

Of note, {sample} doesn’t have to match the wildcard name given in the previous expand functions. In theory, we could have used any other wildcard name as long as input and output directives of a same rule match (e.g. {mysample} instead of {sample}).

Test the script

Next, let’s check again if your pipeline works:

				
					snakemake -s ex1_o6.smk --cores 1 -p
				
			

You should see something similar to the following output on your screen.

				
					Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job        count
-------  -------
all            1
fastqc         3
multiqc        1
total          5

Select jobs to execute...
Execute 1 jobs...

[Tue Feb 20 14:38:24 2024]
localrule fastqc:
    input: Data/SRR3099585_chr18.fastq.gz
    output: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html
    jobid: 1
    reason: Set of input files has changed since last execution
    wildcards: sample=SRR3099585_chr18
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver

fastqc --outdir FastQC Data/SRR3099585_chr18.fastq.gz
Started analysis of SRR3099585_chr18.fastq.gz
Approx 5% complete for SRR3099585_chr18.fastq.gz
Approx 10% complete for SRR3099585_chr18.fastq.gz
Approx 15% complete for SRR3099585_chr18.fastq.gz
Approx 20% complete for SRR3099585_chr18.fastq.gz
Approx 25% complete for SRR3099585_chr18.fastq.gz
Approx 30% complete for SRR3099585_chr18.fastq.gz
Approx 35% complete for SRR3099585_chr18.fastq.gz
Approx 40% complete for SRR3099585_chr18.fastq.gz
Approx 45% complete for SRR3099585_chr18.fastq.gz
Approx 50% complete for SRR3099585_chr18.fastq.gz
Approx 55% complete for SRR3099585_chr18.fastq.gz
Approx 60% complete for SRR3099585_chr18.fastq.gz
Approx 65% complete for SRR3099585_chr18.fastq.gz
Approx 70% complete for SRR3099585_chr18.fastq.gz
Approx 75% complete for SRR3099585_chr18.fastq.gz
Approx 80% complete for SRR3099585_chr18.fastq.gz
Approx 85% complete for SRR3099585_chr18.fastq.gz
Approx 90% complete for SRR3099585_chr18.fastq.gz
Approx 95% complete for SRR3099585_chr18.fastq.gz
Analysis complete for SRR3099585_chr18.fastq.gz
[Tue Feb 20 14:38:31 2024]
Finished job 1.
1 of 5 steps (20%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Feb 20 14:38:31 2024]
localrule fastqc:
    input: Data/SRR3099587_chr18.fastq.gz
    output: FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.html
    jobid: 3
    reason: Set of input files has changed since last execution
    wildcards: sample=SRR3099587_chr18
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver

fastqc --outdir FastQC Data/SRR3099587_chr18.fastq.gz
Started analysis of SRR3099587_chr18.fastq.gz
Approx 5% complete for SRR3099587_chr18.fastq.gz
Approx 10% complete for SRR3099587_chr18.fastq.gz
Approx 15% complete for SRR3099587_chr18.fastq.gz
Approx 20% complete for SRR3099587_chr18.fastq.gz
Approx 25% complete for SRR3099587_chr18.fastq.gz
Approx 30% complete for SRR3099587_chr18.fastq.gz
Approx 35% complete for SRR3099587_chr18.fastq.gz
Approx 40% complete for SRR3099587_chr18.fastq.gz
Approx 45% complete for SRR3099587_chr18.fastq.gz
Approx 50% complete for SRR3099587_chr18.fastq.gz
Approx 55% complete for SRR3099587_chr18.fastq.gz
Approx 60% complete for SRR3099587_chr18.fastq.gz
Approx 65% complete for SRR3099587_chr18.fastq.gz
Approx 70% complete for SRR3099587_chr18.fastq.gz
Approx 75% complete for SRR3099587_chr18.fastq.gz
Approx 80% complete for SRR3099587_chr18.fastq.gz
Approx 85% complete for SRR3099587_chr18.fastq.gz
Approx 90% complete for SRR3099587_chr18.fastq.gz
Approx 95% complete for SRR3099587_chr18.fastq.gz
Analysis complete for SRR3099587_chr18.fastq.gz
[Tue Feb 20 14:38:39 2024]
Finished job 3.
2 of 5 steps (40%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Feb 20 14:38:39 2024]
localrule fastqc:
    input: Data/SRR3099586_chr18.fastq.gz
    output: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
    jobid: 2
    reason: Set of input files has changed since last execution
    wildcards: sample=SRR3099586_chr18
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver

fastqc --outdir FastQC Data/SRR3099586_chr18.fastq.gz
Started analysis of SRR3099586_chr18.fastq.gz
Approx 5% complete for SRR3099586_chr18.fastq.gz
Approx 10% complete for SRR3099586_chr18.fastq.gz
Approx 15% complete for SRR3099586_chr18.fastq.gz
Approx 20% complete for SRR3099586_chr18.fastq.gz
Approx 25% complete for SRR3099586_chr18.fastq.gz
Approx 30% complete for SRR3099586_chr18.fastq.gz
Approx 35% complete for SRR3099586_chr18.fastq.gz
Approx 40% complete for SRR3099586_chr18.fastq.gz
Approx 45% complete for SRR3099586_chr18.fastq.gz
Approx 50% complete for SRR3099586_chr18.fastq.gz
Approx 55% complete for SRR3099586_chr18.fastq.gz
Approx 60% complete for SRR3099586_chr18.fastq.gz
Approx 65% complete for SRR3099586_chr18.fastq.gz
Approx 70% complete for SRR3099586_chr18.fastq.gz
Approx 75% complete for SRR3099586_chr18.fastq.gz
Approx 80% complete for SRR3099586_chr18.fastq.gz
Approx 85% complete for SRR3099586_chr18.fastq.gz
Approx 90% complete for SRR3099586_chr18.fastq.gz
Approx 95% complete for SRR3099586_chr18.fastq.gz
Analysis complete for SRR3099586_chr18.fastq.gz
[Tue Feb 20 14:38:46 2024]
Finished job 2.
3 of 5 steps (60%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Feb 20 14:38:46 2024]
localrule multiqc:
    input: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip
    output: multiqc_report.html, multiqc_data
    jobid: 4
    reason: Input files updated by another job: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver

multiqc FastQC/SRR3099585_chr18_fastqc.zip FastQC/SRR3099586_chr18_fastqc.zip FastQC/SRR3099587_chr18_fastqc.zip
[WARNING]         multiqc : MultiQC Version v1.20 now available!
[INFO   ]         multiqc : This is MultiQC v1.9
[INFO   ]         multiqc : Template    : default
[INFO   ]         multiqc : Searching   : /data/work/I2BC/chloe.quignot/snakemake_tutorial/FastQC/SRR3099585_chr18_fastqc.zip
[INFO   ]         multiqc : Searching   : /data/work/I2BC/chloe.quignot/snakemake_tutorial/FastQC/SRR3099586_chr18_fastqc.zip
[INFO   ]         multiqc : Searching   : /data/work/I2BC/chloe.quignot/snakemake_tutorial/FastQC/SRR3099587_chr18_fastqc.zip
[INFO   ]          fastqc : Found 3 reports
[INFO   ]         multiqc : Compressing plot data
[INFO   ]         multiqc : Report      : multiqc_report.html
[INFO   ]         multiqc : Data        : multiqc_data
[INFO   ]         multiqc : MultiQC complete
[Tue Feb 20 14:38:51 2024]
Finished job 4.
4 of 5 steps (80%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Feb 20 14:38:51 2024]
localrule all:
    input: FastQC/SRR3099585_chr18_fastqc.html, FastQC/SRR3099586_chr18_fastqc.html, FastQC/SRR3099587_chr18_fastqc.html, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3
099587_chr18_fastqc.zip, multiqc_report.html, multiqc_data
    jobid: 0
    reason: Input files updated by another job: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html, multiqc_data, FastQC/SRR3099586_chr18_fastqc.html, FastQC/SRR3099587_chr18_fastqc.zi
p, FastQC/SRR3099585_chr18_fastqc.zip, multiqc_report.html, FastQC/SRR3099587_chr18_fastqc.html
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver

[Tue Feb 20 14:38:51 2024]
Finished job 0.
5 of 5 steps (100%) done
Complete log: .snakemake/log/2024-02-20T143823.589116.snakemake.log

				
			
Observe the output

We can see that each input is now run with FastQC individually. You can see this when you look at the “Job stats” table (3 fastqc jobs), but also when you look at the fastqc command lines that were run (there is now 1 command per file). Note that Snakemake’s order of execution can be quite random for independent jobs.

				
					Job stats:
job        count
-------  -------
all            1
fastqc         3
multiqc        1
total          5
				
			
Scroll to Top