Getting started with Snakemake

Exercise 1A - create your first snakefile

Objective 2

Let’s scale up a little bit! Create a new snakefile named ex1_o2.smk which will run FastQC on two RNA-seq input files.
[Figure: ex1_o2 workflow, one rule with two inputs]
Where to start?
  • Input files: SRR3099585_chr18.fastq.gz and SRR3099586_chr18.fastq.gz in the ${PWD}/Data directory
  • Expected output: we now expect result files for both of these inputs, so don’t forget to add the respective *_fastqc.zip and *_fastqc.html files for the second input
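The four expected output names can be derived from the two input names, which can be sketched in Python. This is a minimal sketch under stated assumptions. The helper fastqc_outputs is hypothetical and is not part of Snakemake or FastQC. It assumes that FastQC names its reports by dropping the .fastq.gz extension and appending _fastqc, which matches the file names used in this exercise. It handles only .fastq and .fastq.gz inputs.

```python
from pathlib import PurePosixPath

# Hypothetical helper (not a Snakemake or FastQC function). It assumes FastQC's
# naming as seen in this exercise: strip ".fastq.gz", then append "_fastqc".
# Only .fastq / .fastq.gz inputs are handled here.
def fastqc_outputs(fastq_path: str, outdir: str = "FastQC") -> list[str]:
    name = PurePosixPath(fastq_path).name
    stem = name.removesuffix(".gz").removesuffix(".fastq")
    # FastQC writes one .zip archive and one .html report per input file
    return [f"{outdir}/{stem}_fastqc.{ext}" for ext in ("zip", "html")]

INPUTS = [
    "Data/SRR3099585_chr18.fastq.gz",
    "Data/SRR3099586_chr18.fastq.gz",
]

# 2 inputs x 2 report files each = 4 expected outputs
EXPECTED = [out for fq in INPUTS for out in fastqc_outputs(fq)]

if __name__ == "__main__":
    for path in EXPECTED:
        print(path)
```

Running it prints the same four paths, in the same order, that appear under output: in the rule below. A Snakefile accepts Python expressions, so a list built this way could also be passed to the output: directive instead of typing every name by hand.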

Your code for ex1_o2.smk should look like this:

rule fastqc:
    input:
        "Data/SRR3099585_chr18.fastq.gz",
        "Data/SRR3099586_chr18.fastq.gz"
    output:
        "FastQC/SRR3099585_chr18_fastqc.zip",
        "FastQC/SRR3099585_chr18_fastqc.html",
        "FastQC/SRR3099586_chr18_fastqc.zip",
        "FastQC/SRR3099586_chr18_fastqc.html"
    shell:
        # {input} expands to both FASTQ paths, separated by a space
        "fastqc --outdir FastQC/ {input}"

Test the script

Next, let’s check again if your pipeline still works:

snakemake -s ex1_o2.smk --cores 1 -p

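-s: short form of the --snakefile option, which tells Snakemake which snakefile to run

--cores 1: use at most 1 CPU core

-p: print the shell commands that are executed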

You should see something similar to the following output on your screen:

Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job       count
------  -------
fastqc        1
total         1

Select jobs to execute...
Execute 1 jobs...

[Tue Feb 20 14:04:49 2024]
localrule fastqc:
    input: Data/SRR3099585_chr18.fastq.gz, Data/SRR3099586_chr18.fastq.gz
    output: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html, FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
    jobid: 0
    reason: Missing output files: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver

fastqc --outdir FastQC/ Data/SRR3099585_chr18.fastq.gz Data/SRR3099586_chr18.fastq.gz
Started analysis of SRR3099585_chr18.fastq.gz
Approx 5% complete for SRR3099585_chr18.fastq.gz
Approx 10% complete for SRR3099585_chr18.fastq.gz
Approx 15% complete for SRR3099585_chr18.fastq.gz
Approx 20% complete for SRR3099585_chr18.fastq.gz
Approx 25% complete for SRR3099585_chr18.fastq.gz
Approx 30% complete for SRR3099585_chr18.fastq.gz
Approx 35% complete for SRR3099585_chr18.fastq.gz
Approx 40% complete for SRR3099585_chr18.fastq.gz
Approx 45% complete for SRR3099585_chr18.fastq.gz
Approx 50% complete for SRR3099585_chr18.fastq.gz
Approx 55% complete for SRR3099585_chr18.fastq.gz
Approx 60% complete for SRR3099585_chr18.fastq.gz
Approx 65% complete for SRR3099585_chr18.fastq.gz
Approx 70% complete for SRR3099585_chr18.fastq.gz
Approx 75% complete for SRR3099585_chr18.fastq.gz
Approx 80% complete for SRR3099585_chr18.fastq.gz
Approx 85% complete for SRR3099585_chr18.fastq.gz
Approx 90% complete for SRR3099585_chr18.fastq.gz
Approx 95% complete for SRR3099585_chr18.fastq.gz
Analysis complete for SRR3099585_chr18.fastq.gz
Started analysis of SRR3099586_chr18.fastq.gz
Approx 5% complete for SRR3099586_chr18.fastq.gz
Approx 10% complete for SRR3099586_chr18.fastq.gz
Approx 15% complete for SRR3099586_chr18.fastq.gz
Approx 20% complete for SRR3099586_chr18.fastq.gz
Approx 25% complete for SRR3099586_chr18.fastq.gz
Approx 30% complete for SRR3099586_chr18.fastq.gz
Approx 35% complete for SRR3099586_chr18.fastq.gz
Approx 40% complete for SRR3099586_chr18.fastq.gz
Approx 45% complete for SRR3099586_chr18.fastq.gz
Approx 50% complete for SRR3099586_chr18.fastq.gz
Approx 55% complete for SRR3099586_chr18.fastq.gz
Approx 60% complete for SRR3099586_chr18.fastq.gz
Approx 65% complete for SRR3099586_chr18.fastq.gz
Approx 70% complete for SRR3099586_chr18.fastq.gz
Approx 75% complete for SRR3099586_chr18.fastq.gz
Approx 80% complete for SRR3099586_chr18.fastq.gz
Approx 85% complete for SRR3099586_chr18.fastq.gz
Approx 90% complete for SRR3099586_chr18.fastq.gz
Approx 95% complete for SRR3099586_chr18.fastq.gz
Analysis complete for SRR3099586_chr18.fastq.gz
[Tue Feb 20 14:05:02 2024]
Finished job 0.
1 of 1 steps (100%) done
Complete log: .snakemake/log/2024-02-20T140449.328558.snakemake.log

Observe the output

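As you can see in the “reason:” line of the output above, Snakemake detects that FastQC wasn’t yet run on your second input (“Missing output files: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html”). It therefore runs the fastqc rule again. Both input files are handled by the same single job, so FastQC re-analyses both of them. This includes SRR3099585_chr18.fastq.gz, whose reports already existed.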
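The decision described above can be modelled in a few lines of Python. This is a simplified sketch and not Snakemake's implementation. The function missing_outputs is hypothetical, and it models only the "missing output files" trigger. Snakemake also considers other reasons to rerun a job, for example input files that are newer than their outputs. The demo uses a temporary directory to recreate the state left by the previous objective, in which only the first input's reports existed.

```python
import os
import tempfile

# Hypothetical, simplified model of ONE Snakemake rerun trigger: a job is
# scheduled if any of its declared output files is missing. Other triggers
# (e.g. modification times) are deliberately not modelled here.
def missing_outputs(outputs: list[str], workdir: str = ".") -> list[str]:
    """Return the declared outputs that do not exist under workdir."""
    return [o for o in outputs if not os.path.exists(os.path.join(workdir, o))]

OUTPUTS = [
    "FastQC/SRR3099585_chr18_fastqc.zip",
    "FastQC/SRR3099585_chr18_fastqc.html",
    "FastQC/SRR3099586_chr18_fastqc.zip",
    "FastQC/SRR3099586_chr18_fastqc.html",
]

def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as workdir:
        os.makedirs(os.path.join(workdir, "FastQC"))
        # Only the first sample's two reports exist (as after the previous objective)
        for o in OUTPUTS[:2]:
            open(os.path.join(workdir, o), "w").close()
        # Compute the result before the temporary directory is deleted
        return missing_outputs(OUTPUTS, workdir)

if __name__ == "__main__":
    missing = demo()
    if missing:
        # In this model, any missing output means the whole job runs again
        print("Missing output files:", ", ".join(missing))
    else:
        print("All declared outputs are present.")
```

The check works at the level of the job, not the individual file. One missing report is therefore enough to rerun the single fastqc job, and that job processes both inputs.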

Have a look at your output folder. You should now have 4 files in there:

john.doe@node06:/data/work/I2BC/john.doe/snakemake_tutorial$ ls FastQC
SRR3099585_chr18_fastqc.html  SRR3099585_chr18_fastqc.zip  SRR3099586_chr18_fastqc.html  SRR3099586_chr18_fastqc.zip