Getting started with Snakemake
Objective 3
Create a Snakefile named ex1_o3.smk in which we add a new rule that runs MultiQC on the list of all FastQC output files.
Where to start?
MultiQC is a tool that aggregates the results of multiple other tools into a single HTML file.
- Input files: FastQC’s zip file outputs.
- MultiQC command:
multiqc *fastqc.zip
- Expected output: a multiqc_report.html file and a multiqc_data directory
NB: when the output is a directory, you have to specify this using the directory() function. In this case, you would have to put: directory("multiqc_data")
Your code for ex1_o3.smk should look like this:
rule fastqc:
    input:
        "Data/SRR3099585_chr18.fastq.gz",
        "Data/SRR3099586_chr18.fastq.gz",
    output:
        "FastQC/SRR3099585_chr18_fastqc.zip",
        "FastQC/SRR3099585_chr18_fastqc.html",
        "FastQC/SRR3099586_chr18_fastqc.zip",
        "FastQC/SRR3099586_chr18_fastqc.html",
    shell: "fastqc --outdir FastQC {input}"

rule multiqc:
    input:
        "FastQC/SRR3099585_chr18_fastqc.zip",
        "FastQC/SRR3099586_chr18_fastqc.zip",
    output:
        "multiqc_report.html",
        directory("multiqc_data")
    shell:
        "multiqc {input}"
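To see why multiqc {input} plays the same role as multiqc *fastqc.zip, it can help to mimic the substitution by hand. The plain-Python sketch below only approximates what Snakemake does when it fills the {input} placeholder (the files listed under input:, joined by spaces). The variable names are illustrative and are not part of Snakemake.

```python
# Files listed under "input:" in the multiqc rule.
multiqc_inputs = [
    "FastQC/SRR3099585_chr18_fastqc.zip",
    "FastQC/SRR3099586_chr18_fastqc.zip",
]

# Approximation of the {input} substitution: the input files are
# inserted into the shell command, separated by single spaces.
command = "multiqc {input}".format(input=" ".join(multiqc_inputs))
print(command)
# prints: multiqc FastQC/SRR3099585_chr18_fastqc.zip FastQC/SRR3099586_chr18_fastqc.zip
```

Listing the files explicitly, rather than relying on a shell wildcard, is also what lets Snakemake know which files the multiqc rule depends on.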
Test the script
Next, let’s check again if your pipeline works:
You should see something similar to the following output on your screen:
Observe the output
Wait, what??! What about my new multiqc rule?
Expected behaviour:
Current behaviour:
By default, Snakemake only executes the first rule it encounters in your Snakefile; this rule is called the target rule. If (and only if) input files needed by this rule are missing, Snakemake scans the other rules in your Snakefile for one that can generate them. In our case, the fastqc rule is the target rule because it is written first. Since all the input files of the fastqc rule are already available, Snakemake doesn't execute any of the other rules in the file. Let's see in the next objective how to fix this.
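The behaviour described above can be sketched with a small toy model in plain Python. This is a deliberately simplified illustration, not Snakemake's actual implementation: the RULES dictionary and the rules_to_consider function are hypothetical names, and the model ignores details such as checking whether outputs are up to date.

```python
# Toy model of ex1_o3.smk: rule name -> (input files, output files).
# Dictionaries keep insertion order, so the first entry stands for
# the first rule written in the Snakefile.
RULES = {
    "fastqc": (
        ["Data/SRR3099585_chr18.fastq.gz", "Data/SRR3099586_chr18.fastq.gz"],
        ["FastQC/SRR3099585_chr18_fastqc.zip", "FastQC/SRR3099586_chr18_fastqc.zip"],
    ),
    "multiqc": (
        ["FastQC/SRR3099585_chr18_fastqc.zip", "FastQC/SRR3099586_chr18_fastqc.zip"],
        ["multiqc_report.html", "multiqc_data"],
    ),
}


def rules_to_consider(rules, existing_files):
    """Return the rules a default run would look at, target rule first."""
    target = next(iter(rules))  # the first rule is the target rule
    selected = [target]
    queue = [target]
    while queue:
        inputs, _ = rules[queue.pop()]
        for path in inputs:
            if path in existing_files:
                continue  # input already available: no other rule needed
            # Input is missing: look for a rule that lists it as an output.
            for name, (_, outputs) in rules.items():
                if path in outputs and name not in selected:
                    selected.append(name)
                    queue.append(name)
    return selected


# Only the two FASTQ files exist, as in this exercise.
existing = set(RULES["fastqc"][0])
print(rules_to_consider(RULES, existing))
# prints: ['fastqc']
```

In this model, nothing ever asks for multiqc_report.html or multiqc_data, so the multiqc rule is never selected, which matches what you observed on your screen.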