Getting started with Snakemake
Objective 3:
Controlling the software environment in Snakemake. Create a Snakefile called ex1c_o3.smk in which we will use the envmodules directive to load fastqc and multiqc before running each of these rules.
Where to start?
To make Snakemake automatically load the correct software with module load before it runs the command line of a given rule, we can add the envmodules directive to that rule. For example, instead of typing module load nodes/mySoftware ourselves, we can integrate it in the rule like this:
rule ruleName:
    input:
        "inputFile.txt"
    output:
        "outputFile.txt"
    envmodules:
        "nodes/mySoftware"
    shell:
        """
        mySoftware {input} > {output}
        """
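To picture what the envmodules directive does, here is a minimal Python sketch. It is not Snakemake's actual code: the helper with_modules is a hypothetical name, and the real implementation may do more (for instance, clear previously loaded modules first). It only illustrates the idea of prefixing a rule's shell command with a module load call.

```python
def with_modules(command, modules):
    """Prefix a shell command with a 'module load' call for the given modules.

    Hypothetical helper for illustration only; not part of Snakemake's API.
    """
    if not modules:
        # Nothing to load: run the command unchanged.
        return command
    return "module load {}; {}".format(" ".join(modules), command)


# The example rule above, with {input} and {output} already filled in.
cmd = with_modules("mySoftware inputFile.txt > outputFile.txt", ["nodes/mySoftware"])
print(cmd)
```

The modules listed in a rule are only loaded when Snakemake is started with --software-deployment-method env-modules.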
Your code for ex1c_o3.smk should look like this:
SAMPLES, = glob_wildcards(config["dataDir"]+"/{sample}.fastq.gz")

rule all:
    input:
        expand("FastQC/{sample}_fastqc.html", sample=SAMPLES),
        "multiqc_report.html"

rule fastqc:
    input:
        config["dataDir"]+"/{sample}.fastq.gz"
    output:
        "FastQC/{sample}_fastqc.zip",
        "FastQC/{sample}_fastqc.html"
    log:
        "Logs/{sample}_fastqc.std",
        "Logs/{sample}_fastqc.err"
    envmodules:
        "fastqc/fastqc_v0.11.5"
    shell:
        "fastqc --outdir FastQC {input} 1>{log[0]} 2>{log[1]}"

rule multiqc:
    input:
        expand("FastQC/{sample}_fastqc.zip", sample = SAMPLES)
    output:
        "multiqc_report.html",
        directory("multiqc_data")
    log:
        std="Logs/multiqc.std",
        err="Logs/multiqc.err"
    envmodules:
        "nodes/multiqc-1.9"
    shell:
        "multiqc {input} 1>{log.std} 2>{log.err}"
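If the first lines of this Snakefile feel opaque, the following plain-Python sketch may help. It imitates, in a very simplified way, what glob_wildcards and expand do with the {sample} wildcard. The functions match_samples and expand_targets are hypothetical stand-ins written for this illustration, not Snakemake functions. The value of dataDir is an assumption inferred from the paths in the run log, and README.txt is a made-up file name used to show a non-matching path.

```python
# Assumption: ex1.yml sets dataDir to "Data" (inferred from the run log paths).
config = {"dataDir": "Data"}


def match_samples(pattern, paths):
    """Simplified stand-in for glob_wildcards: extract the {sample} part."""
    prefix, suffix = pattern.split("{sample}")
    samples = []
    for path in paths:
        long_enough = len(path) > len(prefix) + len(suffix)
        if long_enough and path.startswith(prefix) and path.endswith(suffix):
            samples.append(path[len(prefix):len(path) - len(suffix)])
    return samples


def expand_targets(pattern, samples):
    """Simplified stand-in for expand: one filled-in path per sample."""
    return [pattern.format(sample=s) for s in samples]


# Two of the input files from the exercise plus a hypothetical non-matching file.
paths = [
    "Data/SRR3105699_chr18.fastq.gz",
    "Data/SRR3099586_chr18.fastq.gz",
    "Data/README.txt",
]

SAMPLES = match_samples(config["dataDir"] + "/{sample}.fastq.gz", paths)
targets = expand_targets("FastQC/{sample}_fastqc.html", SAMPLES)

print(SAMPLES)
print(targets)
```

The real glob_wildcards scans the file system itself and returns a named tuple with one list per wildcard, which is why the Snakefile unpacks it with "SAMPLES, =".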
Run your Snakefile
Now let’s run Snakemake again with the -R fastqc option, which forces the fastqc rule to be executed again. Don’t forget to also add --software-deployment-method env-modules:
snakemake -s ex1c_o3.smk \
    --software-deployment-method env-modules \
    --executor "cluster-generic" \
    --cluster-generic-submit-cmd "qsub -V -l ncpus=1 -l mem=100Mb" \
    --jobs 6 \
    --configfile ex1.yml \
    -p -R fastqc
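Since this command has many parts, it can help to see them one at a time. The Python sketch below only assembles and prints the same command; it does not run Snakemake. The comments give a short reading of each option, and shlex.join (Python 3.8 or later) takes care of quoting the qsub string.

```python
import shlex

args = [
    "snakemake",
    "-s", "ex1c_o3.smk",                                # Snakefile to use
    "--software-deployment-method", "env-modules",      # honour envmodules directives
    "--executor", "cluster-generic",                    # submit jobs through a generic cluster command
    "--cluster-generic-submit-cmd",
    "qsub -V -l ncpus=1 -l mem=100Mb",                  # submission command used for each job
    "--jobs", "6",                                      # at most 6 jobs submitted at once
    "--configfile", "ex1.yml",                          # fills the config dictionary
    "-p",                                               # print the shell commands
    "-R", "fastqc",                                     # force re-execution of rule fastqc
]

# Join the pieces into one shell-safe command line.
command = shlex.join(args)
print(command)
```

Pasting the printed line into the shell is equivalent to typing the command by hand.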
The Snakemake output should look like this:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 6
Job stats:
job count
------- -------
all 1
fastqc 6
multiqc 1
total 8
Select jobs to execute...
Execute 6 jobs...
[Wed Feb 21 22:36:11 2024]
rule fastqc:
input: Data/SRR3105699_chr18.fastq.gz
output: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.html
log: Logs/SRR3105699_chr18_fastqc.std, Logs/SRR3105699_chr18_fastqc.err
jobid: 6
reason: Forced execution
wildcards: sample=SRR3105699_chr18
resources: tmpdir=
fastqc --outdir FastQC Data/SRR3105699_chr18.fastq.gz 1>Logs/SRR3105699_chr18_fastqc.std 2>Logs/SRR3105699_chr18_fastqc.err
Submitted job 6 with external jobid '748703.pbsserver'.
[Wed Feb 21 22:36:11 2024]
rule fastqc:
input: Data/SRR3099586_chr18.fastq.gz
output: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
log: Logs/SRR3099586_chr18_fastqc.std, Logs/SRR3099586_chr18_fastqc.err
jobid: 1
reason: Forced execution
wildcards: sample=SRR3099586_chr18
resources: tmpdir=
fastqc --outdir FastQC Data/SRR3099586_chr18.fastq.gz 1>Logs/SRR3099586_chr18_fastqc.std 2>Logs/SRR3099586_chr18_fastqc.err
Submitted job 1 with external jobid '748704.pbsserver'.
[Wed Feb 21 22:36:11 2024]
rule fastqc:
input: Data/SRR3099585_chr18.fastq.gz
output: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html
log: Logs/SRR3099585_chr18_fastqc.std, Logs/SRR3099585_chr18_fastqc.err
jobid: 4
reason: Forced execution
wildcards: sample=SRR3099585_chr18
resources: tmpdir=
fastqc --outdir FastQC Data/SRR3099585_chr18.fastq.gz 1>Logs/SRR3099585_chr18_fastqc.std 2>Logs/SRR3099585_chr18_fastqc.err
Submitted job 4 with external jobid '748705.pbsserver'.
[Wed Feb 21 22:36:11 2024]
rule fastqc:
input: Data/SRR3099587_chr18.fastq.gz
output: FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.html
log: Logs/SRR3099587_chr18_fastqc.std, Logs/SRR3099587_chr18_fastqc.err
jobid: 3
reason: Forced execution
wildcards: sample=SRR3099587_chr18
resources: tmpdir=
fastqc --outdir FastQC Data/SRR3099587_chr18.fastq.gz 1>Logs/SRR3099587_chr18_fastqc.std 2>Logs/SRR3099587_chr18_fastqc.err
Submitted job 3 with external jobid '748706.pbsserver'.
[Wed Feb 21 22:36:11 2024]
rule fastqc:
input: Data/SRR3105698_chr18.fastq.gz
output: FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.html
log: Logs/SRR3105698_chr18_fastqc.std, Logs/SRR3105698_chr18_fastqc.err
jobid: 2
reason: Forced execution
wildcards: sample=SRR3105698_chr18
resources: tmpdir=
fastqc --outdir FastQC Data/SRR3105698_chr18.fastq.gz 1>Logs/SRR3105698_chr18_fastqc.std 2>Logs/SRR3105698_chr18_fastqc.err
Submitted job 2 with external jobid '748707.pbsserver'.
[Wed Feb 21 22:36:11 2024]
rule fastqc:
input: Data/SRR3105697_chr18.fastq.gz
output: FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.html
log: Logs/SRR3105697_chr18_fastqc.std, Logs/SRR3105697_chr18_fastqc.err
jobid: 5
reason: Forced execution
wildcards: sample=SRR3105697_chr18
resources: tmpdir=
fastqc --outdir FastQC Data/SRR3105697_chr18.fastq.gz 1>Logs/SRR3105697_chr18_fastqc.std 2>Logs/SRR3105697_chr18_fastqc.err
Submitted job 5 with external jobid '748708.pbsserver'.
[Wed Feb 21 22:36:40 2024]
Finished job 6.
1 of 8 steps (12%) done
[Wed Feb 21 22:36:40 2024]
Finished job 1.
2 of 8 steps (25%) done
[Wed Feb 21 22:36:40 2024]
Finished job 4.
3 of 8 steps (38%) done
[Wed Feb 21 22:36:40 2024]
Finished job 3.
4 of 8 steps (50%) done
[Wed Feb 21 22:36:40 2024]
Finished job 2.
5 of 8 steps (62%) done
[Wed Feb 21 22:36:41 2024]
Finished job 5.
6 of 8 steps (75%) done
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 22:36:41 2024]
rule multiqc:
input: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.zip
output: multiqc_report.html, multiqc_data
log: Logs/multiqc.std, Logs/multiqc.err
jobid: 7
reason: Input files updated by another job: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.zip
resources: tmpdir=
multiqc FastQC/SRR3099586_chr18_fastqc.zip FastQC/SRR3105698_chr18_fastqc.zip FastQC/SRR3099587_chr18_fastqc.zip FastQC/SRR3099585_chr18_fastqc.zip FastQC/SRR3105697_chr18_fastqc.zip FastQC/SRR3105699_chr18_fastqc.zip 1>Logs/multiqc.std 2>Logs/multiqc.err
Submitted job 7 with external jobid '748713.pbsserver'.
Will exit after finishing currently running jobs (scheduler).
[Wed Feb 21 22:37:11 2024]
Finished job 7.
7 of 8 steps (88%) done
Will exit after finishing currently running jobs (scheduler).
Shutting down, this might take some time.
Observe the output
Congratulations! You’ve run your first Snakefile through the OpenPBS scheduler!
Now let’s see how we can simplify the command line because it’s starting to get really long…