Getting started with Snakemake

Exercise 1C - optimising resource usage


Objective 3:

Control the software environment in Snakemake. Create a Snakefile called ex1c_o3.smk that uses the envmodules directive to load fastqc and multiqc before the corresponding rules run.

Where to start?

To make Snakemake load the correct software with module load before it runs a rule's command, add the envmodules directive to that rule. For example, instead of running module load nodes/mySoftware by hand, declare the module in the rule like this:

rule ruleName:
    input:
        "inputFile.txt"
    output:
        "outputFile.txt"
    envmodules:
        "nodes/mySoftware",
    shell:
        """
          mySoftware {input} > {output}
        """

Your code for ex1c_o3.smk should look like this:

SAMPLES, = glob_wildcards(config["dataDir"]+"/{sample}.fastq.gz")

rule all:
  input:
    expand("FastQC/{sample}_fastqc.html", sample=SAMPLES),
    "multiqc_report.html"

rule fastqc:
  input:
    config["dataDir"]+"/{sample}.fastq.gz"
  output:
    "FastQC/{sample}_fastqc.zip",
    "FastQC/{sample}_fastqc.html"
  log:
    "Logs/{sample}_fastqc.std",
    "Logs/{sample}_fastqc.err"  
  envmodules:
    "fastqc/fastqc_v0.11.5"
  shell:
    "fastqc --outdir FastQC {input} 1>{log[0]} 2>{log[1]}"

rule multiqc:
  input:
    expand("FastQC/{sample}_fastqc.zip", sample = SAMPLES)
  output:
    "multiqc_report.html",
    directory("multiqc_data")
  log:
    std="Logs/multiqc.std",
    err="Logs/multiqc.err"
  envmodules:
    "nodes/multiqc-1.9"
  shell:
    "multiqc {input} 1>{log.std} 2>{log.err}"

Run your Snakefile
Now let’s run Snakemake again with the -R fastqc option, which forces the fastqc rule to run again. Don’t forget to also add --software-deployment-method env-modules, otherwise the envmodules directives are ignored:

snakemake -s ex1c_o3.smk --software-deployment-method env-modules --executor "cluster-generic" --cluster-generic-submit-cmd "qsub -V -l ncpus=1 -l mem=100Mb" --jobs 6 --configfile ex1.yml -p -R fastqc

Your output should look like this:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 6
Job stats:
job        count
-------  -------
all            1
fastqc         6
multiqc        1
total          8

Select jobs to execute...
Execute 6 jobs...

[Wed Feb 21 22:36:11 2024]
rule fastqc:
    input: Data/SRR3105699_chr18.fastq.gz
    output: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.html
    log: Logs/SRR3105699_chr18_fastqc.std, Logs/SRR3105699_chr18_fastqc.err
    jobid: 6
    reason: Forced execution
    wildcards: sample=SRR3105699_chr18
    resources: tmpdir=<TBD>

fastqc --outdir FastQC Data/SRR3105699_chr18.fastq.gz 1>Logs/SRR3105699_chr18_fastqc.std 2>Logs/SRR3105699_chr18_fastqc.err
Submitted job 6 with external jobid '748703.pbsserver'.

[Wed Feb 21 22:36:11 2024]
rule fastqc:
    input: Data/SRR3099586_chr18.fastq.gz
    output: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
    log: Logs/SRR3099586_chr18_fastqc.std, Logs/SRR3099586_chr18_fastqc.err
    jobid: 1
    reason: Forced execution
    wildcards: sample=SRR3099586_chr18
    resources: tmpdir=<TBD>

fastqc --outdir FastQC Data/SRR3099586_chr18.fastq.gz 1>Logs/SRR3099586_chr18_fastqc.std 2>Logs/SRR3099586_chr18_fastqc.err
Submitted job 1 with external jobid '748704.pbsserver'.

[Wed Feb 21 22:36:11 2024]
rule fastqc:
    input: Data/SRR3099585_chr18.fastq.gz
    output: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html
    log: Logs/SRR3099585_chr18_fastqc.std, Logs/SRR3099585_chr18_fastqc.err
    jobid: 4
    reason: Forced execution
    wildcards: sample=SRR3099585_chr18
    resources: tmpdir=<TBD>

fastqc --outdir FastQC Data/SRR3099585_chr18.fastq.gz 1>Logs/SRR3099585_chr18_fastqc.std 2>Logs/SRR3099585_chr18_fastqc.err
Submitted job 4 with external jobid '748705.pbsserver'.

[Wed Feb 21 22:36:11 2024]
rule fastqc:
    input: Data/SRR3099587_chr18.fastq.gz
    output: FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.html
    log: Logs/SRR3099587_chr18_fastqc.std, Logs/SRR3099587_chr18_fastqc.err
    jobid: 3
    reason: Forced execution
    wildcards: sample=SRR3099587_chr18
    resources: tmpdir=<TBD>

fastqc --outdir FastQC Data/SRR3099587_chr18.fastq.gz 1>Logs/SRR3099587_chr18_fastqc.std 2>Logs/SRR3099587_chr18_fastqc.err
Submitted job 3 with external jobid '748706.pbsserver'.

[Wed Feb 21 22:36:11 2024]
rule fastqc:
    input: Data/SRR3105698_chr18.fastq.gz
    output: FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.html
    log: Logs/SRR3105698_chr18_fastqc.std, Logs/SRR3105698_chr18_fastqc.err
    jobid: 2
    reason: Forced execution
    wildcards: sample=SRR3105698_chr18
    resources: tmpdir=<TBD>

fastqc --outdir FastQC Data/SRR3105698_chr18.fastq.gz 1>Logs/SRR3105698_chr18_fastqc.std 2>Logs/SRR3105698_chr18_fastqc.err
Submitted job 2 with external jobid '748707.pbsserver'.

[Wed Feb 21 22:36:11 2024]
rule fastqc:
    input: Data/SRR3105697_chr18.fastq.gz
    output: FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.html
    log: Logs/SRR3105697_chr18_fastqc.std, Logs/SRR3105697_chr18_fastqc.err
    jobid: 5
    reason: Forced execution
    wildcards: sample=SRR3105697_chr18
    resources: tmpdir=<TBD>

fastqc --outdir FastQC Data/SRR3105697_chr18.fastq.gz 1>Logs/SRR3105697_chr18_fastqc.std 2>Logs/SRR3105697_chr18_fastqc.err
Submitted job 5 with external jobid '748708.pbsserver'.
[Wed Feb 21 22:36:40 2024]
Finished job 6.
1 of 8 steps (12%) done
[Wed Feb 21 22:36:40 2024]
Finished job 1.
2 of 8 steps (25%) done
[Wed Feb 21 22:36:40 2024]
Finished job 4.
3 of 8 steps (38%) done
[Wed Feb 21 22:36:40 2024]
Finished job 3.
4 of 8 steps (50%) done
[Wed Feb 21 22:36:40 2024]
Finished job 2.
5 of 8 steps (62%) done
[Wed Feb 21 22:36:41 2024]
Finished job 5.
6 of 8 steps (75%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 22:36:41 2024]
rule multiqc:
    input: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.zip
    output: multiqc_report.html, multiqc_data
    log: Logs/multiqc.std, Logs/multiqc.err
    jobid: 7
    reason: Input files updated by another job: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.zip
    resources: tmpdir=<TBD>

multiqc FastQC/SRR3099586_chr18_fastqc.zip FastQC/SRR3105698_chr18_fastqc.zip FastQC/SRR3099587_chr18_fastqc.zip FastQC/SRR3099585_chr18_fastqc.zip FastQC/SRR3105697_chr18_fastqc.zip FastQC/SRR3105699_chr18_fastqc.zip 1>Logs/multiqc.std 2>Logs/multiqc.err
Submitted job 7 with external jobid '748713.pbsserver'.
Will exit after finishing currently running jobs (scheduler).

[Wed Feb 21 22:37:11 2024]
Finished job 7.
7 of 8 steps (88%) done
Will exit after finishing currently running jobs (scheduler).
Shutting down, this might take some time.

Observe the output

Congratulations! You’ve run your first Snakefile through the OpenPBS scheduler!

Now let’s see how we can simplify the command line because it’s starting to get really long…
