Getting started with Snakemake

Exercise 1C - optimising resource usage


Objective 5:

Create the ex1c_o5.smk Snakefile in which we will specify custom resources for each rule.
Where to start?

These resources can be added using the threads (number of processors) and resources (memory, walltime, etc.) directives, for example:

rule ruleName:
    input:
        "inputFile.txt"
    output:
        "outputFile.txt"
    envmodules:
        "nodes/mySoftware",
    threads: 1
    resources:
        mem="100Mb",
        time_min="00:05:00"
    shell:
        """
          mySoftware {input} > {output}
        """

Your code for ex1c_o5.smk should look like this:

SAMPLES, = glob_wildcards(config["dataDir"]+"/{sample}.fastq.gz")

rule all:
  input:
    expand("FastQC/{sample}_fastqc.html", sample=SAMPLES),
    "multiqc_report.html"

rule fastqc:
  input:
    config["dataDir"]+"/{sample}.fastq.gz"
  output:
    "FastQC/{sample}_fastqc.zip",
    "FastQC/{sample}_fastqc.html"
  log:
    "Logs/{sample}_fastqc.std",
    "Logs/{sample}_fastqc.err"  
  envmodules:
    "fastqc/fastqc_v0.11.5"
  threads: 1
  resources:
    mem="100Mb",
    time_min="00:05:00"
  shell:
    "fastqc --outdir FastQC {input} 1>{log[0]} 2>{log[1]}"

rule multiqc:
  input:
    expand("FastQC/{sample}_fastqc.zip", sample = SAMPLES)
  output:
    "multiqc_report.html",
    directory("multiqc_data")
  log:
    std="Logs/multiqc.std",
    err="Logs/multiqc.err"
  envmodules:
    "nodes/multiqc-1.9"
  threads: 1
  resources:
    mem="1Gb",
    time_min="00:10:00"
  shell:
    "multiqc {input} 1>{log.std} 2>{log.err}"

Run your Snakefile
Let’s try running your Snakefile again:

snakemake -s ex1c_o5.smk --configfile ex1.yml -R fastqc --profile pbs

Your output should look like this:

Using profile pbs for setting default command line arguments.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 6
Job stats:
job        count
-------  -------
all            1
fastqc         6
multiqc        1
total          8

Select jobs to execute...
Execute 6 jobs...

[Wed Feb 21 23:41:09 2024]
rule fastqc:
    input: Data/SRR3099586_chr18.fastq.gz
    output: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
    log: Logs/SRR3099586_chr18_fastqc.std, Logs/SRR3099586_chr18_fastqc.err
    jobid: 1
    reason: Forced execution
    wildcards: sample=SRR3099586_chr18
    resources: mem_mb=100, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=100Mb, time_min=00:05:00

fastqc --outdir FastQC Data/SRR3099586_chr18.fastq.gz 1>Logs/SRR3099586_chr18_fastqc.std 2>Logs/SRR3099586_chr18_fastqc.err
Submitted job 1 with external jobid '748821.pbsserver'.

[Wed Feb 21 23:41:09 2024]
rule fastqc:
    input: Data/SRR3105699_chr18.fastq.gz
    output: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.html
    log: Logs/SRR3105699_chr18_fastqc.std, Logs/SRR3105699_chr18_fastqc.err
    jobid: 6
    reason: Forced execution
    wildcards: sample=SRR3105699_chr18
    resources: mem_mb=100, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=100Mb, time_min=00:05:00

fastqc --outdir FastQC Data/SRR3105699_chr18.fastq.gz 1>Logs/SRR3105699_chr18_fastqc.std 2>Logs/SRR3105699_chr18_fastqc.err
Submitted job 6 with external jobid '748822.pbsserver'.

[Wed Feb 21 23:41:09 2024]
rule fastqc:
    input: Data/SRR3105698_chr18.fastq.gz
    output: FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.html
    log: Logs/SRR3105698_chr18_fastqc.std, Logs/SRR3105698_chr18_fastqc.err
    jobid: 2
    reason: Forced execution
    wildcards: sample=SRR3105698_chr18
    resources: mem_mb=100, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=100Mb, time_min=00:05:00

fastqc --outdir FastQC Data/SRR3105698_chr18.fastq.gz 1>Logs/SRR3105698_chr18_fastqc.std 2>Logs/SRR3105698_chr18_fastqc.err
Submitted job 2 with external jobid '748823.pbsserver'.

[Wed Feb 21 23:41:09 2024]
rule fastqc:
    input: Data/SRR3105697_chr18.fastq.gz
    output: FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.html
    log: Logs/SRR3105697_chr18_fastqc.std, Logs/SRR3105697_chr18_fastqc.err
    jobid: 5
    reason: Forced execution
    wildcards: sample=SRR3105697_chr18
    resources: mem_mb=100, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=100Mb, time_min=00:05:00

fastqc --outdir FastQC Data/SRR3105697_chr18.fastq.gz 1>Logs/SRR3105697_chr18_fastqc.std 2>Logs/SRR3105697_chr18_fastqc.err
Submitted job 5 with external jobid '748824.pbsserver'.

[Wed Feb 21 23:41:09 2024]
rule fastqc:
    input: Data/SRR3099587_chr18.fastq.gz
    output: FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.html
    log: Logs/SRR3099587_chr18_fastqc.std, Logs/SRR3099587_chr18_fastqc.err
    jobid: 3
    reason: Forced execution
    wildcards: sample=SRR3099587_chr18
    resources: mem_mb=100, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=100Mb, time_min=00:05:00

fastqc --outdir FastQC Data/SRR3099587_chr18.fastq.gz 1>Logs/SRR3099587_chr18_fastqc.std 2>Logs/SRR3099587_chr18_fastqc.err
Submitted job 3 with external jobid '748825.pbsserver'.

[Wed Feb 21 23:41:09 2024]
rule fastqc:
    input: Data/SRR3099585_chr18.fastq.gz
    output: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html
    log: Logs/SRR3099585_chr18_fastqc.std, Logs/SRR3099585_chr18_fastqc.err
    jobid: 4
    reason: Forced execution
    wildcards: sample=SRR3099585_chr18
    resources: mem_mb=100, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=100Mb, time_min=00:05:00

fastqc --outdir FastQC Data/SRR3099585_chr18.fastq.gz 1>Logs/SRR3099585_chr18_fastqc.std 2>Logs/SRR3099585_chr18_fastqc.err
Submitted job 4 with external jobid '748826.pbsserver'.
[Wed Feb 21 23:41:38 2024]
Finished job 1.
1 of 8 steps (12%) done
[Wed Feb 21 23:41:38 2024]
Finished job 6.
2 of 8 steps (25%) done
[Wed Feb 21 23:41:38 2024]
Finished job 2.
3 of 8 steps (38%) done
[Wed Feb 21 23:41:38 2024]
Finished job 5.
4 of 8 steps (50%) done
[Wed Feb 21 23:41:38 2024]
Finished job 3.
5 of 8 steps (62%) done
[Wed Feb 21 23:41:39 2024]
Finished job 4.
6 of 8 steps (75%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 23:41:39 2024]
rule multiqc:
    input: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.zip
    output: multiqc_report.html, multiqc_data
    log: Logs/multiqc.std, Logs/multiqc.err
    jobid: 7
    reason: Input files updated by another job: FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.zip
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=1Gb, time_min=00:10:00

multiqc FastQC/SRR3099586_chr18_fastqc.zip FastQC/SRR3105698_chr18_fastqc.zip FastQC/SRR3099587_chr18_fastqc.zip FastQC/SRR3099585_chr18_fastqc.zip FastQC/SRR3105697_chr18_fastqc.zip FastQC/SRR3105699_chr18_fastqc.zip 1>Logs/multiqc.std 2>Logs/multiqc.err
Submitted job 7 with external jobid '748827.pbsserver'.
[Wed Feb 21 23:41:49 2024]
Finished job 7.
7 of 8 steps (88%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 23:41:49 2024]
localrule all:
    input: FastQC/SRR3099586_chr18_fastqc.html, FastQC/SRR3105698_chr18_fastqc.html, FastQC/SRR3099587_chr18_fastqc.html, FastQC/SRR3099585_chr18_fastqc.html, FastQC/SRR3105697_chr18_fastqc.html, FastQC/SRR3105699_chr18_fastqc.html, multiqc_report.html
    jobid: 0
    reason: Input files updated by another job: FastQC/SRR3099585_chr18_fastqc.html, FastQC/SRR3105699_chr18_fastqc.html, FastQC/SRR3105697_chr18_fastqc.html, FastQC/SRR3099587_chr18_fastqc.html, FastQC/SRR3105698_chr18_fastqc.html, FastQC/SRR3099586_chr18_fastqc.html, multiqc_report.html
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=/var/tmp/pbs.748722.pbsserver, threads=1, mem=1Gb

[Wed Feb 21 23:41:49 2024]
Finished job 0.
8 of 8 steps (100%) done
Complete log: .snakemake/log/2024-02-21T234108.721811.snakemake.log

Observe the output
As the log shows, the fastqc and multiqc jobs were not run with the same resources (cf. the resources lines in the output above, e.g. for fastqc: resources: mem_mb=100, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=100Mb, time_min=00:05:00).
To find out how many resources your jobs actually used, you can use the cluster’s qstat command: qstat -fxw -G <jobID>. The job IDs assigned by the cluster are listed in the log printed on your screen (cf. the "Submitted job" lines above, e.g.: Submitted job 1 with external jobid '748821.pbsserver').
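When a run submits many jobs, copying each cluster job ID by hand gets tedious. As a minimal sketch (not part of the exercise), the Python snippet below extracts the IDs from the Snakemake log text and prints the matching qstat commands. The names external_jobids and example_log are hypothetical, and the regular expression assumes the "Submitted job ... with external jobid ..." wording shown in the log above.

```python
import re

# Matches the submission lines printed by Snakemake, e.g.
# Submitted job 1 with external jobid '748821.pbsserver'.
SUBMIT_RE = re.compile(r"Submitted job (\d+) with external jobid '([^']+)'")


def external_jobids(log_text):
    """Map each Snakemake job id to the id assigned by the cluster."""
    return {int(m.group(1)): m.group(2) for m in SUBMIT_RE.finditer(log_text)}


# Two lines copied from the log above; in practice you could read the file
# reported after "Complete log:" instead (hypothetical usage).
example_log = """\
Submitted job 1 with external jobid '748821.pbsserver'.
Submitted job 6 with external jobid '748822.pbsserver'.
"""

for snake_id, pbs_id in external_jobids(example_log).items():
    # Print the command to run on the cluster for each submitted job.
    print(f"qstat -fxw -G {pbs_id}  # Snakemake job {snake_id}")
```

The snippet only builds the command lines; it does not call qstat itself, so it can be tried on any machine.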
Let’s compare the qstat reports for a fastqc job run with ex1c_o3.smk (default resources) and with ex1c_o5.smk (customised resources):

# ex1c_o3.smk
  Job Id: 1533104.pbsserver
    session_id = 51161
    Resource_List.mem = 1gb
    Resource_List.ncpus = 1
    Resource_List.nodect = 1
    Resource_List.place = pack
    Resource_List.preempt_targets = QUEUE=lowprio
    Resource_List.select = 1:mem=1gb:ncpus=1
    Resource_List.walltime = 02:00:00
    Job_Name = snakejob.fastqc.2.sh
    Job_Owner = c.toffano-nioche@node06.example.org
    resources_used.cpupercent = 0
    resources_used.cput = 00:00:10
    resources_used.mem = 102400kb
    resources_used.ncpus = 1
    resources_used.vmem = 2823656kb
    resources_used.walltime = 00:00:17
    job_state = F
    queue = common

# ex1c_o5.smk
   Job Id: 1533800.pbsserver
    session_id = 51712
    Resource_List.mem = 100mb
    Resource_List.ncpus = 1
    Resource_List.nodect = 1
    Resource_List.place = pack
    Resource_List.preempt_targets = QUEUE=lowprio
    Resource_List.select = 1:mem=100mb:ncpus=1
    Resource_List.walltime = 00:05:00
    Job_Name = snakejob.fastqc.2.sh
    Job_Owner = c.toffano-nioche@node06.example.org
    resources_used.cpupercent = 0
    resources_used.cput = 00:00:10
    resources_used.mem = 102400kb
    resources_used.ncpus = 1
    resources_used.vmem = 2823656kb
    resources_used.walltime = 00:00:17
    job_state = F
    queue = common

Lines starting with Resource_List summarise the resources reserved for your job. In particular, lines 4, 5 and 10 (Resource_List.mem, Resource_List.ncpus and Resource_List.walltime) show the memory (RAM), the number of processors and the walltime that were reserved.
Lines starting with resources_used summarise the resources that your job actually used. In particular, lines 13, 15 and 18 (resources_used.cpupercent, resources_used.mem and resources_used.walltime) show the percentage of processors used (between 0 and 100% x ncpus), the memory (RAM) and the walltime that were actually used.

For example, we can see that reserving 100mb is a much better fit than reserving 1gb, since the job actually used about 100mb of memory (resources_used.mem = 102400kb).
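The comparison above can be made explicit with a little arithmetic. The following Python sketch parses the "key = value" lines of a qstat report and computes the fraction of the reserved memory that was actually used. The helper names (to_kb, parse_qstat, memory_usage_ratio) are hypothetical, and the unit table assumes that PBS sizes are binary multiples (1mb = 1024kb); check this against your cluster's documentation.

```python
import re

# Assumption: PBS size suffixes are binary multiples (1mb = 1024kb).
UNITS_KB = {"kb": 1, "mb": 1024, "gb": 1024 ** 2}


def to_kb(value):
    """Convert a size string such as '100mb' or '102400kb' to kilobytes."""
    match = re.fullmatch(r"(\d+)(kb|mb|gb)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * UNITS_KB[match.group(2)]


def parse_qstat(text):
    """Collect the 'key = value' lines of a qstat report into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # skip lines without ' = ', such as 'Job Id: ...'
            fields[key.strip()] = value.strip()
    return fields


def memory_usage_ratio(text):
    """Return used memory divided by reserved memory for one job report."""
    fields = parse_qstat(text)
    return to_kb(fields["resources_used.mem"]) / to_kb(fields["Resource_List.mem"])


# Memory lines copied from the two reports above.
default_run = """
    Resource_List.mem = 1gb
    resources_used.mem = 102400kb
"""
custom_run = """
    Resource_List.mem = 100mb
    resources_used.mem = 102400kb
"""

print(f"ex1c_o3.smk: {memory_usage_ratio(default_run):.0%} of reserved memory used")
print(f"ex1c_o5.smk: {memory_usage_ratio(custom_run):.0%} of reserved memory used")
```

Under that assumption, the default reservation of 1gb is roughly ten times what the job needed, whereas the customised reservation of 100mb matches the measured usage with no headroom. In practice you may want to reserve slightly more than the observed peak.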
