Getting started with Snakemake

Exercise 1C - optimising resource usage


Objective 4:

Create a profile for Snakemake.
Where to start?

To avoid typing cluster-specific and basic options such as -p on the command line every time you run a Snakefile on the I2BC cluster, we can put them all in a profile file and then forget about them. We’ll call this file config.yaml and save it under your home directory, in $HOME/.config/snakemake/pbs/. You might need to create the directory first:

				
					mkdir -p $HOME/.config/snakemake/pbs/
				
			

Inside this file ($HOME/.config/snakemake/pbs/config.yaml), we’ll put all the options we routinely use to run Snakemake as well as those we use specifically in conjunction with OpenPBS:

				
					# cluster-specific options:
jobs: 6
executor: cluster-generic
cluster-generic-submit-cmd: "qsub -l ncpus={threads} -l mem={resources.mem} -l walltime={resources.time_min}"
cluster-generic-cancel-cmd: "qdel"
# software option:
software-deployment-method: env-modules
# to avoid typing -p every time:
printshellcmds: True
# set default resources for each job to 1 cpu, 1Gb and a walltime of 2 hours if not specified otherwise:
default-resources: [threads=1, mem="1Gb", time_min="02:00:00"]
				
			

Since we’re creating a “general” profile, we have to be able to adjust the parameters passed to the qsub command from the Snakefile. We therefore “generalise” the submission command with placeholders ({threads}, {resources.mem} and {resources.time_min}), which Snakemake fills in for each job.
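To make the substitution concrete, here is a minimal Python sketch of how such a template can be filled in for one job. It is only an illustration built on Python's str.format, not Snakemake's actual implementation; the function name expand_submit_cmd is hypothetical, and the template and default values are copied from the profile above.

```python
from types import SimpleNamespace

# Submission template copied from the profile above.
SUBMIT_CMD = "qsub -l ncpus={threads} -l mem={resources.mem} -l walltime={resources.time_min}"


def expand_submit_cmd(template, threads, resources):
    """Fill the placeholders of a submit command for one job (illustration only)."""
    # str.format resolves "{resources.mem}" by attribute access, so the
    # resources dict is wrapped in a namespace object first.
    return template.format(threads=threads, resources=SimpleNamespace(**resources))


# Default resources from the profile: 1 cpu, 1Gb, 2 hours of walltime.
default_cmd = expand_submit_cmd(SUBMIT_CMD, 1, {"mem": "1Gb", "time_min": "02:00:00"})
print(default_cmd)  # qsub -l ncpus=1 -l mem=1Gb -l walltime=02:00:00
```

A job that asks for other values (for example 4 threads and 8Gb) would produce a different qsub command from the same template, which is what makes the profile reusable across rules.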

Run your Snakefile
Let’s try running your Snakefile again (don’t forget the --profile pbs):
				
					snakemake -s ex1c_o3.smk --configfile ex1.yml -R fastqc --profile pbs
				
			
Why --profile pbs? You can specify either a path to the directory containing the profile, or just a profile name (the name of the directory in which config.yaml is saved). Snakemake will automatically look in the default locations where profiles are stored (including the directory in which we just placed ours).
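The lookup described here can be sketched in a few lines of Python. This is a simplified illustration, not Snakemake's own code: the function find_profile is hypothetical, and the two default search locations ($HOME/.config/snakemake and /etc/xdg/snakemake) are an assumption that may differ on your system or Snakemake version.

```python
from pathlib import Path


def find_profile(profile, search_dirs=None):
    """Return the directory holding config.yaml for `profile`, or None if not found."""
    if search_dirs is None:
        # Assumed default locations (hypothetical, check your installation).
        search_dirs = [Path.home() / ".config" / "snakemake", Path("/etc/xdg/snakemake")]
    # First treat the value as a path, then as a name inside each search directory.
    candidates = [Path(profile)] + [Path(d) / profile for d in search_dirs]
    for candidate in candidates:
        if (candidate / "config.yaml").is_file():
            return candidate
    return None


# With the profile created earlier, this should point to $HOME/.config/snakemake/pbs.
print(find_profile("pbs"))
```

If the function prints None for "pbs", the most likely cause is that config.yaml was saved somewhere other than $HOME/.config/snakemake/pbs/.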
Your output should look like this:
				
					Using profile pbs for setting default command line arguments.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 6
Job stats:
job        count
-------  -------
all            1
fastqc         6
multiqc        1
total          8

Select jobs to execute...
Execute 6 jobs...

[Wed Feb 21 23:18:34 2024]
rule fastqc:
    input: Data/SRR3099585_chr18.fastq.gz
    output: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html
    log: Logs/SRR3099585_chr18_fastqc.std, Logs/SRR3099585_chr18_fastqc.err
    jobid: 4
    reason: Forced execution
    wildcards: sample=SRR3099585_chr18
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=1Gb

fastqc --outdir FastQC Data/SRR3099585_chr18.fastq.gz 1>Logs/SRR3099585_chr18_fastqc.std 2>Logs/SRR3099585_chr18_fastqc.err
Submitted job 4 with external jobid '748790.pbsserver'.

[Wed Feb 21 23:18:35 2024]
rule fastqc:
    input: Data/SRR3105697_chr18.fastq.gz
    output: FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.html
    log: Logs/SRR3105697_chr18_fastqc.std, Logs/SRR3105697_chr18_fastqc.err
    jobid: 5
    reason: Forced execution
    wildcards: sample=SRR3105697_chr18
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=1Gb

fastqc --outdir FastQC Data/SRR3105697_chr18.fastq.gz 1>Logs/SRR3105697_chr18_fastqc.std 2>Logs/SRR3105697_chr18_fastqc.err
Submitted job 5 with external jobid '748791.pbsserver'.

[Wed Feb 21 23:18:35 2024]
rule fastqc:
    input: Data/SRR3099586_chr18.fastq.gz
    output: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
    log: Logs/SRR3099586_chr18_fastqc.std, Logs/SRR3099586_chr18_fastqc.err
    jobid: 1
    reason: Forced execution
    wildcards: sample=SRR3099586_chr18
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=1Gb

fastqc --outdir FastQC Data/SRR3099586_chr18.fastq.gz 1>Logs/SRR3099586_chr18_fastqc.std 2>Logs/SRR3099586_chr18_fastqc.err
Submitted job 1 with external jobid '748792.pbsserver'.

[Wed Feb 21 23:18:35 2024]
rule fastqc:
    input: Data/SRR3099587_chr18.fastq.gz
    output: FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.html
    log: Logs/SRR3099587_chr18_fastqc.std, Logs/SRR3099587_chr18_fastqc.err
    jobid: 3
    reason: Forced execution
    wildcards: sample=SRR3099587_chr18
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=1Gb

fastqc --outdir FastQC Data/SRR3099587_chr18.fastq.gz 1>Logs/SRR3099587_chr18_fastqc.std 2>Logs/SRR3099587_chr18_fastqc.err
Submitted job 3 with external jobid '748793.pbsserver'.

[Wed Feb 21 23:18:35 2024]
rule fastqc:
    input: Data/SRR3105698_chr18.fastq.gz
    output: FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.html
    log: Logs/SRR3105698_chr18_fastqc.std, Logs/SRR3105698_chr18_fastqc.err
    jobid: 2
    reason: Forced execution
    wildcards: sample=SRR3105698_chr18
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=1Gb

fastqc --outdir FastQC Data/SRR3105698_chr18.fastq.gz 1>Logs/SRR3105698_chr18_fastqc.std 2>Logs/SRR3105698_chr18_fastqc.err
Submitted job 2 with external jobid '748794.pbsserver'.

[Wed Feb 21 23:18:35 2024]
rule fastqc:
    input: Data/SRR3105699_chr18.fastq.gz
    output: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.html
    log: Logs/SRR3105699_chr18_fastqc.std, Logs/SRR3105699_chr18_fastqc.err
    jobid: 6
    reason: Forced execution
    wildcards: sample=SRR3105699_chr18
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=1Gb

fastqc --outdir FastQC Data/SRR3105699_chr18.fastq.gz 1>Logs/SRR3105699_chr18_fastqc.std 2>Logs/SRR3105699_chr18_fastqc.err
Submitted job 6 with external jobid '748795.pbsserver'.
[Wed Feb 21 23:18:54 2024]
Finished job 4.
1 of 8 steps (12%) done
[Wed Feb 21 23:18:54 2024]
Finished job 5.
2 of 8 steps (25%) done
[Wed Feb 21 23:18:54 2024]
Finished job 1.
3 of 8 steps (38%) done
[Wed Feb 21 23:18:54 2024]
Finished job 3.
4 of 8 steps (50%) done
[Wed Feb 21 23:18:54 2024]
Finished job 2.
5 of 8 steps (62%) done
[Wed Feb 21 23:18:54 2024]
Finished job 6.
6 of 8 steps (75%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 23:18:54 2024]
rule multiqc:
    input: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.zip
    output: multiqc_report.html, multiqc_data
    log: Logs/multiqc.std, Logs/multiqc.err
    jobid: 7
    reason: Input files updated by another job: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.zip
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=1Gb

multiqc FastQC/SRR3099586_chr18_fastqc.zip FastQC/SRR3105698_chr18_fastqc.zip FastQC/SRR3099587_chr18_fastqc.zip FastQC/SRR3099585_chr18_fastqc.zip FastQC/SRR3105697_chr18_fastqc.zip FastQC/SRR3105699_chr18_fastqc.zip 1>Logs/multiqc.std 2>Logs/multiqc.err
Submitted job 7 with external jobid '748796.pbsserver'.
[Wed Feb 21 23:19:04 2024]
Finished job 7.
7 of 8 steps (88%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 23:19:04 2024]
localrule all:
    input: FastQC/SRR3099586_chr18_fastqc.html, FastQC/SRR3105698_chr18_fastqc.html, FastQC/SRR3099587_chr18_fastqc.html, FastQC/SRR3099585_chr18_fastqc.html, FastQC/SRR3105697_chr18_fastqc.html, FastQC/SRR3105699_chr18_fastqc.html, multiqc_report.html
    jobid: 0
    reason: Input files updated by another job: FastQC/SRR3105699_chr18_fastqc.html, FastQC/SRR3105698_chr18_fastqc.html, FastQC/SRR3099585_chr18_fastqc.html, multiqc_report.html, FastQC/SRR3105697_chr18_fastqc.html, FastQC/SRR3099587_chr18_fastqc.html, FastQC/SRR3099586_chr18_fastqc.html
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=/var/tmp/pbs.748722.pbsserver, threads=1, mem=1Gb

[Wed Feb 21 23:19:04 2024]
Finished job 0.
8 of 8 steps (100%) done
Complete log: .snakemake/log/2024-02-21T231834.549059.snakemake.log

				
			
Observe the output
As you can see in the log output, all jobs were run with the default resources set in our profile: the resources line of every job ends with “threads=1, mem=1Gb”. In the next and last objective of this exercise, we’ll see how to specify different resources for each rule in the Snakefile.
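If you would rather check this programmatically than by eye, the following Python sketch turns one "resources:" line of the log into a dictionary. The helper parse_resources is hypothetical and assumes the "key=value, key=value" layout shown in the output above; the example line is copied from that output.

```python
def parse_resources(line):
    """Turn a 'resources: k=v, k=v' log line into a dict of strings."""
    # Keep only the text after the "resources:" label.
    _, _, payload = line.strip().partition("resources:")
    # Split on ", " between entries and on the first "=" inside each entry.
    return dict(item.split("=", 1) for item in payload.strip().split(", ") if item)


# Line copied from the fastqc job output above.
line = "    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, threads=1, mem=1Gb"
res = parse_resources(line)
print(res["threads"], res["mem"])  # 1 1Gb
```

Applied to every resources line in the log file under .snakemake/log/, the same helper would let you confirm that no job deviates from the profile defaults.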