Getting started with Snakemake
Objective 4:
Where to start?
To avoid typing cluster-specific and basic options such as -p on the command line every time you run a Snakefile on the I2BC cluster, we can put them all in a profile file instead and then forget about them. We’ll call this file config.yaml and we’ll put it in your home directory, in $HOME/.config/snakemake/pbs/. You might need to create the directory first:
mkdir -p $HOME/.config/snakemake/pbs/
Inside this file ($HOME/.config/snakemake/pbs/config.yaml), we’ll put all the options we routinely use to run Snakemake as well as those we use specifically in conjunction with OpenPBS:
# cluster-specific options:
jobs: 6
executor: cluster-generic
cluster-generic-submit-cmd: "qsub -l ncpus={threads} -l mem={resources.mem} -l walltime={resources.time_min}"
cluster-generic-cancel-cmd: "qdel"
# software option:
software-deployment-method: env-modules
# to avoid typing -p every time:
printshellcmds: True
# set default resources for each job to 1 cpu, 1Gb and a 2-hour walltime if not specified otherwise:
default-resources: [threads=1, mem="1Gb", time_min="02:00:00"]
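Each key in this profile stands for the long command-line option of the same name, so "jobs: 6" plays the role of "--jobs 6". The following is only an illustrative sketch of that correspondence: profile_to_args is a hypothetical helper (not part of Snakemake), and the dictionary is a simplified copy of a few entries from the profile above.

```python
# Illustrative sketch only: profile_to_args is a hypothetical helper, not a Snakemake function.
# It shows how entries of a profile's config.yaml line up with long command-line options.

def profile_to_args(profile):
    """Turn a profile dictionary into the equivalent list of command-line arguments."""
    args = []
    for key, value in profile.items():
        flag = f"--{key}"
        if value is True:
            # boolean switches such as printshellcmds take no value
            args.append(flag)
        elif value is False or value is None:
            # a disabled switch is simply left out
            continue
        elif isinstance(value, (list, tuple)):
            # list entries such as default-resources become several values after one flag
            args.append(flag)
            args.extend(str(item) for item in value)
        else:
            args.extend([flag, str(value)])
    return args


# simplified copy of a few entries from the profile above
profile = {
    "jobs": 6,
    "executor": "cluster-generic",
    "printshellcmds": True,
    "default-resources": ["threads=1", "mem=1Gb"],
}

print(" ".join(profile_to_args(profile)))
# prints: --jobs 6 --executor cluster-generic --printshellcmds --default-resources threads=1 mem=1Gb
```

In other words, anything you would otherwise type after "snakemake" on every run can live in the profile instead.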
Since we’re creating a “general” profile, we have to be able to adjust the parameters given to the qsub command via the Snakefile. Thus, we can “generalise” the submission command with placeholders ({threads} or {resources.mem}), which are filled in with each job’s own values.
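To make the substitution concrete, here is a minimal sketch that fills in the submission command from the profile using plain Python string formatting. It only imitates what happens at submission time and is not Snakemake's actual code; render_submit_cmd is a hypothetical helper, and the 4-cpu / 8Gb job is a made-up example.

```python
from types import SimpleNamespace

# submission command copied from the profile above
submit_template = (
    "qsub -l ncpus={threads} -l mem={resources.mem} -l walltime={resources.time_min}"
)


def render_submit_cmd(template, threads, **resources):
    """Fill the {threads} and {resources.<name>} placeholders for one job (hypothetical helper)."""
    # SimpleNamespace lets str.format resolve attribute lookups such as {resources.mem}
    return template.format(threads=threads, resources=SimpleNamespace(**resources))


# a job that relies on the defaults set in the profile
print(render_submit_cmd(submit_template, 1, mem="1Gb", time_min="02:00:00"))
# prints: qsub -l ncpus=1 -l mem=1Gb -l walltime=02:00:00

# a made-up job that asks for more cpus and memory
print(render_submit_cmd(submit_template, 4, mem="8Gb", time_min="02:00:00"))
# prints: qsub -l ncpus=4 -l mem=8Gb -l walltime=02:00:00
```

The same command template therefore serves every rule: only the values plugged into it change from job to job.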
Run your Snakefile with this profile (option --profile pbs):
snakemake -s ex1c_o3.smk --configfile ex1.yml -R fastqc --profile pbs
How does --profile pbs work? You can either specify a file path (the directory in which the profile is) or just a profile name (the name is given by the parent directory in which the profile is saved). Snakemake will automatically look in the default places where the profile could be stored (including the directory in which we just placed ours).
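The lookup can be pictured with the sketch below. It is a simplified illustration, not Snakemake's own code: find_profile is a hypothetical helper, and the list and order of default directories ($HOME/.config/snakemake, then /etc/xdg/snakemake) are assumptions made for the example. The demonstration uses a throwaway directory so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def find_profile(profile, search_dirs=None):
    """Return the directory containing config.yaml for a profile name or path, else None.

    Hypothetical helper: the default search directories and their order are assumptions.
    """
    if search_dirs is None:
        search_dirs = [
            Path.home() / ".config" / "snakemake",
            Path("/etc/xdg/snakemake"),
        ]
    # first take the argument as a path (absolute or relative), then as a profile name
    candidates = [Path(profile)] + [Path(d) / profile for d in search_dirs]
    for candidate in candidates:
        if (candidate / "config.yaml").is_file():
            return candidate
    return None


# demonstration in a temporary directory standing in for $HOME/.config/snakemake
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo_profile"
    demo.mkdir()
    (demo / "config.yaml").write_text("jobs: 6\n")

    found_by_name = find_profile("demo_profile", search_dirs=[tmp])  # profile name
    found_by_path = find_profile(str(demo), search_dirs=[])          # explicit path
    not_found = find_profile("no_such_profile_xyz", search_dirs=[tmp])

print(found_by_name == demo, found_by_path == demo, not_found)
```

Both ways of naming the profile end up at the same directory, which is why "--profile pbs" is enough once the file sits in $HOME/.config/snakemake/pbs/.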
Using profile pbs for setting default command line arguments.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 6
Job stats:
job count
------- -------
all 1
fastqc 6
multiqc 1
total 8
Select jobs to execute...
Execute 6 jobs...
[Wed Feb 21 23:18:34 2024]
rule fastqc:
input: Data/SRR3099585_chr18.fastq.gz
output: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html
log: Logs/SRR3099585_chr18_fastqc.std, Logs/SRR3099585_chr18_fastqc.err
jobid: 4
reason: Forced execution
wildcards: sample=SRR3099585_chr18
resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=, threads=1, mem=1Gb
fastqc --outdir FastQC Data/SRR3099585_chr18.fastq.gz 1>Logs/SRR3099585_chr18_fastqc.std 2>Logs/SRR3099585_chr18_fastqc.err
Submitted job 4 with external jobid '748790.pbsserver'.
[Wed Feb 21 23:18:35 2024]
rule fastqc:
input: Data/SRR3105697_chr18.fastq.gz
output: FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.html
log: Logs/SRR3105697_chr18_fastqc.std, Logs/SRR3105697_chr18_fastqc.err
jobid: 5
reason: Forced execution
wildcards: sample=SRR3105697_chr18
resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=, threads=1, mem=1Gb
fastqc --outdir FastQC Data/SRR3105697_chr18.fastq.gz 1>Logs/SRR3105697_chr18_fastqc.std 2>Logs/SRR3105697_chr18_fastqc.err
Submitted job 5 with external jobid '748791.pbsserver'.
[Wed Feb 21 23:18:35 2024]
rule fastqc:
input: Data/SRR3099586_chr18.fastq.gz
output: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
log: Logs/SRR3099586_chr18_fastqc.std, Logs/SRR3099586_chr18_fastqc.err
jobid: 1
reason: Forced execution
wildcards: sample=SRR3099586_chr18
resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=, threads=1, mem=1Gb
fastqc --outdir FastQC Data/SRR3099586_chr18.fastq.gz 1>Logs/SRR3099586_chr18_fastqc.std 2>Logs/SRR3099586_chr18_fastqc.err
Submitted job 1 with external jobid '748792.pbsserver'.
[Wed Feb 21 23:18:35 2024]
rule fastqc:
input: Data/SRR3099587_chr18.fastq.gz
output: FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.html
log: Logs/SRR3099587_chr18_fastqc.std, Logs/SRR3099587_chr18_fastqc.err
jobid: 3
reason: Forced execution
wildcards: sample=SRR3099587_chr18
resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=, threads=1, mem=1Gb
fastqc --outdir FastQC Data/SRR3099587_chr18.fastq.gz 1>Logs/SRR3099587_chr18_fastqc.std 2>Logs/SRR3099587_chr18_fastqc.err
Submitted job 3 with external jobid '748793.pbsserver'.
[Wed Feb 21 23:18:35 2024]
rule fastqc:
input: Data/SRR3105698_chr18.fastq.gz
output: FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.html
log: Logs/SRR3105698_chr18_fastqc.std, Logs/SRR3105698_chr18_fastqc.err
jobid: 2
reason: Forced execution
wildcards: sample=SRR3105698_chr18
resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=, threads=1, mem=1Gb
fastqc --outdir FastQC Data/SRR3105698_chr18.fastq.gz 1>Logs/SRR3105698_chr18_fastqc.std 2>Logs/SRR3105698_chr18_fastqc.err
Submitted job 2 with external jobid '748794.pbsserver'.
[Wed Feb 21 23:18:35 2024]
rule fastqc:
input: Data/SRR3105699_chr18.fastq.gz
output: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.html
log: Logs/SRR3105699_chr18_fastqc.std, Logs/SRR3105699_chr18_fastqc.err
jobid: 6
reason: Forced execution
wildcards: sample=SRR3105699_chr18
resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=, threads=1, mem=1Gb
fastqc --outdir FastQC Data/SRR3105699_chr18.fastq.gz 1>Logs/SRR3105699_chr18_fastqc.std 2>Logs/SRR3105699_chr18_fastqc.err
Submitted job 6 with external jobid '748795.pbsserver'.
[Wed Feb 21 23:18:54 2024]
Finished job 4.
1 of 8 steps (12%) done
[Wed Feb 21 23:18:54 2024]
Finished job 5.
2 of 8 steps (25%) done
[Wed Feb 21 23:18:54 2024]
Finished job 1.
3 of 8 steps (38%) done
[Wed Feb 21 23:18:54 2024]
Finished job 3.
4 of 8 steps (50%) done
[Wed Feb 21 23:18:54 2024]
Finished job 2.
5 of 8 steps (62%) done
[Wed Feb 21 23:18:54 2024]
Finished job 6.
6 of 8 steps (75%) done
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 23:18:54 2024]
rule multiqc:
input: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.zip
output: multiqc_report.html, multiqc_data
log: Logs/multiqc.std, Logs/multiqc.err
jobid: 7
reason: Input files updated by another job: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.zip
resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=, threads=1, mem=1Gb
multiqc FastQC/SRR3099586_chr18_fastqc.zip FastQC/SRR3105698_chr18_fastqc.zip FastQC/SRR3099587_chr18_fastqc.zip FastQC/SRR3099585_chr18_fastqc.zip FastQC/SRR3105697_chr18_fastqc.zip FastQC/SRR3105699_chr18_fastqc.zip 1>Logs/multiqc.std 2>Logs/multiqc.err
Submitted job 7 with external jobid '748796.pbsserver'.
[Wed Feb 21 23:19:04 2024]
Finished job 7.
7 of 8 steps (88%) done
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 23:19:04 2024]
localrule all:
input: FastQC/SRR3099586_chr18_fastqc.html, FastQC/SRR3105698_chr18_fastqc.html, FastQC/SRR3099587_chr18_fastqc.html, FastQC/SRR3099585_chr18_fastqc.html, FastQC/SRR3105697_chr18_fastqc.html, FastQC/SRR3105699_chr18_fastqc.html, multiqc_report.html
jobid: 0
reason: Input files updated by another job: FastQC/SRR3105699_chr18_fastqc.html, FastQC/SRR3105698_chr18_fastqc.html, FastQC/SRR3099585_chr18_fastqc.html, multiqc_report.html, FastQC/SRR3105697_chr18_fastqc.html, FastQC/SRR3099587_chr18_fastqc.html, FastQC/SRR3099586_chr18_fastqc.html
resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=/var/tmp/pbs.748722.pbsserver, threads=1, mem=1Gb
[Wed Feb 21 23:19:04 2024]
Finished job 0.
8 of 8 steps (100%) done
Complete log: .snakemake/log/2024-02-21T231834.549059.snakemake.log
Observe the output: the resources line of every job ends with “threads=1, mem=1Gb”, the default values set in the profile. In the next and last objective of this exercise, we’ll see how to specify different resources for each rule in the Snakefile.