Getting started with Snakemake
Objective 4
Motivation
Have you noticed how Snakemake sometimes decides to re-run everything (although output files already exist) and sometimes not?
Snakemake always justifies its choices in the output log. Sometimes it is because output files are missing, other times it might be because the code has changed, and so on. For example:
[Tue Feb 20 15:35:31 2024]
localrule fastqc:
input: Data/SRR3105698_chr18.fastq.gz
output: FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.html
log: Logs/SRR3105698_chr18_fastqc.std, Logs/SRR3105698_chr18_fastqc.err
jobid: 2
reason: Code has changed since last execution
wildcards: sample=SRR3105698_chr18
resources: tmpdir=/var/tmp/pbs.743371.pbsserver
fastqc --outdir FastQC Data/SRR3105698_chr18.fastq.gz 1>Logs/SRR3105698_chr18_fastqc.std 2>Logs/SRR3105698_chr18_fastqc.err
Or:
[Tue Feb 20 15:06:06 2024]
localrule fastqc:
input: Data/SRR3105698_chr18.fastq.gz
output: FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.html
jobid: 2
reason: Missing output files: FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.html
wildcards: sample=SRR3105698_chr18
resources: tmpdir=/var/tmp/pbs.743371.pbsserver
fastqc --outdir FastQC Data/SRR3105698_chr18.fastq.gz
How to control re-running criteria?
The criteria are controlled with the --rerun-triggers option. By default, all triggers are used (code, input, mtime, params, software-env), which guarantees that results are consistent with the workflow code and configuration. To revert to Snakemake’s behaviour before v7.8.0, you can use --rerun-triggers mtime. This option tells Snakemake to use only file modification times when deciding whether a job should be executed. For example:
snakemake -s ex1b_o3.smk -c 1 -p --rerun-triggers mtime
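To make the mtime trigger more concrete, here is a minimal shell sketch of the idea behind it: a job is considered out of date when its output is missing or older than its input. This is only an illustration under simplifying assumptions (one input, one output), not Snakemake's actual implementation; the needs_rerun function and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: mimics the idea behind the mtime trigger,
# it is NOT how Snakemake is implemented.

# needs_rerun INPUT OUTPUT
# Prints a reason when OUTPUT is missing or older than INPUT.
needs_rerun() {
    local input="$1" output="$2"
    if [[ ! -e "$output" ]]; then
        echo "rerun: missing output file"
    elif [[ "$input" -nt "$output" ]]; then
        echo "rerun: input is newer than output"
    else
        echo "up to date"
    fi
}

# Demo in a throwaway directory (hypothetical file names).
demo_dir="$(mktemp -d)"
in_file="$demo_dir/sample.fastq.gz"
out_file="$demo_dir/sample_fastqc.zip"

touch -t 202402201500.00 "$in_file"      # input created at 15:00
needs_rerun "$in_file" "$out_file"       # rerun: missing output file

touch -t 202402201530.00 "$out_file"     # output produced at 15:30
needs_rerun "$in_file" "$out_file"       # up to date

touch -t 202402201600.00 "$in_file"      # input modified at 16:00
needs_rerun "$in_file" "$out_file"       # rerun: input is newer than output

rm -rf "$demo_dir"
```

With only the mtime trigger active, editing the rule's code or parameters would not change any of these timestamps, which is why such changes go unnoticed in that mode.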
Force re-run
If you re-run your Snakemake command line now, without changing anything in the code (with or without the --rerun-triggers mtime option), you should see a message from Snakemake telling you that nothing needs to be done:
Building DAG of jobs...
Nothing to be done (all requested files are present and up to date).
To force Snakemake to re-run jobs anyway, you can either:
– delete all output folders and results before re-running the Snakemake command:
rm -rf FastQC multiqc*
snakemake -s ex1b_o3.smk -c 1 -p --configfile ex1.yml
– use Snakemake’s --forcerun (-R) or --forceall (-F) options when you run the Snakemake command. --forcerun re-runs a specific rule or file, which you have to specify on the command line. --forceall forces everything to be re-run. For example:
snakemake -s ex1b_o3.smk -c 1 -p -R fastqc --configfile ex1.yml
snakemake -s ex1b_o3.smk -c 1 -p -R FastQC/SRR3099585_chr18_fastqc.zip --configfile ex1.yml
snakemake -s ex1b_o3.smk -c 1 -p -F --configfile ex1.yml