Getting started with Snakemake
Objective 3
Understanding Snakemake’s output.
Deciphering the output log
As a quick reminder, this is what Snakemake printed on our screen when we ran it:
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job count
----------- -------
fusionFasta 1
loadData 2
mafft 1
targets 1
total 5
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 16:29:40 2024]
localrule loadData:
output: fasta/P01308.fasta
log: logs/P01308_wget.stdout, logs/P01308_wget.stderr
jobid: 2
reason: Missing output files: fasta/P01308.fasta
wildcards: sample=P01308
resources: tmpdir=/var/tmp/pbs.747800.pbsserver
[Wed Feb 21 16:29:42 2024]
Finished job 2.
1 of 5 steps (20%) done
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 16:29:42 2024]
localrule loadData:
output: fasta/P10415.fasta
log: logs/P10415_wget.stdout, logs/P10415_wget.stderr
jobid: 1
reason: Missing output files: fasta/P10415.fasta
wildcards: sample=P10415
resources: tmpdir=/var/tmp/pbs.747800.pbsserver
[Wed Feb 21 16:29:42 2024]
Finished job 1.
2 of 5 steps (40%) done
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 16:29:42 2024]
localrule fusionFasta:
input: fasta/P10415.fasta, fasta/P01308.fasta
output: fusionFasta/allSequences.fasta
log: logs/fusionData.stderr
jobid: 3
reason: Missing output files: fusionFasta/allSequences.fasta; Input files updated by another job: fasta/P01308.fasta, fasta/P10415.fasta
resources: tmpdir=/var/tmp/pbs.747800.pbsserver
[Wed Feb 21 16:29:42 2024]
Finished job 3.
3 of 5 steps (60%) done
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 16:29:42 2024]
localrule mafft:
input: fusionFasta/allSequences.fasta
output: mafft/mafft_res.fasta
log: logs/whichMafft.txt
jobid: 4
reason: Missing output files: mafft/mafft_res.fasta; Input files updated by another job: fusionFasta/allSequences.fasta
resources: tmpdir=/var/tmp/pbs.747800.pbsserver
[Wed Feb 21 16:29:43 2024]
Finished job 4.
4 of 5 steps (80%) done
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 16:29:43 2024]
localrule targets:
input: fasta/P10415.fasta, fasta/P01308.fasta, fusionFasta/allSequences.fasta, mafft/mafft_res.fasta
jobid: 0
reason: Input files updated by another job: fasta/P01308.fasta, fasta/P10415.fasta, fusionFasta/allSequences.fasta, mafft/mafft_res.fasta
resources: tmpdir=/var/tmp/pbs.747800.pbsserver
[Wed Feb 21 16:29:43 2024]
Finished job 0.
5 of 5 steps (100%) done
Complete log: .snakemake/log/2024-02-21T162939.800585.snakemake.log
Let’s zoom in on the different pieces of information it gives us…
1. The general picture with job stats:
Job stats:
job count
----------- -------
fusionFasta 1
loadData 2
mafft 1
targets 1
total 5
This table lists every rule that will be executed and how many times it will run. Each execution of a rule is called a job. For example, we have 2 files to download with loadData, so there are 2 jobs for that rule. Both downloaded files are then fused by fusionFasta, which therefore runs only once, and so on.
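If you ever want to check these counts programmatically, the table is simple enough to parse. The following is only a minimal sketch: the function name parse_job_stats is hypothetical (it is not part of Snakemake), and it assumes the table has the same two-column layout as in the log shown here.

```python
def parse_job_stats(log_text):
    """Return {rule_name: job_count} read from the 'Job stats:' table.

    Hypothetical helper, not a Snakemake function. Assumes each data row
    is '<rule> <count>' and that the table ends with a 'total' row.
    """
    counts = {}
    in_table = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "Job stats:":
            in_table = True
            continue
        if not in_table:
            continue
        parts = stripped.split()
        # Skip the header row ("job count") and the dashed separator.
        if len(parts) != 2 or not parts[1].isdigit():
            if counts:
                break  # a non-data line after the rows ends the table
            continue
        counts[parts[0]] = int(parts[1])
        if parts[0] == "total":
            break
    return counts


example = """Job stats:
job count
----------- -------
fusionFasta 1
loadData 2
mafft 1
targets 1
total 5
Select jobs to execute...
"""

stats = parse_job_stats(example)
# The per-rule counts should add up to the 'total' row.
per_rule = {rule: n for rule, n in stats.items() if rule != "total"}
print(per_rule, "sum =", sum(per_rule.values()), "total =", stats["total"])
```

Here the two loadData jobs plus one job each for fusionFasta, mafft and targets add up to the 5 jobs announced in the total row.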
2. Execution information for each job:
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 16:29:40 2024]
localrule loadData:
output: fasta/P01308.fasta
log: logs/P01308_wget.stdout, logs/P01308_wget.stderr
jobid: 2
reason: Missing output files: fasta/P01308.fasta
wildcards: sample=P01308
resources: tmpdir=/var/tmp/pbs.747800.pbsserver
[Wed Feb 21 16:29:42 2024]
Finished job 2.
1 of 5 steps (20%) done
You’ll have a block of information like this one printed for each job Snakemake executes. In this case, it executes one job at a time (Execute 1 jobs...). The block above is for the job with id 2 (jobid: 2). Snakemake tells you when the job started and when it finished running. It also tells you rather explicitly why the job was run: Missing output files: fasta/P01308.fasta.
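To make the structure of these per-job blocks concrete, here is a rough sketch that extracts the rule name, the reason and the start and finish times of each job from the log text. The helper name parse_jobs and the regular expressions are assumptions based only on the log layout shown in this tutorial; other Snakemake versions may format their output differently.

```python
import re
from datetime import datetime

# Patterns inferred from the log above (assumptions, not a Snakemake API).
TIMESTAMP = re.compile(r"^\[(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4})\]$")
RULE = re.compile(r"^(?:localrule|rule) (\w+):$")
FINISHED = re.compile(r"^Finished job (\d+)\.$")
FIELD = re.compile(r"^(\w+): (.*)$")


def parse_jobs(log_text):
    """Return {jobid: info} for every job block found in the log text."""
    jobs = {}
    current = None    # the job block currently being read
    last_time = None  # the most recent [timestamp] line seen
    for raw in log_text.splitlines():
        line = raw.strip()
        m = TIMESTAMP.match(line)
        if m:
            last_time = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
            continue
        m = RULE.match(line)
        if m:
            # A new block starts; the timestamp just before it is the start time.
            current = {"rule": m.group(1), "started": last_time}
            continue
        m = FINISHED.match(line)
        if m:
            jobid = int(m.group(1))
            if jobid in jobs:
                jobs[jobid]["finished"] = last_time
            current = None
            continue
        m = FIELD.match(line)
        if m and current is not None:
            key, value = m.groups()
            current[key] = value  # e.g. output, log, reason, wildcards
            if key == "jobid":
                jobs[int(value)] = current
    return jobs


example = """Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 16:29:40 2024]
localrule loadData:
output: fasta/P01308.fasta
jobid: 2
reason: Missing output files: fasta/P01308.fasta
wildcards: sample=P01308
[Wed Feb 21 16:29:42 2024]
Finished job 2.
1 of 5 steps (20%) done
"""

for jobid, info in parse_jobs(example).items():
    duration = info["finished"] - info["started"]
    print(jobid, info["rule"], info["reason"], duration)
```

For job 2 in the excerpt, the start and finish timestamps are 16:29:40 and 16:29:42, so the computed duration is 2 seconds.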
3. The end:
5 of 5 steps (100%) done
Complete log: .snakemake/log/2024-02-21T162939.800585.snakemake.log
Throughout the log, Snakemake gives you an idea of how far the pipeline has progressed (for example, 1 of 5 steps (20%) done). At the end, it tells you whether everything ran correctly and where to find the file containing the log that was printed on your screen.
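If you lose track of that path, the most recent log file can be located again. The sketch below assumes that log files live in .snakemake/log under the working directory and that their names start with a timestamp, as in the path printed above, so that sorting the names alphabetically also sorts them chronologically. The function name latest_snakemake_log is hypothetical.

```python
from pathlib import Path


def latest_snakemake_log(workdir="."):
    """Return the path of the newest log in <workdir>/.snakemake/log, or None.

    Hypothetical helper. Relies on file names of the form
    2024-02-21T162939.800585.snakemake.log, whose leading timestamp
    makes alphabetical order match chronological order.
    """
    log_dir = Path(workdir) / ".snakemake" / "log"
    if not log_dir.is_dir():
        return None
    logs = sorted(log_dir.glob("*.snakemake.log"))
    return logs[-1] if logs else None


if __name__ == "__main__":
    newest = latest_snakemake_log()
    if newest is None:
        print("No Snakemake log found in this directory.")
    else:
        print("Latest log:", newest)
```

Run it from the directory that contains your Snakefile, since that is where the .snakemake folder is created in this tutorial.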
About the .snakemake folder
.snakemake? What are you talking about?
.snakemake is a hidden folder in your working directory:
john.doe@node06:/data/work/I2BC/john.doe/snakemake_tutorial/snakemake_examples/exercise0$ ls -a
. .. .snakemake Snakefile fasta fusionFasta logs mafft readme_runSnake.txt
This folder is where Snakemake saves the information on its various runs. You don’t really need to understand how Snakemake organises this information. Just know that it keeps a record of every run, which is useful when you have to find out why a job ended in an error, for example, or when you want to generate a summary report on your pipeline’s execution.