Exercise 0 3 – BIOI2 – Integrative BIOInformatics platforme

Getting started with Snakemake


Exercise 0 - run your first snakefile

Objective 3

Understanding Snakemake’s output.

Deciphering the output log

As a quick reminder, this is what Snakemake printed on our screen when we ran it:

				
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job            count
-----------  -------
fusionFasta        1
loadData           2
mafft              1
targets            1
total              5

Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 16:29:40 2024]
localrule loadData:
    output: fasta/P01308.fasta
    log: logs/P01308_wget.stdout, logs/P01308_wget.stderr
    jobid: 2
    reason: Missing output files: fasta/P01308.fasta
    wildcards: sample=P01308
    resources: tmpdir=/var/tmp/pbs.747800.pbsserver

[Wed Feb 21 16:29:42 2024]
Finished job 2.
1 of 5 steps (20%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 16:29:42 2024]
localrule loadData:
    output: fasta/P10415.fasta
    log: logs/P10415_wget.stdout, logs/P10415_wget.stderr
    jobid: 1
    reason: Missing output files: fasta/P10415.fasta
    wildcards: sample=P10415
    resources: tmpdir=/var/tmp/pbs.747800.pbsserver

[Wed Feb 21 16:29:42 2024]
Finished job 1.
2 of 5 steps (40%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 16:29:42 2024]
localrule fusionFasta:
    input: fasta/P10415.fasta, fasta/P01308.fasta
    output: fusionFasta/allSequences.fasta
    log: logs/fusionData.stderr
    jobid: 3
    reason: Missing output files: fusionFasta/allSequences.fasta; Input files updated by another job: fasta/P01308.fasta, fasta/P10415.fasta
    resources: tmpdir=/var/tmp/pbs.747800.pbsserver

[Wed Feb 21 16:29:42 2024]
Finished job 3.
3 of 5 steps (60%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 16:29:42 2024]
localrule mafft:
    input: fusionFasta/allSequences.fasta
    output: mafft/mafft_res.fasta
    log: logs/whichMafft.txt
    jobid: 4
    reason: Missing output files: mafft/mafft_res.fasta; Input files updated by another job: fusionFasta/allSequences.fasta
    resources: tmpdir=/var/tmp/pbs.747800.pbsserver

[Wed Feb 21 16:29:43 2024]
Finished job 4.
4 of 5 steps (80%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 16:29:43 2024]
localrule targets:
    input: fasta/P10415.fasta, fasta/P01308.fasta, fusionFasta/allSequences.fasta, mafft/mafft_res.fasta
    jobid: 0
    reason: Input files updated by another job: fasta/P01308.fasta, fasta/P10415.fasta, fusionFasta/allSequences.fasta, mafft/mafft_res.fasta
    resources: tmpdir=/var/tmp/pbs.747800.pbsserver

[Wed Feb 21 16:29:43 2024]
Finished job 0.
5 of 5 steps (100%) done
Complete log: .snakemake/log/2024-02-21T162939.800585.snakemake.log

Let’s look at each part of this output in turn…

1. The general picture with job stats:

Job stats:
job            count
-----------  -------
fusionFasta        1
loadData           2
mafft              1
targets            1
total              5

This table lists every rule that will be executed and how many times it will run. Each execution of a rule is called a job. For example, we have 2 files to download with loadData, so that rule accounts for 2 jobs. Both downloaded files are then merged by fusionFasta into a single output, so that rule runs only once. The same reasoning applies to mafft and targets, and the total line gives the sum: 5 jobs.
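If you ever want to read this table from a saved log rather than by eye, a few lines of Python are enough. This is only a minimal sketch: the function name parse_job_stats is made up for this example, and it assumes the table has exactly the layout shown above (a "Job stats:" line, a header, a separator, then one "name count" line per rule, ending at the first blank line).

```python
# Text copied from the log above.
EXAMPLE_LOG = """\
Job stats:
job            count
-----------  -------
fusionFasta        1
loadData           2
mafft              1
targets            1
total              5

Select jobs to execute...
"""


def parse_job_stats(log_text: str) -> dict[str, int]:
    """Return {rule name: number of jobs} from the 'Job stats:' table.

    Hypothetical helper: assumes the table layout printed in this exercise.
    """
    counts: dict[str, int] = {}
    in_table = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "Job stats:":
            in_table = True
            continue
        if not in_table:
            continue
        if not stripped:
            break  # a blank line closes the table
        parts = stripped.split()
        # Keep only "name count" rows; this skips the header and the dashes.
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


stats = parse_job_stats(EXAMPLE_LOG)
# The "total" row should equal the sum of the per-rule counts.
per_rule_sum = sum(n for rule, n in stats.items() if rule != "total")
print(stats["loadData"], stats["total"], per_rule_sum)  # prints: 2 5 5
```

Checking that the per-rule counts add up to the total row is a cheap way to confirm the table was read correctly.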

2. Execution information for each job:

Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 16:29:40 2024]
localrule loadData:
    output: fasta/P01308.fasta
    log: logs/P01308_wget.stdout, logs/P01308_wget.stderr
    jobid: 2
    reason: Missing output files: fasta/P01308.fasta
    wildcards: sample=P01308
    resources: tmpdir=/var/tmp/pbs.747800.pbsserver

[Wed Feb 21 16:29:42 2024]
Finished job 2.
1 of 5 steps (20%) done

You’ll have a block of information like this one printed for each job Snakemake executes. In this case, it executes one job at a time (Execute 1 jobs...). The block above is for the job with id 2 (jobid: 2). The timestamps in square brackets tell you when the job started and when it finished. The reason line tells you rather explicitly why the job was run: Missing output files: fasta/P01308.fasta. The wildcards line shows which value was substituted for this particular job (sample=P01308).
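The fields of such a block are simple "key: value" lines, so they are easy to pick apart in a script. Here is a small sketch, not an official Snakemake feature: parse_job_block is a hypothetical helper, and it assumes the block looks like the one above (a "rule" or "localrule" line followed by fields indented by four spaces).

```python
import re

# Job block copied from the log above.
JOB_BLOCK = """\
[Wed Feb 21 16:29:40 2024]
localrule loadData:
    output: fasta/P01308.fasta
    log: logs/P01308_wget.stdout, logs/P01308_wget.stderr
    jobid: 2
    reason: Missing output files: fasta/P01308.fasta
    wildcards: sample=P01308
    resources: tmpdir=/var/tmp/pbs.747800.pbsserver
"""

RULE_LINE = re.compile(r"^(?:local)?rule (\w+):$")


def parse_job_block(block: str) -> dict[str, str]:
    """Return the rule name and the 'key: value' fields of one job block.

    Hypothetical helper: assumes the layout printed in this exercise.
    """
    info: dict[str, str] = {}
    for line in block.splitlines():
        stripped = line.strip()
        rule_match = RULE_LINE.match(stripped)
        if rule_match:
            info["rule"] = rule_match.group(1)
        elif line.startswith("    ") and ": " in stripped:
            # Split on the first ": " only, because the value may contain
            # colons itself (see the "reason" line).
            key, value = stripped.split(": ", 1)
            info[key] = value
    return info


job = parse_job_block(JOB_BLOCK)
print(job["rule"], job["jobid"], job["reason"])
```

All values are kept as plain strings; converting jobid to an integer or splitting the log field on commas is left to whoever needs it.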

3. The end:

5 of 5 steps (100%) done
Complete log: .snakemake/log/2024-02-21T162939.800585.snakemake.log

Throughout the log, Snakemake reports how far the pipeline has progressed (for example, 1 of 5 steps (20%) done after the first job). At the end, the final progress line tells you whether all the steps were completed, and the Complete log line tells you where to find the file that contains a copy of everything that was printed on your screen.
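As an illustration, the progress lines and the location of the log file can be extracted with a regular expression. This is a sketch under assumptions: summarise_run is a made-up name, the patterns are taken from the output of this exercise only, and "finished" here simply means that the last progress line counted as many steps done as there are in total.

```python
import re

# Last lines of the log above.
LOG_TAIL = """\
[Wed Feb 21 16:29:43 2024]
Finished job 4.
4 of 5 steps (80%) done
[Wed Feb 21 16:29:43 2024]
Finished job 0.
5 of 5 steps (100%) done
Complete log: .snakemake/log/2024-02-21T162939.800585.snakemake.log
"""

PROGRESS = re.compile(r"^(\d+) of (\d+) steps \((\d+)%\) done$")
LOG_PREFIX = "Complete log: "


def summarise_run(log_text: str) -> dict:
    """Return the last progress count and the path of the complete log.

    Hypothetical helper: assumes the line formats shown in this exercise.
    """
    done, total, log_path = 0, 0, None
    for line in log_text.splitlines():
        stripped = line.strip()
        match = PROGRESS.match(stripped)
        if match:
            # Later progress lines overwrite earlier ones.
            done, total = int(match.group(1)), int(match.group(2))
        elif stripped.startswith(LOG_PREFIX):
            log_path = stripped[len(LOG_PREFIX):]
    return {
        "done": done,
        "total": total,
        "finished": total > 0 and done == total,
        "log_path": log_path,
    }


summary = summarise_run(LOG_TAIL)
print(summary)
```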

About the .snakemake folder

.snakemake? What are you talking about?

.snakemake is a hidden folder in your working directory:

john.doe@node06:/data/work/I2BC/john.doe/snakemake_tutorial/snakemake_examples/exercise0$ ls -a
.  ..  .snakemake  Snakefile  fasta  fusionFasta  logs  mafft  readme_runSnake.txt

This folder is where Snakemake saves information about each of your runs, including the log files mentioned above (in .snakemake/log). You don’t need to understand how Snakemake organises this information. Just know that it is kept there, which is useful when you have to find out why a job ended in an error, for example, or when you want to generate a summary report on your pipeline’s execution.
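To retrieve the log of your most recent run without typing its full name, you could use something like the sketch below. It relies on two assumptions drawn from the path printed above: logs live in .snakemake/log inside the working directory, and their names start with a timestamp, so sorting the names alphabetically also sorts them by date. The function name latest_log is invented for this example.

```python
from pathlib import Path
from typing import Optional


def latest_log(workdir: str = ".") -> Optional[Path]:
    """Return the most recent Snakemake log in workdir, or None if there is none.

    Hypothetical helper: assumes logs are stored as
    .snakemake/log/<timestamp>.snakemake.log, as in this exercise.
    """
    log_dir = Path(workdir) / ".snakemake" / "log"
    if not log_dir.is_dir():
        return None
    # Timestamped names sort chronologically, so the last one is the newest.
    logs = sorted(log_dir.glob("*.snakemake.log"))
    return logs[-1] if logs else None


newest = latest_log(".")
if newest is None:
    print("No Snakemake log found in this directory.")
else:
    print(f"Most recent log: {newest}")
```

Run it from the directory that contains your Snakefile, since that is where the .snakemake folder is created.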