Exercise 0 – BIOI2 – Integrative BIOInformatics platform

Getting started with Snakemake


Exercise 0 - run your first snakefile

Objective 1

Run the Snakemake workflow.

Where to start?

Nothing could be easier! Make sure you are in the right directory (the one that contains the Snakefile) and type the following:

cd $WORKDIR/snakemake_examples/exercise0/
snakemake --cores 1

By default, the file that defines a Snakemake workflow is called Snakefile, and Snakemake looks for it automatically in the current directory. That is why snakemake --cores 1 is all you need to type to run the workflow. The --cores option sets how many CPU cores Snakemake may use; here it is limited to one.
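If you want a safety net against launching from the wrong folder, the check can be scripted. The sketch below is only an illustration: run_workflow is a hypothetical wrapper, not something shipped with Snakemake or with this course, and it looks only for a file named exactly Snakefile in the current directory. It names the workflow file explicitly with --snakefile, which should behave the same as the short command above.

```shell
# run_workflow is a hypothetical wrapper, not part of Snakemake.
run_workflow() {
    # Refuse to start if the current directory has no Snakefile.
    if [ ! -f Snakefile ]; then
        echo "No Snakefile in $(pwd): cd to the exercise folder first" >&2
        return 1
    fi
    # Same as 'snakemake --cores 1', with the workflow file named explicitly.
    snakemake --snakefile Snakefile --cores 1
}
```

After defining the function in your shell, typing run_workflow from the exercise0 folder starts the workflow; from any folder without a Snakefile it prints a hint and stops.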

Observe the output

Congrats! You’ve run your first Snakemake workflow! As you can see, Snakemake is very talkative… You should see something similar to the following on your screen:

Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job            count
-----------  -------
fusionFasta        1
loadData           2
mafft              1
targets            1
total              5

Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 16:29:40 2024]
localrule loadData:
    output: fasta/P01308.fasta
    log: logs/P01308_wget.stdout, logs/P01308_wget.stderr
    jobid: 2
    reason: Missing output files: fasta/P01308.fasta
    wildcards: sample=P01308
    resources: tmpdir=/var/tmp/pbs.747800.pbsserver

[Wed Feb 21 16:29:42 2024]
Finished job 2.
1 of 5 steps (20%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 16:29:42 2024]
localrule loadData:
    output: fasta/P10415.fasta
    log: logs/P10415_wget.stdout, logs/P10415_wget.stderr
    jobid: 1
    reason: Missing output files: fasta/P10415.fasta
    wildcards: sample=P10415
    resources: tmpdir=/var/tmp/pbs.747800.pbsserver

[Wed Feb 21 16:29:42 2024]
Finished job 1.
2 of 5 steps (40%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 16:29:42 2024]
localrule fusionFasta:
    input: fasta/P10415.fasta, fasta/P01308.fasta
    output: fusionFasta/allSequences.fasta
    log: logs/fusionData.stderr
    jobid: 3
    reason: Missing output files: fusionFasta/allSequences.fasta; Input files updated by another job: fasta/P01308.fasta, fasta/P10415.fasta
    resources: tmpdir=/var/tmp/pbs.747800.pbsserver

[Wed Feb 21 16:29:42 2024]
Finished job 3.
3 of 5 steps (60%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 16:29:42 2024]
localrule mafft:
    input: fusionFasta/allSequences.fasta
    output: mafft/mafft_res.fasta
    log: logs/whichMafft.txt
    jobid: 4
    reason: Missing output files: mafft/mafft_res.fasta; Input files updated by another job: fusionFasta/allSequences.fasta
    resources: tmpdir=/var/tmp/pbs.747800.pbsserver

[Wed Feb 21 16:29:43 2024]
Finished job 4.
4 of 5 steps (80%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Feb 21 16:29:43 2024]
localrule targets:
    input: fasta/P10415.fasta, fasta/P01308.fasta, fusionFasta/allSequences.fasta, mafft/mafft_res.fasta
    jobid: 0
    reason: Input files updated by another job: fasta/P01308.fasta, fasta/P10415.fasta, fusionFasta/allSequences.fasta, mafft/mafft_res.fasta
    resources: tmpdir=/var/tmp/pbs.747800.pbsserver

[Wed Feb 21 16:29:43 2024]
Finished job 0.
5 of 5 steps (100%) done
Complete log: .snakemake/log/2024-02-21T162939.800585.snakemake.log
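With this much output, it can help to summarise the log rather than read it line by line. The following is a minimal sketch, assuming the log file saved under .snakemake/log/ contains the same "Finished job N." lines as the screen output above. count_finished_jobs is a hypothetical helper name, and the pattern only matches the message format shown here.

```shell
# count_finished_jobs is a hypothetical helper, not a Snakemake command.
count_finished_jobs() {
    # $1: path to a Snakemake log file
    # Count lines of the form "Finished job 3."
    grep -c '^Finished job [0-9][0-9]*\.$' "$1"
}
```

For example, count_finished_jobs .snakemake/log/2024-02-21T162939.800585.snakemake.log prints the number of jobs that completed in that run, which you can compare with the "total" line of the job stats table.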

Let’s also have a quick look at your working directory. You should see plenty of new files and folders in there that were generated by Snakemake:

john.doe@node06:/data/work/I2BC/john.doe/snakemake_tutorial/snakemake_examples/exercise0$ ls -a
.  ..  .snakemake  Snakefile  fasta  fusionFasta  logs  mafft  readme_runSnake.txt

As a reminder, this workflow downloads the fasta sequences of 2 proteins (P10415 and P01308) into a folder called fasta (if you have a look in this folder, you'll see that the fasta files are in there). As a second step, it merges them into a single fusion fasta file in the fusionFasta folder (allSequences.fasta), and then aligns the sequences within this fusion file using mafft (you should see the mafft_res.fasta file in the mafft directory).
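Rather than opening each folder by hand, you can check the expected files in one go. This is a small sketch, not part of the course material: check_outputs is a hypothetical helper, and the four paths are simply the output files listed in the Snakemake log above. It must be run from the exercise0 folder.

```shell
# check_outputs is a hypothetical helper; the paths are taken from the log.
check_outputs() {
    missing=0
    for f in fasta/P01308.fasta fasta/P10415.fasta \
             fusionFasta/allSequences.fasta mafft/mafft_res.fasta; do
        # -s is true only if the file exists and is not empty
        if [ -s "$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            missing=1
        fi
    done
    # Exit status 0 if every file is present, 1 otherwise
    return "$missing"
}
```

Calling check_outputs prints one line per file and returns a non-zero status if any of them is absent or empty, so it can also be used in a script.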

Ok, let’s go through all of this together step by step…