Getting started with Snakemake
Objective 1
Run the Snakemake workflow.
Where to start?
Nothing easier than that! Make sure you are in the right directory (the one that contains Snakefile) and type the following:
cd $WORKDIR/snakemake_examples/exercise0/
snakemake --cores 1
The main file that holds a Snakemake workflow is called Snakefile by default, and Snakemake looks for this file in your current folder automatically. That’s why all you have to type to run the workflow is snakemake --cores 1 (here, --cores 1 tells Snakemake to use a single core).
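To make the "automatic search" a bit more concrete, here is a minimal Python sketch of the idea. It is not Snakemake's actual code: the function name find_snakefile is made up for this illustration, and the list of candidate names is an assumption based on the default locations described in the Snakemake documentation (Snakefile, snakefile, workflow/Snakefile, workflow/snakefile).

```python
from pathlib import Path

# Assumed default candidates, checked in this order (see lead-in above).
DEFAULT_SNAKEFILE_NAMES = (
    "Snakefile",
    "snakefile",
    "workflow/Snakefile",
    "workflow/snakefile",
)


def find_snakefile(directory="."):
    """Return the first default Snakefile found in `directory`, or None."""
    base = Path(directory)
    for name in DEFAULT_SNAKEFILE_NAMES:
        candidate = base / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_snakefile()
    print(found if found is not None else "No Snakefile found in this directory")
```

If your workflow file has a different name or lives elsewhere, you can point Snakemake to it explicitly with the --snakefile (or -s) option instead of relying on the default search.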
Observe the output
Congrats! You’ve run your first Snakemake workflow! As you can see, Snakemake is very talkative… You should see something similar to the following on your screen:
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job            count
-----------  -------
fusionFasta        1
loadData           2
mafft              1
targets            1
total              5
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 16:29:40 2024]
localrule loadData:
output: fasta/P01308.fasta
log: logs/P01308_wget.stdout, logs/P01308_wget.stderr
jobid: 2
reason: Missing output files: fasta/P01308.fasta
wildcards: sample=P01308
resources: tmpdir=/var/tmp/pbs.747800.pbsserver
[Wed Feb 21 16:29:42 2024]
Finished job 2.
1 of 5 steps (20%) done
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 16:29:42 2024]
localrule loadData:
output: fasta/P10415.fasta
log: logs/P10415_wget.stdout, logs/P10415_wget.stderr
jobid: 1
reason: Missing output files: fasta/P10415.fasta
wildcards: sample=P10415
resources: tmpdir=/var/tmp/pbs.747800.pbsserver
[Wed Feb 21 16:29:42 2024]
Finished job 1.
2 of 5 steps (40%) done
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 16:29:42 2024]
localrule fusionFasta:
input: fasta/P10415.fasta, fasta/P01308.fasta
output: fusionFasta/allSequences.fasta
log: logs/fusionData.stderr
jobid: 3
reason: Missing output files: fusionFasta/allSequences.fasta; Input files updated by another job: fasta/P01308.fasta, fasta/P10415.fasta
resources: tmpdir=/var/tmp/pbs.747800.pbsserver
[Wed Feb 21 16:29:42 2024]
Finished job 3.
3 of 5 steps (60%) done
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 16:29:42 2024]
localrule mafft:
input: fusionFasta/allSequences.fasta
output: mafft/mafft_res.fasta
log: logs/whichMafft.txt
jobid: 4
reason: Missing output files: mafft/mafft_res.fasta; Input files updated by another job: fusionFasta/allSequences.fasta
resources: tmpdir=/var/tmp/pbs.747800.pbsserver
[Wed Feb 21 16:29:43 2024]
Finished job 4.
4 of 5 steps (80%) done
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 21 16:29:43 2024]
localrule targets:
input: fasta/P10415.fasta, fasta/P01308.fasta, fusionFasta/allSequences.fasta, mafft/mafft_res.fasta
jobid: 0
reason: Input files updated by another job: fasta/P01308.fasta, fasta/P10415.fasta, fusionFasta/allSequences.fasta, mafft/mafft_res.fasta
resources: tmpdir=/var/tmp/pbs.747800.pbsserver
[Wed Feb 21 16:29:43 2024]
Finished job 0.
5 of 5 steps (100%) done
Complete log: .snakemake/log/2024-02-21T162939.800585.snakemake.log
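The order of the jobs in this log is not arbitrary: a job can only start once the files it needs as input have been produced by earlier jobs. The sketch below is a simplified model of that idea, not Snakemake's real scheduler. The input and output file names are copied from the log above; the JOBS dictionary, the bracketed loadData labels and the execution_order function are hypothetical names chosen for this illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Job -> (input files, output files), copied from the log above.
# The bracketed labels on loadData are only a naming choice for this sketch.
JOBS = {
    "loadData[P01308]": ([], ["fasta/P01308.fasta"]),
    "loadData[P10415]": ([], ["fasta/P10415.fasta"]),
    "fusionFasta": (
        ["fasta/P10415.fasta", "fasta/P01308.fasta"],
        ["fusionFasta/allSequences.fasta"],
    ),
    "mafft": (["fusionFasta/allSequences.fasta"], ["mafft/mafft_res.fasta"]),
    "targets": (
        [
            "fasta/P10415.fasta",
            "fasta/P01308.fasta",
            "fusionFasta/allSequences.fasta",
            "mafft/mafft_res.fasta",
        ],
        [],
    ),
}


def execution_order(jobs):
    """Return the job names in an order that respects file dependencies."""
    # Which job produces each file?
    producer = {f: name for name, (_, outputs) in jobs.items() for f in outputs}
    # A job depends on the producers of its input files.
    graph = {
        name: {producer[f] for f in inputs if f in producer}
        for name, (inputs, _) in jobs.items()
    }
    # static_order() yields each job only after all the jobs it depends on.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(JOBS))
```

In any valid order, both loadData jobs come before fusionFasta, fusionFasta comes before mafft, and targets comes last, which matches what you see in the log.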
Let’s also have a quick look at your working directory. You should see plenty of new files and folders in there that were generated by Snakemake:
john.doe@node06:/data/work/I2BC/john.doe/snakemake_tutorial/snakemake_examples/exercise0$ ls -a
. .. .snakemake Snakefile fasta fusionFasta logs mafft readme_runSnake.txt
As a reminder, this workflow downloads the fasta sequences of 2 proteins (P10415 and P01308) into a folder called fasta (if you have a look in this folder, you’ll see that the fasta files are in there). As a second step, it creates a fusion fasta file in the fusionFasta folder, and then aligns the sequences within this fusion file using mafft (you should see the mafft_res.fasta file in the mafft directory).
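If you would rather check the result with a script than by eye, a small Python sketch like the following could do it. The file paths are the ones reported in the log output; the names EXPECTED_OUTPUTS and missing_outputs are made up for this example, and the script assumes it is run from the exercise0 directory.

```python
from pathlib import Path

# Output files named in the Snakemake log, relative to the exercise directory.
EXPECTED_OUTPUTS = (
    "fasta/P01308.fasta",
    "fasta/P10415.fasta",
    "fusionFasta/allSequences.fasta",
    "mafft/mafft_res.fasta",
)


def missing_outputs(workdir=".", expected=EXPECTED_OUTPUTS):
    """Return the expected output files that do not exist under `workdir`."""
    base = Path(workdir)
    return [rel for rel in expected if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing output files:", ", ".join(missing))
    else:
        print("All expected output files are present.")
```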
Ok, let’s go through all of this together step by step…