Getting started with Snakemake
Objective 2
Understanding the Snakefile architecture
Rule syntax
Have a quick look at the Snakefile you’ve just executed. You’ll see that its syntax is fairly repetitive. A basic Snakefile is just a set of blocks, each corresponding to a “rule” with a pre-defined set of inputs and outputs. Most commonly, a rule consists of a name, input files, output files, and a shell command that generates the output from the input:
rule ruleName:
    input: "inputFile(s)"
    output: "outputFile(s)"
    shell: "commandLine(s)"
input:, output: and shell: are directives. They’re the most common ones, but others exist (for example params:, threads: or log:).
Ideally, each rule should perform a single task (1 rule = 1 task).
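To make the role of each directive concrete, here is a small Python sketch that models a rule as plain data. It is a toy, not Snakemake’s internals: the `Rule` class, the `render_shell` helper and the file names are all hypothetical. It mimics one real behaviour: Snakemake fills the `{input}` and `{output}` placeholders of a shell command, joining multiple files with spaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Rule:
    """Toy stand-in for a Snakemake rule (hypothetical, for illustration only)."""
    name: str
    input: list[str] = field(default_factory=list)
    output: list[str] = field(default_factory=list)
    shell: str = ""


def render_shell(rule: Rule) -> str:
    """Fill the {input} and {output} placeholders, joining file lists with spaces."""
    return rule.shell.format(
        input=" ".join(rule.input),
        output=" ".join(rule.output),
    )


# One rule = one task: fuse several FASTA files into one (made-up file names).
fusion = Rule(
    name="fusionFasta",
    input=["data/sampleA.fasta", "data/sampleB.fasta"],
    output=["fused.fasta"],
    shell="cat {input} > {output}",
)

print(render_shell(fusion))
# cat data/sampleA.fasta data/sampleB.fasta > fused.fasta
```

In a real Snakefile you would simply write `shell: "cat {input} > {output}"` and let Snakemake do the substitution.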
In our Snakefile, we have the following rules:
- targets: a special rule which takes as input the output of all other rules (we’ll get back to this in Exercise 1)
- loadData: runs wget to download the fasta sequence for all samples listed at the top of the file
- fusionFasta: runs cat to fuse all individual fasta files into a single file
- mafft: runs mafft to align the sequences within the fused file from rule fusionFasta
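Snakemake decides what to run by working backwards from the files requested by the targets rule: for each needed file it looks for the rule whose output produces it, then does the same for that rule’s inputs. The sketch below reproduces this idea in plain Python for the three rules listed above. It is a simplified model under stated assumptions: the `RULES` table and file names are hypothetical, each file is produced by exactly one rule, and real Snakemake features such as wildcards and timestamp checks are ignored.

```python
from __future__ import annotations

# Hypothetical rule table: rule name -> (inputs, outputs).
RULES = {
    "loadData": ([], ["data/sampleA.fasta", "data/sampleB.fasta"]),
    "fusionFasta": (["data/sampleA.fasta", "data/sampleB.fasta"], ["fused.fasta"]),
    "mafft": (["fused.fasta"], ["aligned.fasta"]),
}


def producer_of(path: str) -> str:
    """Return the name of the rule whose outputs include `path`."""
    for name, (_inputs, outputs) in RULES.items():
        if path in outputs:
            return name
    raise KeyError(f"no rule produces {path}")


def execution_order(targets: list[str]) -> list[str]:
    """Depth-first walk from the requested files back to rules with no inputs."""
    order: list[str] = []

    def visit(path: str) -> None:
        name = producer_of(path)
        if name in order:
            return  # rule already scheduled
        inputs, _outputs = RULES[name]
        for upstream in inputs:
            visit(upstream)  # schedule upstream rules first
        order.append(name)

    for path in targets:
        visit(path)
    return order


# Like the targets rule: request the outputs of every other rule.
targets = [path for _inputs, outputs in RULES.values() for path in outputs]
print(execution_order(targets))
# ['loadData', 'fusionFasta', 'mafft']
```

Real Snakemake builds a graph (DAG) of jobs in a similar spirit, and only reruns jobs whose outputs are missing or out of date.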
Don’t hesitate to come back to this file once you’ve gone through Exercise 1 to explore the different syntaxes it contains.
Snakemake is inspired by Python
If you’re familiar with Python, you’ll have recognised plenty of Python-like bits in this Snakefile. Although you don’t need to know Python to understand Snakemake, it’s worth keeping in mind while you use it. In more advanced Snakefiles, you’ll see that quite a bit of your Python skills transfer to Snakemake: you can, for example, import packages or define functions.
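As an example of reusing Python, sample-specific file names are often generated from a list rather than typed out by hand. Snakemake ships a built-in `expand()` helper for this; the plain-Python function below is a hypothetical stand-in showing the same idea, the kind of helper you could define at the top of a Snakefile. The `sample_paths` name, the sample names and the path pattern are made up.

```python
from __future__ import annotations

SAMPLES = ["sampleA", "sampleB", "sampleC"]  # hypothetical sample names


def sample_paths(pattern: str, samples: list[str]) -> list[str]:
    """Fill the {sample} placeholder of `pattern` once per sample."""
    return [pattern.format(sample=s) for s in samples]


print(sample_paths("data/{sample}.fasta", SAMPLES))
# ['data/sampleA.fasta', 'data/sampleB.fasta', 'data/sampleC.fasta']
```

Inside a Snakefile, the equivalent call with the built-in helper would be `expand("data/{sample}.fasta", sample=SAMPLES)`.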
Also, keep in mind that, as in Python, Snakemake is sensitive to indentation.