Exercise 0 2 – BIOI2 – Integrative BIOInformatics platforme

Getting started with Snakemake


Exercise 0 - run your first snakefile

Objective 2

Understanding the Snakefile architecture.

Rule syntax

Have a quick look at the Snakefile you’ve just executed. You’ll see that its syntax is fairly repetitive. A basic Snakefile is just a set of blocks; each block corresponds to a “rule” with a pre-defined set of inputs and outputs. Most commonly, a rule consists of a name, input files, output files, and a shell command that generates the output from the input:

rule ruleName:
    input: "inputFile(s)"
    output: "outputFile(s)"
    shell: "commandLine(s)"

input:, output: and shell: are directives. They’re the most common ones, but others exist (for example params:, threads: and log:).

Ideally, each rule should carry out a single task (1 rule = 1 task).

In our Snakefile, we have the following rules:

targets: a special rule which takes as input the output of all other rules (we’ll get back to this in Exercise 1)
loadData: runs wget to download the fasta sequence for all samples listed at the top of the file
fusionFasta: runs cat to fuse all individual fasta files into a single file
mafft: runs mafft to align the sequences within the fused file from rule fusionFasta
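
To make the chaining between these rules more concrete, here is a minimal Python sketch that reduces each rule to the files it needs and the files it creates, then derives a valid run order by matching outputs to inputs. It is not Snakemake's actual implementation (Snakemake works backwards from the requested targets), and the sample names and file names are hypothetical; the real ones are in the Snakefile.

```python
# Hypothetical sample and file names, for illustration only.
SAMPLES = ["sampleA", "sampleB"]
fasta_files = [f"{s}.fasta" for s in SAMPLES]

# Each rule is reduced to the files it needs and the files it creates.
RULES = {
    "loadData": {"input": [], "output": fasta_files},
    "fusionFasta": {"input": fasta_files, "output": ["fused.fasta"]},
    "mafft": {"input": ["fused.fasta"], "output": ["aligned.fasta"]},
}
# "targets" creates nothing: it only asks for the outputs of every other rule.
RULES["targets"] = {
    "input": [f for r in list(RULES.values()) for f in r["output"]],
    "output": [],
}


def run_order(rules):
    """Return rule names so that each rule comes after the rules producing its inputs."""
    available = set()      # files that exist so far
    pending = dict(rules)  # rules that have not run yet
    order = []
    while pending:
        # A rule is ready once all of its input files are available.
        ready = [name for name, r in pending.items() if set(r["input"]) <= available]
        if not ready:
            raise ValueError("some inputs are never produced by any rule")
        for name in ready:
            order.append(name)
            available.update(pending.pop(name)["output"])
    return order


print(run_order(RULES))
```

The point of the sketch is that rules are never called explicitly by one another: they are linked only through the file names that appear as the output of one rule and the input of another.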

Don’t hesitate to come back to this file once you’ve gone through Exercise 1 to explore the different syntaxes that are in there.

Snakemake is inspired by Python

If you’re familiar with Python, you’ll have recognised plenty of Python-like bits in this Snakefile. Although you don’t have to know Python to understand Snakemake, it helps to keep this connection in mind while you’re using it. In more advanced code, you’ll see that many of your Python skills transfer to Snakemake: you can import packages, for example, or define functions.
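
As a small illustration of that Python flavour, the lines below are plain Python and could sit at the top of a Snakefile, outside any rule. The sample names, the `data` directory and the helper `fasta_path` are hypothetical, chosen only for this sketch.

```python
import os  # standard-library imports work as in any Python script

# Hypothetical sample list; the Snakefile from this exercise defines its own.
SAMPLES = ["sampleA", "sampleB"]


def fasta_path(sample, directory="data"):
    """Build the path of the fasta file for one sample (hypothetical layout)."""
    return os.path.join(directory, f"{sample}.fasta")


# A list comprehension gives one path per sample.
FASTA_FILES = [fasta_path(s) for s in SAMPLES]
print(FASTA_FILES)
```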

Also, keep in mind that, like Python, Snakemake is sensitive to indentation in the code.
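
The effect of indentation can be checked in plain Python with the built-in `compile` function, which parses code without running it. The helper name `parses` and the two snippets are made up for this sketch; Snakemake reports its own error messages for badly indented rules, which are not reproduced here.

```python
def parses(source):
    """Return True if the Python source parses, False on a syntax or indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# Same two statements; only the indentation of the second line differs.
well_indented = "for sample in ['a', 'b']:\n    print(sample)\n"
badly_indented = "for sample in ['a', 'b']:\nprint(sample)\n"

print(parses(well_indented))   # True
print(parses(badly_indented))  # False
```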