Exercise 1C objective – BIOI2 – Integrative BIOInformatics platforme

Getting started with Snakemake

Exercise 1C - optimising resource usage

For this practical exercise, we will:

learn how to run Snakemake on multiple processors locally (o1)
learn how to communicate with the OpenPBS scheduler on the I2BC cluster (o2)
learn how to control the software environment (o3)
create an HPC profile file for Snakemake (o4)
learn how to set resources specific to each rule (o5)

In the following objectives, we will continue building on the Snakefile from Exercises 1A and 1B, which runs FastQC and then MultiQC on a set of RNA-seq data, and adapt it to an HPC environment, namely the I2BC cluster.

Motivation

Up until now, our workflow has run on a single processor, so its jobs are executed one after the other. This takes time and is frustrating when you know that more resources are available on the cluster: in most cases, using more processors reduces computation time.

There are two ways of scaling up your workflow:

run multiple jobs in parallel: if you have several inputs and a specific rule can process each of them independently of the others (e.g. the fastqc rule in this Exercise), then you can run all of these jobs simultaneously instead of sequentially (=> 1 processor per job)
run steps multithreaded: if the tool used in your rule supports multithreading (e.g. it has an option like --threads), you can run this rule on more than one processor (=> several processors per job)
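
To make the trade-off between these two approaches concrete, here is a minimal Python sketch of how a fixed budget of processors can be split between jobs. It is a simplified model and not Snakemake's actual scheduler: the function name plan_jobs and its parameters are hypothetical, and it assumes that all jobs belong to the same rule, that a job never gets more threads than the total number of available processors, and that no other resource (memory, I/O) is limiting.

```python
def plan_jobs(n_inputs: int, cores: int, threads_per_job: int = 1) -> dict:
    """Simplified model (an assumption, not Snakemake's real scheduler) of how
    a budget of `cores` processors is shared between the jobs of one rule."""
    if n_inputs < 0 or cores < 1 or threads_per_job < 1:
        raise ValueError("n_inputs must be >= 0; cores and threads_per_job must be >= 1")

    # A job cannot use more threads than the total number of processors.
    effective_threads = min(threads_per_job, cores)

    # How many jobs fit side by side within the processor budget.
    concurrent_jobs = min(n_inputs, cores // effective_threads)

    # Number of successive batches needed to process every input
    # (ceiling division; 0 when there is nothing to do).
    batches = -(-n_inputs // concurrent_jobs) if n_inputs else 0

    return {
        "effective_threads": effective_threads,
        "concurrent_jobs": concurrent_jobs,
        "batches": batches,
    }


if __name__ == "__main__":
    # Hypothetical example: 6 input files to process with the same rule.
    for cores, threads in [(1, 1), (4, 1), (4, 2)]:
        print(f"cores={cores} threads={threads} ->", plan_jobs(6, cores, threads))
```

With a single processor, the 6 hypothetical inputs are processed in 6 successive batches. With 4 processors and 1 thread per job, 4 jobs run side by side and 2 batches are enough. With 4 processors and 2 threads per job, only 2 jobs run at a time (3 batches), but each individual job may finish faster if the tool makes good use of its threads.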

How this exercise is organised:

As in the previous exercises, each step addresses one objective. We will therefore go through several cycles of executing snakemake, observing the results and improving the code. Each version of the code will be named ex1c_oX.smk, where X is the objective number.

Warning: keep in mind that we’ll be using commands that are specific to the I2BC cluster’s scheduling system (they might be different on other clusters). If you’re not familiar with these commands, don’t hesitate to have a look at the course material of the I2BC cluster training.