Exercise 1B 3 – BIOI2 – Integrative BIOInformatics platforme

Getting started with Snakemake

About this course | Before the session | About Snakemake | Course material | Exercises

Exercise 1B - improving your snakefile

objective > setup > o1 > o2 > o3 > o4 > recap

Objective 3

Create a new snakefile named ex1b_o3.smk in which we redirect the standard output and error streams to log files.

Where to start?

About stdout/stderr: In Unix systems, the output of a command is usually sent to 2 separate streams: the expected output to Standard Out (stdout, or “>” or “1>”), and the error messages to Standard Error (stderr, or “2>”).
Add the log directive: redirect the stdout and stderr streams of the fastqc and multiqc rules to a file by adding a “log:” directive (similar to the already existing input: or output: directives) with two variables, out and err, to separately redirect each stream.
Adapt the shell commands: add stdout and stderr redirections using 1> stdout.txt and 2> stderr.txt in the shell command lines of your rules. Use wildcards to specify the chosen file names (e.g. “1>{log.std} 2>{log.err}“).

Solution

Your code for ex1b_o3.smk should look like this:

SAMPLES, = glob_wildcards(config["dataDir"]+"/{sample}.fastq.gz")

rule all:
  input:
    expand("FastQC/{sample}_fastqc.html", sample=SAMPLES),
    "multiqc_report.html"

rule fastqc:
  input:
    config["dataDir"]+"/{sample}.fastq.gz"
  output:
    "FastQC/{sample}_fastqc.zip",
    "FastQC/{sample}_fastqc.html"
  log:
    "Logs/{sample}_fastqc.std",
    "Logs/{sample}_fastqc.err"
  shell: "fastqc --outdir FastQC {input} 1>{log[0]} 2>{log[1]}"

rule multiqc:
  input:
    expand("FastQC/{sample}_fastqc.zip", sample = SAMPLES)
  output:
    "multiqc_report.html",
    directory("multiqc_data")
  log:
    std="Logs/multiqc.std",
    err="Logs/multiqc.err"
  shell: "multiqc {input} 1>{log.std} 2>{log.err}"

As you can see, we specify the log files differently in the fastqc rule and in the multiqc rule (for demonstration reasons). In the multiqc rule, both log files are named (“std” and “err”) and are used in the shell directive like so: “{log.std}” and “{log.err}“. In the fastqc rule, we don’t specify names and use them in the shell directive with Python’s list syntax instead: “{log[0]}” and “{log[1]}“.

Test the script

Next, let’s check if your pipeline works as expected:

				
					snakemake -s ex1b_o3.smk -c 1 -p --configfile ex1.yml

You should see something similar to the following output on your screen.

				
					Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job        count
-------  -------
all            1
fastqc         6
multiqc        1
total          8

Select jobs to execute...
Execute 1 jobs...

[Tue Feb 20 15:34:52 2024]
localrule fastqc:
    input: Data/SRR3099587_chr18.fastq.gz
    output: FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.html
    log: Logs/SRR3099587_chr18_fastqc.std, Logs/SRR3099587_chr18_fastqc.err
    jobid: 3
    reason: Code has changed since last execution
    wildcards: sample=SRR3099587_chr18
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver

fastqc --outdir FastQC Data/SRR3099587_chr18.fastq.gz 1>Logs/SRR3099587_chr18_fastqc.std 2>Logs/SRR3099587_chr18_fastqc.err
[Tue Feb 20 15:34:59 2024]
Finished job 3.
1 of 8 steps (12%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Feb 20 15:34:59 2024]
localrule fastqc:
    input: Data/SRR3099585_chr18.fastq.gz
    output: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html
    log: Logs/SRR3099585_chr18_fastqc.std, Logs/SRR3099585_chr18_fastqc.err
    jobid: 4
    reason: Code has changed since last execution
    wildcards: sample=SRR3099585_chr18
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver

fastqc --outdir FastQC Data/SRR3099585_chr18.fastq.gz 1>Logs/SRR3099585_chr18_fastqc.std 2>Logs/SRR3099585_chr18_fastqc.err
[Tue Feb 20 15:35:08 2024]
Finished job 4.
2 of 8 steps (25%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Feb 20 15:35:08 2024]
localrule fastqc:
    input: Data/SRR3105697_chr18.fastq.gz
    output: FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.html
    log: Logs/SRR3105697_chr18_fastqc.std, Logs/SRR3105697_chr18_fastqc.err
    jobid: 5
    reason: Code has changed since last execution
    wildcards: sample=SRR3105697_chr18
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver

fastqc --outdir FastQC Data/SRR3105697_chr18.fastq.gz 1>Logs/SRR3105697_chr18_fastqc.std 2>Logs/SRR3105697_chr18_fastqc.err
[Tue Feb 20 15:35:16 2024]
Finished job 5.
3 of 8 steps (38%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Feb 20 15:35:16 2024]
localrule fastqc:
    input: Data/SRR3099586_chr18.fastq.gz
    output: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
    log: Logs/SRR3099586_chr18_fastqc.std, Logs/SRR3099586_chr18_fastqc.err
    jobid: 1
    reason: Code has changed since last execution
    wildcards: sample=SRR3099586_chr18
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver

fastqc --outdir FastQC Data/SRR3099586_chr18.fastq.gz 1>Logs/SRR3099586_chr18_fastqc.std 2>Logs/SRR3099586_chr18_fastqc.err
[Tue Feb 20 15:35:23 2024]
Finished job 1.
4 of 8 steps (50%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Feb 20 15:35:23 2024]
localrule fastqc:
    input: Data/SRR3105699_chr18.fastq.gz
    output: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.html
    log: Logs/SRR3105699_chr18_fastqc.std, Logs/SRR3105699_chr18_fastqc.err
    jobid: 6
    reason: Code has changed since last execution
    wildcards: sample=SRR3105699_chr18
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver

fastqc --outdir FastQC Data/SRR3105699_chr18.fastq.gz 1>Logs/SRR3105699_chr18_fastqc.std 2>Logs/SRR3105699_chr18_fastqc.err
[Tue Feb 20 15:35:31 2024]
Finished job 6.
5 of 8 steps (62%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Feb 20 15:35:31 2024]
localrule fastqc:
    input: Data/SRR3105698_chr18.fastq.gz
    output: FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.html
    log: Logs/SRR3105698_chr18_fastqc.std, Logs/SRR3105698_chr18_fastqc.err
    jobid: 2
    reason: Code has changed since last execution
    wildcards: sample=SRR3105698_chr18
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver

fastqc --outdir FastQC Data/SRR3105698_chr18.fastq.gz 1>Logs/SRR3105698_chr18_fastqc.std 2>Logs/SRR3105698_chr18_fastqc.err
[Tue Feb 20 15:35:38 2024]
Finished job 2.
6 of 8 steps (75%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Feb 20 15:35:38 2024]
localrule multiqc:
    input: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105
699_chr18_fastqc.zip
    output: multiqc_report.html, multiqc_data
    log: Logs/multiqc.std, Logs/multiqc.err
    jobid: 7
    reason: Input files updated by another job: FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR309
9586_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.zip
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver

multiqc FastQC/SRR3099586_chr18_fastqc.zip FastQC/SRR3105698_chr18_fastqc.zip FastQC/SRR3099587_chr18_fastqc.zip FastQC/SRR3099585_chr18_fastqc.zip FastQC/SRR3105697_chr18_fastqc.zip FastQC/SRR3105699_chr1
8_fastqc.zip 1>Logs/multiqc.std 2>Logs/multiqc.err
[Tue Feb 20 15:35:43 2024]
Finished job 7.
7 of 8 steps (88%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Feb 20 15:35:43 2024]
localrule all:
    input: FastQC/SRR3099586_chr18_fastqc.html, FastQC/SRR3105698_chr18_fastqc.html, FastQC/SRR3099587_chr18_fastqc.html, FastQC/SRR3099585_chr18_fastqc.html, FastQC/SRR3105697_chr18_fastqc.html, FastQC/SR
R3105699_chr18_fastqc.html, multiqc_report.html
    jobid: 0
    reason: Input files updated by another job: FastQC/SRR3105699_chr18_fastqc.html, FastQC/SRR3099587_chr18_fastqc.html, FastQC/SRR3105697_chr18_fastqc.html, FastQC/SRR3099585_chr18_fastqc.html, multiqc_r
eport.html, FastQC/SRR3105698_chr18_fastqc.html, FastQC/SRR3099586_chr18_fastqc.html
    resources: tmpdir=/var/tmp/pbs.743371.pbsserver

[Tue Feb 20 15:35:43 2024]
Finished job 0.
8 of 8 steps (100%) done
Complete log: .snakemake/log/2024-02-20T153451.526914.snakemake.log

Observe the output

As you can see, the fastqc steps don’t generate as much text as before. Also, if you have a look at your working directory, you should see a Logs folder in there now, containing all the individual logs of your input files and rules:

				
					john.doe@cluster-i2bc:/data/work/I2BC/john.doe/snakemake_tutorial$ ls Logs/
multiqc.err                  SRR3099586_chr18_fastqc.err  SRR3105697_chr18_fastqc.err  SRR3105699_chr18_fastqc.err
multiqc.std                  SRR3099586_chr18_fastqc.std  SRR3105697_chr18_fastqc.std  SRR3105699_chr18_fastqc.std
SRR3099585_chr18_fastqc.err  SRR3099587_chr18_fastqc.err  SRR3105698_chr18_fastqc.err
SRR3099585_chr18_fastqc.std  SRR3099587_chr18_fastqc.std  SRR3105698_chr18_fastqc.std