Getting started with Snakemake
Objective 3
Create a new snakefile named ex1b_o3.smk in which we redirect the standard output and error streams to log files.
Where to start?
- About stdout/stderr: In Unix systems, the output of a command is usually sent to 2 separate streams: the expected output to Standard Out (stdout, or “>” or “1>”), and the error messages to Standard Error (stderr, or “2>”).
- Add the log directive: redirect the stdout and stderr streams of the fastqc and multiqc rules to files by adding a “log:” directive (similar to the existing input: or output: directives) with two variables, std and err, to redirect each stream separately.
- Adapt the shell commands: add stdout and stderr redirections using 1> stdout.txt and 2> stderr.txt in the shell command lines of your rules. Use placeholders to specify the chosen file names (e.g. “1>{log.std} 2>{log.err}”).
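To see the two streams being separated outside of Snakemake, here is a minimal Python sketch. It does not call fastqc or multiqc: the child command is a stand-in that writes one line to each stream, and the file names stdout.txt and stderr.txt are only illustrative.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_logs(command, out_path, err_path):
    """Run a command, sending stdout and stderr to two separate files."""
    with open(out_path, "w") as out, open(err_path, "w") as err:
        # Same effect as `command 1>out_path 2>err_path` in a shell.
        return subprocess.run(command, stdout=out, stderr=err).returncode


# Stand-in command (an assumption for this demo): prints to both streams.
demo_command = [
    sys.executable,
    "-c",
    "import sys; print('expected output'); print('a warning', file=sys.stderr)",
]

# Work in a temporary directory so the demo leaves the project untouched.
workdir = Path(tempfile.mkdtemp())
stdout_file = workdir / "stdout.txt"
stderr_file = workdir / "stderr.txt"

exit_code = run_with_logs(demo_command, stdout_file, stderr_file)
print("exit code:", exit_code)
print("stdout.txt:", stdout_file.read_text().strip())
print("stderr.txt:", stderr_file.read_text().strip())
```

Each file ends up holding only the text written to its own stream, which is what the log directive and the 1>/2> redirections achieve for every job in the pipeline.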
Your code for ex1b_o3.smk should look like this:
SAMPLES, = glob_wildcards(config["dataDir"]+"/{sample}.fastq.gz")

rule all:
    input:
        expand("FastQC/{sample}_fastqc.html", sample=SAMPLES),
        "multiqc_report.html"

rule fastqc:
    input:
        config["dataDir"]+"/{sample}.fastq.gz"
    output:
        "FastQC/{sample}_fastqc.zip",
        "FastQC/{sample}_fastqc.html"
    log:
        "Logs/{sample}_fastqc.std",
        "Logs/{sample}_fastqc.err"
    shell:
        "fastqc --outdir FastQC {input} 1>{log[0]} 2>{log[1]}"

rule multiqc:
    input:
        expand("FastQC/{sample}_fastqc.zip", sample = SAMPLES)
    output:
        "multiqc_report.html",
        directory("multiqc_data")
    log:
        std="Logs/multiqc.std",
        err="Logs/multiqc.err"
    shell:
        "multiqc {input} 1>{log.std} 2>{log.err}"

As you can see, we specify the log files differently in the fastqc rule and in the multiqc rule (for demonstration purposes). In the multiqc rule, both log files are named (“std” and “err”) and are used in the shell directive like so: “{log.std}” and “{log.err}”. In the fastqc rule, we don’t specify names and use them in the shell directive with Python’s list syntax instead: “{log[0]}” and “{log[1]}”.
Test the script
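Before running the pipeline, it may help to see why both placeholder styles work. Snakemake fills in the shell string using Python-style format fields, so {log[0]} is an index lookup and {log.std} is an attribute lookup. The sketch below imitates this with plain str.format; the list and the SimpleNamespace are stand-ins chosen for the demo, not Snakemake's own log object, and the multiqc input is shortened to a single file.

```python
from types import SimpleNamespace

# Positional style, as in the fastqc rule: unnamed log files behave like a list.
fastqc_log = [
    "Logs/SRR3099587_chr18_fastqc.std",
    "Logs/SRR3099587_chr18_fastqc.err",
]
fastqc_cmd = "fastqc --outdir FastQC {input} 1>{log[0]} 2>{log[1]}".format(
    input="Data/SRR3099587_chr18.fastq.gz", log=fastqc_log
)

# Named style, as in the multiqc rule: each log file is reached by its name.
multiqc_log = SimpleNamespace(std="Logs/multiqc.std", err="Logs/multiqc.err")
multiqc_cmd = "multiqc {input} 1>{log.std} 2>{log.err}".format(
    input="FastQC/SRR3099587_chr18_fastqc.zip", log=multiqc_log
)

print(fastqc_cmd)
print(multiqc_cmd)
```

Named log files tend to be easier to read in longer rules, because the shell command stays correct even if the order of the entries in the log directive changes.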
Next, let’s check if your pipeline works as expected:
snakemake -s ex1b_o3.smk -c 1 -p --configfile ex1.yml
You should see something similar to the following output on your screen.
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job        count
-------  -------
all            1
fastqc         6
multiqc        1
total          8
Select jobs to execute...
Execute 1 jobs...
[Tue Feb 20 15:34:52 2024]
localrule fastqc:
input: Data/SRR3099587_chr18.fastq.gz
output: FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.html
log: Logs/SRR3099587_chr18_fastqc.std, Logs/SRR3099587_chr18_fastqc.err
jobid: 3
reason: Code has changed since last execution
wildcards: sample=SRR3099587_chr18
resources: tmpdir=/var/tmp/pbs.743371.pbsserver
fastqc --outdir FastQC Data/SRR3099587_chr18.fastq.gz 1>Logs/SRR3099587_chr18_fastqc.std 2>Logs/SRR3099587_chr18_fastqc.err
[Tue Feb 20 15:34:59 2024]
Finished job 3.
1 of 8 steps (12%) done
Select jobs to execute...
Execute 1 jobs...
[Tue Feb 20 15:34:59 2024]
localrule fastqc:
input: Data/SRR3099585_chr18.fastq.gz
output: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html
log: Logs/SRR3099585_chr18_fastqc.std, Logs/SRR3099585_chr18_fastqc.err
jobid: 4
reason: Code has changed since last execution
wildcards: sample=SRR3099585_chr18
resources: tmpdir=/var/tmp/pbs.743371.pbsserver
fastqc --outdir FastQC Data/SRR3099585_chr18.fastq.gz 1>Logs/SRR3099585_chr18_fastqc.std 2>Logs/SRR3099585_chr18_fastqc.err
[Tue Feb 20 15:35:08 2024]
Finished job 4.
2 of 8 steps (25%) done
Select jobs to execute...
Execute 1 jobs...
[Tue Feb 20 15:35:08 2024]
localrule fastqc:
input: Data/SRR3105697_chr18.fastq.gz
output: FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.html
log: Logs/SRR3105697_chr18_fastqc.std, Logs/SRR3105697_chr18_fastqc.err
jobid: 5
reason: Code has changed since last execution
wildcards: sample=SRR3105697_chr18
resources: tmpdir=/var/tmp/pbs.743371.pbsserver
fastqc --outdir FastQC Data/SRR3105697_chr18.fastq.gz 1>Logs/SRR3105697_chr18_fastqc.std 2>Logs/SRR3105697_chr18_fastqc.err
[Tue Feb 20 15:35:16 2024]
Finished job 5.
3 of 8 steps (38%) done
Select jobs to execute...
Execute 1 jobs...
[Tue Feb 20 15:35:16 2024]
localrule fastqc:
input: Data/SRR3099586_chr18.fastq.gz
output: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
log: Logs/SRR3099586_chr18_fastqc.std, Logs/SRR3099586_chr18_fastqc.err
jobid: 1
reason: Code has changed since last execution
wildcards: sample=SRR3099586_chr18
resources: tmpdir=/var/tmp/pbs.743371.pbsserver
fastqc --outdir FastQC Data/SRR3099586_chr18.fastq.gz 1>Logs/SRR3099586_chr18_fastqc.std 2>Logs/SRR3099586_chr18_fastqc.err
[Tue Feb 20 15:35:23 2024]
Finished job 1.
4 of 8 steps (50%) done
Select jobs to execute...
Execute 1 jobs...
[Tue Feb 20 15:35:23 2024]
localrule fastqc:
input: Data/SRR3105699_chr18.fastq.gz
output: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.html
log: Logs/SRR3105699_chr18_fastqc.std, Logs/SRR3105699_chr18_fastqc.err
jobid: 6
reason: Code has changed since last execution
wildcards: sample=SRR3105699_chr18
resources: tmpdir=/var/tmp/pbs.743371.pbsserver
fastqc --outdir FastQC Data/SRR3105699_chr18.fastq.gz 1>Logs/SRR3105699_chr18_fastqc.std 2>Logs/SRR3105699_chr18_fastqc.err
[Tue Feb 20 15:35:31 2024]
Finished job 6.
5 of 8 steps (62%) done
Select jobs to execute...
Execute 1 jobs...
[Tue Feb 20 15:35:31 2024]
localrule fastqc:
input: Data/SRR3105698_chr18.fastq.gz
output: FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.html
log: Logs/SRR3105698_chr18_fastqc.std, Logs/SRR3105698_chr18_fastqc.err
jobid: 2
reason: Code has changed since last execution
wildcards: sample=SRR3105698_chr18
resources: tmpdir=/var/tmp/pbs.743371.pbsserver
fastqc --outdir FastQC Data/SRR3105698_chr18.fastq.gz 1>Logs/SRR3105698_chr18_fastqc.std 2>Logs/SRR3105698_chr18_fastqc.err
[Tue Feb 20 15:35:38 2024]
Finished job 2.
6 of 8 steps (75%) done
Select jobs to execute...
Execute 1 jobs...
[Tue Feb 20 15:35:38 2024]
localrule multiqc:
input: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.zip
output: multiqc_report.html, multiqc_data
log: Logs/multiqc.std, Logs/multiqc.err
jobid: 7
reason: Input files updated by another job: FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.zip
resources: tmpdir=/var/tmp/pbs.743371.pbsserver
multiqc FastQC/SRR3099586_chr18_fastqc.zip FastQC/SRR3105698_chr18_fastqc.zip FastQC/SRR3099587_chr18_fastqc.zip FastQC/SRR3099585_chr18_fastqc.zip FastQC/SRR3105697_chr18_fastqc.zip FastQC/SRR3105699_chr18_fastqc.zip 1>Logs/multiqc.std 2>Logs/multiqc.err
[Tue Feb 20 15:35:43 2024]
Finished job 7.
7 of 8 steps (88%) done
Select jobs to execute...
Execute 1 jobs...
[Tue Feb 20 15:35:43 2024]
localrule all:
input: FastQC/SRR3099586_chr18_fastqc.html, FastQC/SRR3105698_chr18_fastqc.html, FastQC/SRR3099587_chr18_fastqc.html, FastQC/SRR3099585_chr18_fastqc.html, FastQC/SRR3105697_chr18_fastqc.html, FastQC/SRR3105699_chr18_fastqc.html, multiqc_report.html
jobid: 0
reason: Input files updated by another job: FastQC/SRR3105699_chr18_fastqc.html, FastQC/SRR3099587_chr18_fastqc.html, FastQC/SRR3105697_chr18_fastqc.html, FastQC/SRR3099585_chr18_fastqc.html, multiqc_report.html, FastQC/SRR3105698_chr18_fastqc.html, FastQC/SRR3099586_chr18_fastqc.html
resources: tmpdir=/var/tmp/pbs.743371.pbsserver
[Tue Feb 20 15:35:43 2024]
Finished job 0.
8 of 8 steps (100%) done
Complete log: .snakemake/log/2024-02-20T153451.526914.snakemake.log
Observe the output
As you can see, the fastqc steps no longer print as much text to the screen as before, because their output now goes to the log files. Also, if you look at your working directory, you should now see a Logs folder containing the individual log files for each rule and input file:
john.doe@cluster-i2bc:/data/work/I2BC/john.doe/snakemake_tutorial$ ls Logs/
multiqc.err SRR3099586_chr18_fastqc.err SRR3105697_chr18_fastqc.err SRR3105699_chr18_fastqc.err
multiqc.std SRR3099586_chr18_fastqc.std SRR3105697_chr18_fastqc.std SRR3105699_chr18_fastqc.std
SRR3099585_chr18_fastqc.err SRR3099587_chr18_fastqc.err SRR3105698_chr18_fastqc.err
SRR3099585_chr18_fastqc.std SRR3099587_chr18_fastqc.std SRR3105698_chr18_fastqc.std
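With one .err file per job, a quick way to spot potential problems is to list the log files that are not empty. The helper below is a hypothetical convenience, not part of Snakemake; it assumes the Logs directory and the .err suffix used in this exercise. Keep in mind that a non-empty .err file does not necessarily indicate a failure, since many tools write progress messages to stderr.

```python
from pathlib import Path


def non_empty_logs(log_dir="Logs", suffix=".err"):
    """Return the sorted names of log files in log_dir that contain any data."""
    directory = Path(log_dir)
    if not directory.is_dir():
        # Nothing to report if the pipeline has not created the folder yet.
        return []
    return sorted(
        path.name
        for path in directory.glob(f"*{suffix}")
        if path.is_file() and path.stat().st_size > 0
    )


if __name__ == "__main__":
    # Run from the working directory that contains the Logs folder.
    for name in non_empty_logs():
        print(name)
```

Calling non_empty_logs(suffix=".std") would do the same for the standard output logs.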