Getting started with Snakemake
Objective 2:
Learn how to dispatch each individual job onto separate processors of the cluster using the --executor option.
Where to start?
--executor cluster-generic tells Snakemake to use a “cluster-generic” executor rather than the local shell.
--cluster-generic-* options specify how Snakemake should use that executor (here, --cluster-generic-submit-cmd gives the command used to submit each job).
--jobs sets the maximum number of jobs allowed to run at the same time.
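These options end up together on a single command line. As a minimal sketch, assuming a bash shell (the variable names MAX_JOBS, SUBMIT_CMD and snakemake_args are hypothetical), a bash array lets you keep one option per line with a comment next to each. The script only prints the assembled command, so it is safe to run as-is.

```shell
#!/usr/bin/env bash
# Hypothetical settings for this sketch: tune them to your cluster.
MAX_JOBS=6
SUBMIT_CMD="qsub -V -l ncpus=1 -l mem=100Mb"

snakemake_args=(
  -s ex1b_o3.smk
  --executor cluster-generic                  # submit jobs instead of running them locally
  --cluster-generic-submit-cmd "$SUBMIT_CMD"  # command used to hand each job to OpenPBS
  --jobs "$MAX_JOBS"                          # maximum number of jobs at the same time
  --configfile ex1.yml
  -p                                          # print the shell commands being executed
)

# Print the quoted command instead of running it.
# To actually launch it, use: snakemake "${snakemake_args[@]}"
printf '%q ' snakemake "${snakemake_args[@]}"
echo
```

Keeping the arguments in an array (rather than one long string) preserves the spaces inside the submit command as a single argument.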
Run your Snakefile
Running Snakemake is quite straightforward. We’ll add -R fastqc to force Snakemake to re-run the fastqc rule and everything downstream of it, so that you can observe the changes:
snakemake -s ex1b_o3.smk --executor "cluster-generic" --cluster-generic-submit-cmd "qsub -V -l ncpus=1 -l mem=100Mb" --jobs 6 --configfile ex1.yml -p -R fastqc
Your output should look like this:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 6
Job stats:
job        count
-------  -------
all            1
fastqc         6
multiqc        1
total          8
Select jobs to execute...
Execute 6 jobs...
[Wed Feb 21 21:33:55 2024]
rule fastqc:
input: Data/SRR3105697_chr18.fastq.gz
output: FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.html
log: Logs/SRR3105697_chr18_fastqc.std, Logs/SRR3105697_chr18_fastqc.err
jobid: 5
reason: Forced execution
wildcards: sample=SRR3105697_chr18
resources: tmpdir=
fastqc --outdir FastQC Data/SRR3105697_chr18.fastq.gz 1>Logs/SRR3105697_chr18_fastqc.std 2>Logs/SRR3105697_chr18_fastqc.err
Submitted job 5 with external jobid '748616.pbsserver'.
[Wed Feb 21 21:33:55 2024]
rule fastqc:
input: Data/SRR3105699_chr18.fastq.gz
output: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.html
log: Logs/SRR3105699_chr18_fastqc.std, Logs/SRR3105699_chr18_fastqc.err
jobid: 6
reason: Forced execution
wildcards: sample=SRR3105699_chr18
resources: tmpdir=
fastqc --outdir FastQC Data/SRR3105699_chr18.fastq.gz 1>Logs/SRR3105699_chr18_fastqc.std 2>Logs/SRR3105699_chr18_fastqc.err
Submitted job 6 with external jobid '748617.pbsserver'.
[Wed Feb 21 21:33:55 2024]
rule fastqc:
input: Data/SRR3105698_chr18.fastq.gz
output: FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.html
log: Logs/SRR3105698_chr18_fastqc.std, Logs/SRR3105698_chr18_fastqc.err
jobid: 2
reason: Forced execution
wildcards: sample=SRR3105698_chr18
resources: tmpdir=
fastqc --outdir FastQC Data/SRR3105698_chr18.fastq.gz 1>Logs/SRR3105698_chr18_fastqc.std 2>Logs/SRR3105698_chr18_fastqc.err
Submitted job 2 with external jobid '748618.pbsserver'.
[Wed Feb 21 21:33:55 2024]
rule fastqc:
input: Data/SRR3099586_chr18.fastq.gz
output: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
log: Logs/SRR3099586_chr18_fastqc.std, Logs/SRR3099586_chr18_fastqc.err
jobid: 1
reason: Forced execution
wildcards: sample=SRR3099586_chr18
resources: tmpdir=
fastqc --outdir FastQC Data/SRR3099586_chr18.fastq.gz 1>Logs/SRR3099586_chr18_fastqc.std 2>Logs/SRR3099586_chr18_fastqc.err
Submitted job 1 with external jobid '748619.pbsserver'.
[Wed Feb 21 21:33:55 2024]
rule fastqc:
input: Data/SRR3099587_chr18.fastq.gz
output: FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.html
log: Logs/SRR3099587_chr18_fastqc.std, Logs/SRR3099587_chr18_fastqc.err
jobid: 3
reason: Forced execution
wildcards: sample=SRR3099587_chr18
resources: tmpdir=
fastqc --outdir FastQC Data/SRR3099587_chr18.fastq.gz 1>Logs/SRR3099587_chr18_fastqc.std 2>Logs/SRR3099587_chr18_fastqc.err
Submitted job 3 with external jobid '748620.pbsserver'.
[Wed Feb 21 21:33:55 2024]
rule fastqc:
input: Data/SRR3099585_chr18.fastq.gz
output: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html
log: Logs/SRR3099585_chr18_fastqc.std, Logs/SRR3099585_chr18_fastqc.err
jobid: 4
reason: Forced execution
wildcards: sample=SRR3099585_chr18
resources: tmpdir=
fastqc --outdir FastQC Data/SRR3099585_chr18.fastq.gz 1>Logs/SRR3099585_chr18_fastqc.std 2>Logs/SRR3099585_chr18_fastqc.err
Submitted job 4 with external jobid '748621.pbsserver'.
[Wed Feb 21 21:34:08 2024]
Error in rule fastqc:
message: For further error details see the cluster/cloud log and the log files of the involved rule(s).
jobid: 5
input: Data/SRR3105697_chr18.fastq.gz
output: FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.html
log: Logs/SRR3105697_chr18_fastqc.std, Logs/SRR3105697_chr18_fastqc.err (check log file(s) for error details)
shell:
fastqc --outdir FastQC Data/SRR3105697_chr18.fastq.gz 1>Logs/SRR3105697_chr18_fastqc.std 2>Logs/SRR3105697_chr18_fastqc.err
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
external_jobid: 748616.pbsserver
[Wed Feb 21 21:34:08 2024]
Error in rule fastqc:
message: For further error details see the cluster/cloud log and the log files of the involved rule(s).
jobid: 6
input: Data/SRR3105699_chr18.fastq.gz
output: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.html
log: Logs/SRR3105699_chr18_fastqc.std, Logs/SRR3105699_chr18_fastqc.err (check log file(s) for error details)
shell:
fastqc --outdir FastQC Data/SRR3105699_chr18.fastq.gz 1>Logs/SRR3105699_chr18_fastqc.std 2>Logs/SRR3105699_chr18_fastqc.err
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
external_jobid: 748617.pbsserver
[Wed Feb 21 21:34:08 2024]
Error in rule fastqc:
message: For further error details see the cluster/cloud log and the log files of the involved rule(s).
jobid: 2
input: Data/SRR3105698_chr18.fastq.gz
output: FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.html
log: Logs/SRR3105698_chr18_fastqc.std, Logs/SRR3105698_chr18_fastqc.err (check log file(s) for error details)
shell:
fastqc --outdir FastQC Data/SRR3105698_chr18.fastq.gz 1>Logs/SRR3105698_chr18_fastqc.std 2>Logs/SRR3105698_chr18_fastqc.err
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
external_jobid: 748618.pbsserver
[Wed Feb 21 21:34:08 2024]
Error in rule fastqc:
message: For further error details see the cluster/cloud log and the log files of the involved rule(s).
jobid: 1
input: Data/SRR3099586_chr18.fastq.gz
output: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
log: Logs/SRR3099586_chr18_fastqc.std, Logs/SRR3099586_chr18_fastqc.err (check log file(s) for error details)
shell:
fastqc --outdir FastQC Data/SRR3099586_chr18.fastq.gz 1>Logs/SRR3099586_chr18_fastqc.std 2>Logs/SRR3099586_chr18_fastqc.err
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
external_jobid: 748619.pbsserver
[Wed Feb 21 21:34:08 2024]
Error in rule fastqc:
message: For further error details see the cluster/cloud log and the log files of the involved rule(s).
jobid: 3
input: Data/SRR3099587_chr18.fastq.gz
output: FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.html
log: Logs/SRR3099587_chr18_fastqc.std, Logs/SRR3099587_chr18_fastqc.err (check log file(s) for error details)
shell:
fastqc --outdir FastQC Data/SRR3099587_chr18.fastq.gz 1>Logs/SRR3099587_chr18_fastqc.std 2>Logs/SRR3099587_chr18_fastqc.err
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
external_jobid: 748620.pbsserver
[Wed Feb 21 21:34:08 2024]
Error in rule fastqc:
message: For further error details see the cluster/cloud log and the log files of the involved rule(s).
jobid: 4
input: Data/SRR3099585_chr18.fastq.gz
output: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html
log: Logs/SRR3099585_chr18_fastqc.std, Logs/SRR3099585_chr18_fastqc.err (check log file(s) for error details)
shell:
fastqc --outdir FastQC Data/SRR3099585_chr18.fastq.gz 1>Logs/SRR3099585_chr18_fastqc.std 2>Logs/SRR3099585_chr18_fastqc.err
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
external_jobid: 748621.pbsserver
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-02-21T213348.723440.snakemake.log
WorkflowError:
At least one job did not complete successfully.
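In this output, the lines “Submitted job <N> with external jobid '<id>'” link Snakemake’s own job ids to the OpenPBS job ids. A minimal sketch, assuming the complete log saved under .snakemake/log/ contains the same lines as the terminal output (the function name list_external_jobids is hypothetical), that extracts these pairs with sed:

```shell
#!/usr/bin/env bash
# list_external_jobids LOGFILE
# Print one "snakemake_jobid external_jobid" pair per submitted job,
# taken from lines like: Submitted job 5 with external jobid '748616.pbsserver'.
list_external_jobids() {
  sed -n "s/^[[:space:]]*Submitted job \([0-9]*\) with external jobid '\([^']*\)'.*/\1 \2/p" "$1"
}
```

For the run above, list_external_jobids .snakemake/log/2024-02-21T213348.723440.snakemake.log should list the six pairs reported in the output.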
Observe the output
Are you getting red error messages too? Don’t worry, that was expected 😉
Let’s first have a look at the output log. There are several indicators that the jobs were submitted correctly to OpenPBS, for example “Provided remote nodes: 6” and lines such as “Submitted job 5 with external jobid '748616.pbsserver'”. We can see that each job was submitted individually to OpenPBS (each one has its own “external jobid”). We can also see the output and error files that OpenPBS generated for each job in our current working directory:
john.doe@node06:/data/work/I2BC/john.doe/snakemake_tutorial$ ls
Data ex1.yml snakejob.fastqc.1.sh.e748619 snakejob.fastqc.2.sh.o748618 snakejob.fastqc.4.sh.e748621 snakejob.fastqc.5.sh.o748616
Logs multiqc_data snakejob.fastqc.1.sh.o748619 snakejob.fastqc.3.sh.e748620 snakejob.fastqc.4.sh.o748621 snakejob.fastqc.6.sh.e748617
ex1.smk multiqc_report.html snakejob.fastqc.2.sh.e748618 snakejob.fastqc.3.sh.o748620 snakejob.fastqc.5.sh.e748616 snakejob.fastqc.6.sh.o748617
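These file names appear to follow a regular pattern: the name of the job script Snakemake submitted (snakejob.<rule>.<jobid>.sh), then .o (standard output) or .e (standard error), then the numeric OpenPBS job id. For instance, snakejob.fastqc.1.sh.e748619 is the error file of Snakemake job 1, which the log reports as external jobid 748619.pbsserver. A small bash sketch that splits such names (the function name describe_pbs_file is hypothetical, and the pattern is inferred from the listing above rather than from a documented guarantee):

```shell
#!/usr/bin/env bash
# describe_pbs_file NAME
# Print "rule snakemake_jobid stream pbs_jobid" (stream: o = stdout, e = stderr)
# for names like snakejob.fastqc.1.sh.e748619; return 1 for any other name.
describe_pbs_file() {
  local re='^snakejob\.([A-Za-z0-9_]+)\.([0-9]+)\.sh\.([oe])([0-9]+)$'
  if [[ $1 =~ $re ]]; then
    printf '%s %s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
      "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
  else
    return 1
  fi
}
```

Usage: for f in snakejob.*; do describe_pbs_file "$f"; done lists every OpenPBS file together with the Snakemake job it belongs to.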
So what went wrong? The log tells us that the fastqc jobs didn’t end properly (their commands exited with a non-zero exit code). Their standard output and error are redirected into files stored in the Logs/ folder, so let’s have a look at one of them:
john.doe@node06:/data/work/I2BC/john.doe/snakemake_tutorial$ more Logs/SRR3099586_chr18_fastqc.err
/usr/bin/bash: line 1: fastqc: command not found
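With six samples, opening each error log in turn gets tedious. A minimal sketch (the function name summarize_err_logs is hypothetical; it assumes the rule logs end in .err, as they do in this workflow) that prints the first line of every non-empty error log in a folder:

```shell
#!/usr/bin/env bash
# summarize_err_logs [DIR]
# For every non-empty *.err file in DIR (default: Logs), print
# "file: first line", which is often enough to spot the cause of a failure.
summarize_err_logs() {
  local dir=${1:-Logs} f
  for f in "$dir"/*.err; do
    [ -s "$f" ] || continue   # skip empty logs (and the literal pattern if nothing matches)
    printf '%s: %s\n' "$f" "$(head -n 1 "$f")"
  done
}
```

Running summarize_err_logs Logs in the tutorial folder should show the same “command not found” message for each fastqc job.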
Aha! Do you remember the module load commands we ran at the beginning of this exercise? Here, each job is executed on a new processor, in a session where we haven’t loaded the software fastqc needs, so the fastqc command cannot be found. We’ll see how to fix this in the next objective.
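To confirm this diagnosis yourself, one option is to compare what your current shell can find with what a job sees. The sketch below (check_tools is a hypothetical helper, not part of Snakemake or OpenPBS) reports whether each named command is on PATH; you could run it in the shell where you did the module loads, and again inside an interactive job started with qsub -I, for example. It only diagnoses the problem and does not fix it.

```shell
#!/usr/bin/env bash
# check_tools CMD...
# Report whether each command can be found on PATH.
# Returns 1 if at least one command is missing, 0 otherwise.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found:   %s -> %s\n' "$tool" "$(command -v "$tool")"
    else
      printf 'missing: %s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_tools fastqc multiqc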