Getting started with Snakemake

Exercise 1C - optimising resource usage


Objective 2:

Learn how to submit each individual job to a separate processor of the cluster using the --executor option.
Where to start?
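  • --executor cluster-generic tells Snakemake to submit jobs through a generic cluster executor rather than running them in the local shell (in Snakemake 8 and later, this executor is provided by the snakemake-executor-plugin-cluster-generic plugin)
  • --cluster-generic-* options specify how Snakemake should use this executor; for example, --cluster-generic-submit-cmd gives the command used to submit each job (here, OpenPBS’s qsub)
  • --jobs sets the maximum number of jobs that are allowed to run at the same time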
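To picture what --cluster-generic-submit-cmd does, here is a simplified sketch of a single submission, based on our understanding of the cluster-generic executor rather than on Snakemake’s actual code: Snakemake appends the path of the job script it generated to the submit command, runs the result in a shell, and keeps the first line printed on standard output as the “external jobid”. The qsub function below is only a stand-in that mimics OpenPBS’s output so the sketch runs without a cluster; submit_job and the job script path are hypothetical names used for illustration.

```shell
#!/usr/bin/env bash
# Simplified sketch of one submission by the cluster-generic executor
# (our reading of its behaviour, not Snakemake's actual code).
set -euo pipefail

# Stand-in for OpenPBS's qsub so the sketch runs without a cluster:
# it echoes the command it received on stderr and, like qsub, prints a
# job identifier on stdout.
qsub() {
  echo "qsub $*" >&2
  echo "748616.pbsserver"
}

# submit_job SUBMIT_CMD JOBSCRIPT (hypothetical helper): run
# "SUBMIT_CMD JOBSCRIPT" through the shell and print the first line of its
# standard output, which becomes the external job id.
submit_job() {
  local submit_cmd=$1 jobscript=$2
  eval "$submit_cmd \"\$jobscript\"" | head -n 1
}

# Value passed to --cluster-generic-submit-cmd in this exercise.
submit_cmd="qsub -V -l ncpus=1 -l mem=100Mb"
# Hypothetical location: Snakemake generates one job script per job.
jobscript=".snakemake/tmp.example/snakejob.fastqc.5.sh"

external_jobid=$(submit_job "$submit_cmd" "$jobscript")
echo "Submitted job 5 with external jobid '${external_jobid}'."
```

On a real cluster you would not define the qsub function: the genuine qsub prints the identifier assigned by the PBS server, which is what appears in Snakemake’s log.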
Run your Snakefile
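Running Snakemake is quite straightforward. The submit command asks OpenPBS for 1 CPU (-l ncpus=1) and 100 MB of memory (-l mem=100Mb) for each job, and we’ll add -R fastqc to force Snakemake to re-run the fastqc rule and everything that depends on it, so you can observe the changes: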
				
snakemake -s ex1b_o3.smk --executor "cluster-generic" --cluster-generic-submit-cmd "qsub -V -l ncpus=1 -l mem=100Mb" --jobs 6 --configfile ex1.yml -p -R fastqc
				
			
Your output should look like this:
				
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 6
Job stats:
job        count
-------  -------
all            1
fastqc         6
multiqc        1
total          8

Select jobs to execute...
Execute 6 jobs...

[Wed Feb 21 21:33:55 2024]
rule fastqc:
    input: Data/SRR3105697_chr18.fastq.gz
    output: FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.html
    log: Logs/SRR3105697_chr18_fastqc.std, Logs/SRR3105697_chr18_fastqc.err
    jobid: 5
    reason: Forced execution
    wildcards: sample=SRR3105697_chr18
    resources: tmpdir=<TBD>

fastqc --outdir FastQC Data/SRR3105697_chr18.fastq.gz 1>Logs/SRR3105697_chr18_fastqc.std 2>Logs/SRR3105697_chr18_fastqc.err
Submitted job 5 with external jobid '748616.pbsserver'.

[Wed Feb 21 21:33:55 2024]
rule fastqc:
    input: Data/SRR3105699_chr18.fastq.gz
    output: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.html
    log: Logs/SRR3105699_chr18_fastqc.std, Logs/SRR3105699_chr18_fastqc.err
    jobid: 6
    reason: Forced execution
    wildcards: sample=SRR3105699_chr18
    resources: tmpdir=<TBD>

fastqc --outdir FastQC Data/SRR3105699_chr18.fastq.gz 1>Logs/SRR3105699_chr18_fastqc.std 2>Logs/SRR3105699_chr18_fastqc.err
Submitted job 6 with external jobid '748617.pbsserver'.

[Wed Feb 21 21:33:55 2024]
rule fastqc:
    input: Data/SRR3105698_chr18.fastq.gz
    output: FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.html
    log: Logs/SRR3105698_chr18_fastqc.std, Logs/SRR3105698_chr18_fastqc.err
    jobid: 2
    reason: Forced execution
    wildcards: sample=SRR3105698_chr18
    resources: tmpdir=<TBD>

fastqc --outdir FastQC Data/SRR3105698_chr18.fastq.gz 1>Logs/SRR3105698_chr18_fastqc.std 2>Logs/SRR3105698_chr18_fastqc.err
Submitted job 2 with external jobid '748618.pbsserver'.

[Wed Feb 21 21:33:55 2024]
rule fastqc:
    input: Data/SRR3099586_chr18.fastq.gz
    output: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
    log: Logs/SRR3099586_chr18_fastqc.std, Logs/SRR3099586_chr18_fastqc.err
    jobid: 1
    reason: Forced execution
    wildcards: sample=SRR3099586_chr18
    resources: tmpdir=<TBD>

fastqc --outdir FastQC Data/SRR3099586_chr18.fastq.gz 1>Logs/SRR3099586_chr18_fastqc.std 2>Logs/SRR3099586_chr18_fastqc.err
Submitted job 1 with external jobid '748619.pbsserver'.

[Wed Feb 21 21:33:55 2024]
rule fastqc:
    input: Data/SRR3099587_chr18.fastq.gz
    output: FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.html
    log: Logs/SRR3099587_chr18_fastqc.std, Logs/SRR3099587_chr18_fastqc.err
    jobid: 3
    reason: Forced execution
    wildcards: sample=SRR3099587_chr18
    resources: tmpdir=<TBD>

fastqc --outdir FastQC Data/SRR3099587_chr18.fastq.gz 1>Logs/SRR3099587_chr18_fastqc.std 2>Logs/SRR3099587_chr18_fastqc.err
Submitted job 3 with external jobid '748620.pbsserver'.

[Wed Feb 21 21:33:55 2024]
rule fastqc:
    input: Data/SRR3099585_chr18.fastq.gz
    output: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html
    log: Logs/SRR3099585_chr18_fastqc.std, Logs/SRR3099585_chr18_fastqc.err
    jobid: 4
    reason: Forced execution
    wildcards: sample=SRR3099585_chr18
    resources: tmpdir=<TBD>

fastqc --outdir FastQC Data/SRR3099585_chr18.fastq.gz 1>Logs/SRR3099585_chr18_fastqc.std 2>Logs/SRR3099585_chr18_fastqc.err
Submitted job 4 with external jobid '748621.pbsserver'.

[Wed Feb 21 21:34:08 2024]
Error in rule fastqc:
    message: For further error details see the cluster/cloud log and the log files of the involved rule(s).
    jobid: 5
    input: Data/SRR3105697_chr18.fastq.gz
    output: FastQC/SRR3105697_chr18_fastqc.zip, FastQC/SRR3105697_chr18_fastqc.html
    log: Logs/SRR3105697_chr18_fastqc.std, Logs/SRR3105697_chr18_fastqc.err (check log file(s) for error details)
    shell:
        fastqc --outdir FastQC Data/SRR3105697_chr18.fastq.gz 1>Logs/SRR3105697_chr18_fastqc.std 2>Logs/SRR3105697_chr18_fastqc.err
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    external_jobid: 748616.pbsserver

[Wed Feb 21 21:34:08 2024]
Error in rule fastqc:
    message: For further error details see the cluster/cloud log and the log files of the involved rule(s).
    jobid: 6
    input: Data/SRR3105699_chr18.fastq.gz
    output: FastQC/SRR3105699_chr18_fastqc.zip, FastQC/SRR3105699_chr18_fastqc.html
    log: Logs/SRR3105699_chr18_fastqc.std, Logs/SRR3105699_chr18_fastqc.err (check log file(s) for error details)
    shell:
        fastqc --outdir FastQC Data/SRR3105699_chr18.fastq.gz 1>Logs/SRR3105699_chr18_fastqc.std 2>Logs/SRR3105699_chr18_fastqc.err
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    external_jobid: 748617.pbsserver

[Wed Feb 21 21:34:08 2024]
Error in rule fastqc:
    message: For further error details see the cluster/cloud log and the log files of the involved rule(s).
    jobid: 2
    input: Data/SRR3105698_chr18.fastq.gz
    output: FastQC/SRR3105698_chr18_fastqc.zip, FastQC/SRR3105698_chr18_fastqc.html
    log: Logs/SRR3105698_chr18_fastqc.std, Logs/SRR3105698_chr18_fastqc.err (check log file(s) for error details)
    shell:
        fastqc --outdir FastQC Data/SRR3105698_chr18.fastq.gz 1>Logs/SRR3105698_chr18_fastqc.std 2>Logs/SRR3105698_chr18_fastqc.err
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    external_jobid: 748618.pbsserver

[Wed Feb 21 21:34:08 2024]
Error in rule fastqc:
    message: For further error details see the cluster/cloud log and the log files of the involved rule(s).
    jobid: 1
    input: Data/SRR3099586_chr18.fastq.gz
    output: FastQC/SRR3099586_chr18_fastqc.zip, FastQC/SRR3099586_chr18_fastqc.html
    log: Logs/SRR3099586_chr18_fastqc.std, Logs/SRR3099586_chr18_fastqc.err (check log file(s) for error details)
    shell:
        fastqc --outdir FastQC Data/SRR3099586_chr18.fastq.gz 1>Logs/SRR3099586_chr18_fastqc.std 2>Logs/SRR3099586_chr18_fastqc.err
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    external_jobid: 748619.pbsserver

[Wed Feb 21 21:34:08 2024]
Error in rule fastqc:
    message: For further error details see the cluster/cloud log and the log files of the involved rule(s).
    jobid: 3
    input: Data/SRR3099587_chr18.fastq.gz
    output: FastQC/SRR3099587_chr18_fastqc.zip, FastQC/SRR3099587_chr18_fastqc.html
    log: Logs/SRR3099587_chr18_fastqc.std, Logs/SRR3099587_chr18_fastqc.err (check log file(s) for error details)
    shell:
        fastqc --outdir FastQC Data/SRR3099587_chr18.fastq.gz 1>Logs/SRR3099587_chr18_fastqc.std 2>Logs/SRR3099587_chr18_fastqc.err
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    external_jobid: 748620.pbsserver

[Wed Feb 21 21:34:08 2024]
Error in rule fastqc:
    message: For further error details see the cluster/cloud log and the log files of the involved rule(s).
    jobid: 4
    input: Data/SRR3099585_chr18.fastq.gz
    output: FastQC/SRR3099585_chr18_fastqc.zip, FastQC/SRR3099585_chr18_fastqc.html
    log: Logs/SRR3099585_chr18_fastqc.std, Logs/SRR3099585_chr18_fastqc.err (check log file(s) for error details)
    shell:
        fastqc --outdir FastQC Data/SRR3099585_chr18.fastq.gz 1>Logs/SRR3099585_chr18_fastqc.std 2>Logs/SRR3099585_chr18_fastqc.err
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    external_jobid: 748621.pbsserver

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-02-21T213348.723440.snakemake.log
WorkflowError:
At least one job did not complete successfully.
				
			
Observe the output
Are you getting red error messages too? Don’t worry, that was expected 😉
Let’s first have a look at the output log. Two lines in particular show that the jobs were submitted correctly to OpenPBS: “Provided remote nodes: 6” and “Submitted job 5 with external jobid '748616.pbsserver'.” Each job was submitted to OpenPBS individually, since each one has its own “external jobid”. We can also see that OpenPBS wrote one standard output file (.o) and one standard error file (.e) per job into our current working directory:
				
john.doe@node06:/data/work/I2BC/john.doe/snakemake_tutorial$ ls
Data     ex1.yml              snakejob.fastqc.1.sh.e748619  snakejob.fastqc.2.sh.o748618  snakejob.fastqc.4.sh.e748621  snakejob.fastqc.5.sh.o748616
Logs     multiqc_data         snakejob.fastqc.1.sh.o748619  snakejob.fastqc.3.sh.e748620  snakejob.fastqc.4.sh.o748621  snakejob.fastqc.6.sh.e748617
ex1.smk  multiqc_report.html  snakejob.fastqc.2.sh.e748618  snakejob.fastqc.3.sh.o748620  snakejob.fastqc.5.sh.e748616  snakejob.fastqc.6.sh.o748617
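These file names follow OpenPBS’s default convention: <job script name>.o<job number> for standard output and <job script name>.e<job number> for standard error, where Snakemake’s job scripts are named snakejob.<rule>.<jobid>.sh. Assuming those default names, the small sketch below maps each Snakemake job to the OpenPBS job that ran it; pbs_job_map is a hypothetical helper, not part of Snakemake or OpenPBS.

```shell
#!/usr/bin/env bash
# Map Snakemake jobs to OpenPBS job numbers, assuming the default PBS
# output file naming: <job script name>.o<PBS job number>.
set -euo pipefail
shopt -s nullglob   # an unmatched pattern expands to nothing

# pbs_job_map [DIR] (hypothetical helper): print "<rule> <jobid> -> <PBS number>"
# for every PBS stdout file named snakejob.<rule>.<jobid>.sh.o<number> in DIR.
pbs_job_map() {
  local dir=${1:-.} out name script job
  for out in "$dir"/snakejob.*.sh.o*; do
    name=${out##*/}            # snakejob.fastqc.5.sh.o748616
    script=${name%.o*}         # snakejob.fastqc.5.sh
    job=${script#snakejob.}    # fastqc.5.sh
    job=${job%.sh}             # fastqc.5
    echo "${job%.*} ${job##*.} -> ${name##*.sh.o}"
  done
}

pbs_job_map .
```

Run in the directory listed above, it would print lines such as “fastqc 5 -> 748616”, matching the “Submitted job 5 with external jobid '748616.pbsserver'.” line of the log.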
				
			
So what went wrong? The log tells us that the fastqc jobs did not finish successfully. Their standard output and error are redirected to files in the Logs/ folder, so let’s have a look at one of them:
				
john.doe@node06:/data/work/I2BC/john.doe/snakemake_tutorial$ more Logs/SRR3099586_chr18_fastqc.err
/usr/bin/bash: line 1: fastqc: command not found
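Rather than opening the six .err files one by one, you can search them all for the same message. The sketch below assumes the Logs/<sample>_fastqc.err naming used by this Snakefile; missing_tool_samples is a hypothetical helper written for illustration.

```shell
#!/usr/bin/env bash
# List the samples whose FastQC stderr log reports a missing command.
set -euo pipefail

# missing_tool_samples [LOGDIR] (hypothetical helper): print the sample name
# of each <sample>_fastqc.err file in LOGDIR containing "command not found".
missing_tool_samples() {
  local logdir=${1:-Logs} f
  for f in "$logdir"/*_fastqc.err; do
    [[ -e $f ]] || continue              # the pattern matched no file
    if grep -q "command not found" "$f"; then
      basename "$f" _fastqc.err          # strip directory and suffix
    fi
  done
}

missing_tool_samples Logs
```

If every job failed for the same reason, all six sample names are printed. The one-liner grep -l "command not found" Logs/*_fastqc.err gives the same information as file paths.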
				
			
Aha! Do you remember the module load commands we ran at the beginning of this exercise? Here, each job is executed on a different processor of the cluster, where the software fastqc needs has not been loaded. We’ll see how to fix this in the next objective.