Last updated: 2026-03-11
In this exercise, we will take our first steps with the cluster. We will:

- write and submit a first Slurm script with sbatch
- follow our jobs with squeue and sacct
- learn to reserve a sensible amount of resources
- cancel a running job with scancel

---
It's the same as for Exercise 0: you should be connected to the cluster and on the master node (i.e. slurmlogin should appear in your terminal prompt).
---
Using an interactive session (srun --pty bash) is useful when you're setting up a new analysis pipeline and getting to know the software within it. In the long run, it's better to use a submission script instead, so that the resources are freed as soon as your software has finished running. It also means you can disconnect from the cluster without affecting your job (think of it like posting a letter: switching your computer off won't affect the job that is running).
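For reference, a minimal interactive session could look like this (the resource values are only illustrative):

# Ask the scheduler for a shell on a compute node; type 'exit' when done
# so that the resources are freed for others
srun --mem=4G --cpus-per-task=2 --pty bash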
In this exercise, we will need to create & edit files. We suggest you use the command-line text editor nano, but feel free to use whatever editor you prefer. If you're working on the /stockage/ space, remember that you also have access to it outside the cluster (see this and this page on the intranet).
The Slurm submission script is just a script written in bash (that's the language of the cluster, i.e. cd and ls are bash commands). You can put in a bash script whatever you would normally write in the terminal, with one command per line.
Let's create a script called slurm_script.sh that will print "hello world" on the screen:

#! /bin/bash
echo "hello world"

Next, we will submit the script to the scheduler, which will look for a free node to run it on.
To submit your script, you can use the sbatch
command:
john.doe@slurmlogin:/home/john.doe$ sbatch slurm_script.sh
Submitted batch job 123456
john.doe@slurmlogin:/home/john.doe$

Note:
Depending on resource availability, your job might queue for a while or start running directly. To see its status, you can use Slurm's built-in squeue command.
NB: squeue only shows jobs that are still queuing or running, so you might not see your job if you're not quick enough!
john.doe@slurmlogin:~$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
123456 common slurm_sc john.doe R 0:04 1 node24
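If the queue is busy, you can filter the output with standard squeue options, for example:

squeue -u $USER     # only show your own jobs
squeue -j 123456    # only show the job with this job id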
In order to see past jobs, you can use the sacct command with a few options, e.g. sacct -X:

john.doe@slurmlogin:~$ sacct -X
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
115896 bash ngs 1 CANCELLED+ 0:0
115960 snakemake+ ngs 8 FAILED 1:0
123456             bash     common                     1  COMPLETED      0:0

The "FAILED" state is a bit misleading sometimes. A job marked as "FAILED" may still have run properly. Check whether the expected output was generated before panicking ;-)
sacct -X will list all your jobs that are or have been running from midnight onwards. A few useful options are:

- -X: simplifies the output (very useful unless you are using steps within your jobs)
- -S 2025-03-10: defines the start date/time (e.g. 10th of March 2025 = show all jobs since then); the default is midnight of the current day
- -a: not used here, but you can add it to show the jobs of all users and not only yours
- -j <jobid>: only show the job with the given job id
The SICS installed a set of scripts from slurm-tools that you can use to do the same as above. You don't have to remember everything; it's up to you to choose your favorite ones:
- jobqueue: get the list of jobs that are queuing or running
- jobhist: get the list of all jobs that have run or are still running or queuing

Example outputs:
john.doe@slurmlogin:~$ jobqueue
-----------------------------------------------------------------------------------------
Job ID User Job Name Partition State Elapsed Nodelist(Reason)
------------ ---------- ------------ ---------- -------- ----------- --------------------
123456       john.doe    bash         common       RUNNING   0:02        node24

john.doe@slurmlogin:~$ jobhist
----------------------------------------------------------------------------------------------------
Job ID Startdate User Job Name Partition State Elapsed Nodes CPUs Memory
------------- ---------- ---------- ------------ ---------- ---------- ----------- ----- ---- ------
115896 2025-03-05 jane.doe bash ngs CANCELLED+ 2-21:36:56 1 1 1000Mc
115960 2025-03-08 jane.doe snakemake_p+ ngs FAILED 00:00:02 1 8 64Gn
123456        2025-03-09  john.doe   bash          common      RUNNING     00:00:42       1    1  1000Mc

NB: Gn = GB per node and Gc = GB per CPU; the same goes for M (MB), K (KB) and T (TB).
The command echo "hello world" should normally print "hello world" on your screen… When running scripts remotely on the nodes, anything that would usually be printed on the screen is saved to a file instead.
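By default, this file is named slurm-<jobid>.out and is created in the directory you submitted from. If you prefer another name, sbatch has an --output option (shown here as a sketch; %j is a Slurm placeholder that is replaced by the job id):

# Save standard output to hello_123456.log instead of slurm-123456.out
sbatch --output=hello_%j.log slurm_script.sh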
Have a look at your working directory: you should see an extra file in there. If you open it (with the cat command for example), you should see "hello world" in there.
john.doe@slurmlogin:/home/john.doe$ ls
slurm_script.sh slurm-123456.out
john.doe@slurmlogin:/home/john.doe$ cat slurm-123456.out
hello world

Take home message
- you can open an interactive session on a node with srun --pty bash
- you can submit a script to the scheduler with the sbatch command
- use squeue to list all jobs that are queuing/running, or sacct to also list past jobs

The cluster is a shared resource, so it's important to make sure that your jobs are submitted with a reasonable amount of requested resources. The default parameters are 2 GB of RAM, 1 CPU and a maximum running time of 2 hrs.
To know how much of the reserved resources your job actually used, you can use a combination of different Slurm commands. However, their usage and outputs are not always very clear for beginners. We suggest you use the jobinfo <jobid> command from slurm-tools (already installed on the I2BC cluster).
jobinfo 123456 will output:
Job ID : 123456
Job name : bash
User : john.doe
Account :
Working directory : /data/work/I2BC/john.doe/testrun
Cluster : cluster
Partition : common
Nodes : 1
Nodelist : node24
Tasks : 1
CPUs : 1
GPUs : 0
State : COMPLETED
Exit code : 0:0
Submit time : 2025-03-09T09:07:56
Start time : 2025-03-09T09:07:56
End time : 2025-03-09T09:08:38
Wait time : 00:00:00
Reserved walltime : 00:00:00
Used walltime : 00:00:42 # Actual run time of job
Used CPU walltime : 00:00:42 # Used walltime x number of CPUs
Used CPU time : 00:00:00 # Total time that CPUs were actually used for
CPU efficiency : 0.18% # Used CPU time / Used CPU walltime
% User (computation) : 50.65%
% System (I/O) : 49.35%
Reserved memory : 1000M/core
Max memory used : 9.29M (estimate) # Maximum memory used
Memory efficiency : 0.93% # Max memory used / Reserved memory
Max disk write : 256.00K
Max disk read         : 512.00K

How to read this output? Job ID, Job name, User, Partition, Nodes, Nodelist and CPUs are as before and quite transparent. The interesting lines are:
- CPU efficiency = how efficiently you used the CPUs you've reserved; this should be as close to 100% as possible
- Memory efficiency = how efficiently you used the memory you've reserved; this should be as close to 100% as possible

So in our case, with 0.18% and 0.93%, our resource efficiency is quite bad…
To adjust the resources, you can add a few extra options when running srun or sbatch. Useful options in this case are:

| option | function |
|---|---|
| --mem=xxM / --mem=xxG | reserve the specified amount of RAM in MB or GB |
| --cpus-per-task=x | reserve the specified number of processors (CPUs) |
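For example, to open an interactive session with more resources than the defaults (the values are illustrative):

# Reserve 4 GB of RAM and 4 CPUs for an interactive session
srun --mem=4G --cpus-per-task=4 --pty bash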
You'll find more options in the "cheat sheet" tab on the intranet.
Let's adjust the resources of our previous job. Since it only used very few resources, there is no sense in reserving 2 GB; let's reduce it to 100 MB. We used 0.18% of the reserved CPU, but 1 CPU is already the minimum, so we'll keep it that way.
There are 2 methods to specify these parameters to sbatch:

1. directly on the command line:

sbatch --mem=100M --cpus-per-task=1 slurm_script.sh

2. within the script itself, using #SBATCH directives:

#! /bin/bash
#SBATCH --mem=100M
#SBATCH --cpus-per-task=1
echo "hello world"

then resubmit with sbatch slurm_script.sh.
Note that the option syntax is the same in both cases; in the script, each option is additionally prefixed with #SBATCH and each directive goes on a separate line.
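If you use both methods at once, the option given on the command line takes precedence over the matching #SBATCH line in the script, which is handy for one-off tests:

# Temporarily override the --mem=100M line in the script without editing it
sbatch --mem=200M slurm_script.sh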
NB: It's important to note that increasing the number of processors (CPUs or threads) won't accelerate your job if the software you're using doesn't support parallelisation.
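Conversely, if your software does support multithreading, it usually has to be told how many threads to use. Slurm exposes your reservation in the SLURM_CPUS_PER_TASK environment variable; here is a sketch (some_tool and its --threads flag are placeholders for your actual software):

#! /bin/bash
#SBATCH --cpus-per-task=4
# Pass the reserved CPU count to the software instead of hard-coding it
some_tool --threads "$SLURM_CPUS_PER_TASK" input.txt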
Take home message
- jobinfo is a custom script that gives you information on what your job actually used
- resource options can be given to the sbatch command, either directly at execution or within the Slurm submission script (srun takes the same options as sbatch)

---
What if I changed my mind? How can I stop the job I just submitted?!

In addition to sbatch/srun (submit a job) and squeue/sacct (follow a job / check a job's resources), the third Slurm command you should know is scancel, to cancel a job. You can only cancel your own jobs; cancelling other people's jobs won't work.
Letâs see what jobs are running:
john.doe@slurmlogin:~$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
123456 common bash john.doe R 0:04 1 node24
123457    common     test luke.doe  R  0:09      1 node25

Let's now try to delete Luke's job:
john.doe@slurmlogin:~$ scancel 123457
scancel: Unauthorized Request 123457

See? Nothing happened, because I can only cancel my own jobs.
Let's first add a line to your previous slurm_script.sh to make sure it runs long enough for us to cancel it:
#! /bin/bash
#SBATCH --mem=100M
#SBATCH --cpus-per-task=1
echo "hello world"
sleep 2m # wait for 2 minutes

Then:

1. submit it with sbatch slurm_script.sh, and note its job id
2. check that it's running with squeue (you can get the job id here too)
3. cancel it with scancel 123456 (replace 123456 with your own job id)
4. check again with squeue

Your job should've changed status and then disappeared after a little while, as expected. Let's walk through it:
john.doe@slurmlogin:~$ sbatch slurm_script.sh
Submitted batch job 123456
john.doe@slurmlogin:~$

The job id is 123456 in this case. Check with squeue:

john.doe@slurmlogin:~$ squeue
JOBID PARTITION     NAME     USER ST  TIME  NODES NODELIST(REASON)
123456    common     bash john.doe  R  0:04      1 node24

Cancel with scancel 123456, then check again with squeue:

john.doe@slurmlogin:~$ scancel 123456
john.doe@slurmlogin:~$ squeue
JOBID PARTITION     NAME     USER ST  TIME  NODES NODELIST(REASON)
123456    common     bash john.doe CG  0:09      1 node24

CG = Completing.
After a while, it should be gone.
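A few scancel variants can save time; these are standard options:

scancel 123456 123457    # cancel several jobs at once
scancel -u $USER         # cancel all of your own jobs
scancel --name=bash      # cancel your jobs with a given job name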
Take home message
- use scancel to cancel a job using its unique job id

---
- the main Slurm commands are sbatch & srun (submit), squeue (follow), sacct (stats) and scancel (cancel)
- you'll find more details in their manual (man) pages
- useful options for sbatch & srun:

| option | function |
|---|---|
| --mem=<mem> | amount of memory to reserve per node |
| --cpus-per-task=<cpus> | number of CPUs to reserve per task (default number of tasks = 1) |
| --job-name="<jobname>" | job name (no spaces or special characters please) |
| --time=[DD-]HH:MM:SS | maximum running time (default: 2 hrs) |
| --output /path/to/output.log | name of the file to save standard output to |
| --error /path/to/error.log | if specified, standard error is written to this separate file |
| --partition=common | partition (= group of nodes) to submit your job to |

More options in the "cheat sheet" tab on the intranet and in the official Slurm documentation.