Exercise 0 - Reinforce what was seen in the introduction

Authors: BIOI2 & I2BC members

Last updated: 2025-03-13

Objectives

In this exercise, we will take our first steps with the cluster. We will:

⁕ ⁕ ⁕ ⁕ ⁕ ⁕

Setup

Task 1: Connect to the cluster

Connecting to the cluster through the command line is a two-step process: first connect to a “bridge server” called the passerelle, then connect from the passerelle to the actual cluster, both steps using the ssh (“Secure SHell”) command.

The addresses are:
  passerelle: passerelle.i2bc.paris-saclay.fr
  cluster (master node): slurmlogin.calcul.i2bc.paris-saclay.fr

Click to see hints
  1. you need a terminal

NB: you can also use third party software such as MobaXterm, X2Go or TurboVNC…

  2. the ssh command is used as follows:
ssh login@passerelle.i2bc.paris-saclay.fr

Login & password are your MultiPass credentials (usually firstname.lastname). NB: Don’t be surprised if the characters of your password don’t appear on the screen while you’re typing them, that’s normal. Just press Enter to validate.

Click to see answers
  1. Connect to the passerelle

The passerelle is a “bridge server” that you can use in order to connect to most other servers of the I2BC, including the cluster. Use the ssh command to connect to it:

ssh john.doe@passerelle.i2bc.paris-saclay.fr

Login & password: your MultiPass credentials (usually firstname.lastname). Don’t be surprised if the characters of your password don’t appear on the screen while you’re typing them, that’s normal.

  2. Connect to the cluster

Use the ssh command to connect to the cluster from the passerelle (you can also bypass the passerelle if you’re directly connected to the I2BC network):

ssh slurmlogin.calcul.i2bc.paris-saclay.fr

Password: same as before.

You’re now on the Master node of the cluster. It’s called “slurmlogin” at the I2BC and it uses Debian with a bash shell.
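Tip: if your ssh client supports the -J (ProxyJump) option, you can also chain the two hops in a single command; a sketch, to adapt with your own login:

ssh -J john.doe@passerelle.i2bc.paris-saclay.fr john.doe@slurmlogin.calcul.i2bc.paris-saclay.fr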

Tip: Where am I?

Each line in the terminal starts with a prompt, for example:

john.doe@slurmlogin:/home/john.doe$
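The prompt shows your login, the machine you are currently connected to (here, slurmlogin) and your current directory. If you ever lose track, two standard shell commands give the same information:

hostname    # prints the name of the machine you are connected to (slurmlogin or a node)
pwd         # prints your current working directory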

⁕ ⁕ ⁕ ⁕ ⁕ ⁕

Computational resources

Question 2: What are the 4 main components of the I2BC computer cluster?

Click to see answer.

If we simplify, the I2BC cluster is made of the following:
  1. a master node (slurmlogin), used to log in and submit jobs
  2. slave nodes (the workers), where the actual computations run
  3. a scheduler (Slurm), which distributes jobs over the slave nodes
  4. storage spaces (/store, /data/work/I2BC, /home, /scratchlocal), accessible from the nodes

Question 3: Why should I avoid running programmes on the master node?

Click to see answer.

/!\ the master node is not made for heavy computation. You should always run your programmes on one of the slave nodes (=the workers)!

That’s because it’s used as a login node and has limited resources. Thus, running programmes or scripts directly on it will slow down the cluster for everyone.

Question 4: Are all slave nodes the same?

Click to see answer.

Yes and no :-)

Have a look at the partitions tab on the intranet for more information on the different partitions that exist on the I2BC cluster.

Task 5: list all the resources on the I2BC cluster with sinfo

Running sinfo lists the resources of the cluster; adding the -s option (sinfo -s) gives a more summarised version, with one line per partition.

Click to see the output of the above command.
PARTITION  AVAIL  TIMELIMIT   NODES(A/I/O/T) NODELIST
common*       up   infinite        2/17/1/20 node[06-07,13-16,21-24,27-28,30-37]
alphafold     up   infinite          0/1/3/4 node[41,49-51]
smallgpu      up   infinite          0/1/0/1 node38
run2          up   infinite        2/17/1/20 node[06-07,13-16,21-24,27-28,30-37]
run4          up   infinite        2/17/1/20 node[06-07,13-16,21-24,27-28,30-37]
run8          up   infinite        2/17/1/20 node[06-07,13-16,21-24,27-28,30-37]
run16         up   infinite        2/17/1/20 node[06-07,13-16,21-24,27-28,30-37]
run32         up   infinite        2/17/1/20 node[06-07,13-16,21-24,27-28,30-37]
lowprio       up   infinite        2/19/1/22 node[06-07,13-16,21-24,27-28,30-38,40]
lowpriogpu    up   infinite          0/2/0/2 node[38,49]

sinfo -s gives you a list of partitions and the nodes that are part of each partition. As you can see, some nodes belong to several partitions.

The NODES column indicates the number of nodes that are allocated/idle/other/total (A/I/O/T) within each partition. Allocated + idle is the number of nodes that are currently functional (idle meaning the node is completely free, i.e. no jobs are running on it at the moment).
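To dig a bit further, sinfo has a few other handy options; a couple of examples (standard Slurm options, shown here without their output):

sinfo -p common    # restrict the listing to a single partition
sinfo -N -l        # node-oriented long format: one line per node with CPU, memory and state details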

⁕ ⁕ ⁕ ⁕ ⁕ ⁕

Storage & computing spaces

When you start using the cluster, it’s important to distinguish between storage and computing spaces and in what context you should use one or the other.

On the cluster, you have access to your usual “partages” spaces (=/store) from any of the nodes. And you also have access to more cluster-specific spaces such as /data/work/I2BC, your /home and the /scratchlocal specific to each node.

Different spaces should be used for different things:

/store/EQUIPES
  - raw data & final processed data (for safe keeping)
  - all data & protocols that should be accessible by the team and that cannot easily be re-generated

/store/USERS
  - personal but professional data (e.g. course material for PhD students or teachers, work contracts, etc.)

/home
  - login & config files (e.g. .bashrc, .condarc, etc.)
  - installed programmes (i.e. in .local, [micro]mamba/[mini]conda, etc.)

/data/work/I2BC
  - temporary data that needs to be accessible from all nodes (e.g. a copy of a small database fetched from the internet, non-dividable working files, etc.)
  - data that is copied from somewhere else (e.g. for sharing with other teams)

/scratchlocal
  - temporary data that doesn’t have to be accessible from all nodes (e.g. intermediate files generated by a tool)
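To get a feel for these spaces, you can simply look at them from the node you’re on (standard commands only; the exact sub-directories under /store/EQUIPES depend on your team):

ls /store/EQUIPES       # team spaces on "partages"
ls /data/work/I2BC      # shared, cluster-wide work space
df -h /scratchlocal     # size and free space of the node-local scratch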

Question 6: I generated data for my team’s project, where should I store it?

Click to see answer.

Anything linked to your team’s project is best stored within the team space on “partages” i.e. in /store/EQUIPES/your_team/.

Why? So that everyone in your team has access to it, even after you leave.

Of note, every team member also has a dedicated space within /store/EQUIPES/your_team/MEMBERS/ which is readable by everyone but writable only by that member and the team’s PI.
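For example, to copy a result file into your personal space within the team space (results.tsv and your_team are placeholders, to adapt to your own case):

cp results.tsv /store/EQUIPES/your_team/MEMBERS/john.doe/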

Question 7: My script generates plenty of temporary intermediate files, where should they go?

Click to see answer.

You have 2 options:
  1. /data/work/I2BC, if the temporary files need to be accessible from all nodes
  2. /scratchlocal, if they only need to be visible on the node running the job

Neither space is cleaned automatically, so it’s up to you to do it!

If you have a lot of read & write operations to do, it is usually better to use the /scratchlocal space, which is local to each node (this avoids saturating the network).
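A typical pattern inside a job, sketched with placeholder names (my_tool, input.fastq, results.txt and the team path are purely illustrative):

# on a slave node, e.g. inside an srun or sbatch job
mkdir -p /scratchlocal/$USER/my_analysis
my_tool --outdir /scratchlocal/$USER/my_analysis input.fastq
# keep only what you need, then clean up after yourself
cp /scratchlocal/$USER/my_analysis/results.txt /store/EQUIPES/your_team/
rm -rf /scratchlocal/$USER/my_analysis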

Question 8: Why shouldn’t I store temporary files within the “partages” spaces (/store)?

Click to see answer.

“Partages” spaces (/store) are backed up: the data you put on them is kept for up to 3 months, even after you delete it. Saving temporary files there therefore takes up a lot of backup space for nothing.

NB: backups are not specific to the cluster, you can access them:

/!\ just remember, for temporary spaces, delete your data when you’ve finished because it’s not done automatically and space is limited!

See the intranet section on storage and the “Espaces” tab within the “Cluster” section for more information on “partages” spaces.

⁕ ⁕ ⁕ ⁕ ⁕ ⁕

Software

As on many clusters, you can use the module command to list & load the software you need; the key sub-commands are shown in the sketch below.

You have to load a piece of software before you can use it.
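A minimal tour of the module command (these are standard Environment Modules sub-commands; the fastqc module name is simply the example used later in this exercise):

module avail                          # list all software available as modules
module avail -C fastqc                # only list modules whose name contains "fastqc"
module load fastqc/fastqc_v0.11.5     # load a module so its programmes become usable
module list                           # show which modules are currently loaded
module unload fastqc/fastqc_v0.11.5   # unload a single module
module purge                          # unload all loaded modules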

Question 9: Is the software FastQC installed on the cluster?

Click to see answer.

Yes. You can either list all modules with module avail, or run a more specific search using the -C option:

john.doe@slurmlogin:/home/john.doe$ module avail -C fast -i
------------------------------------------- /usr/share/modules/modulefiles --------------------------------------------
bcl2fast2/bcl2fastq2-2.15.0.4  fastp/fastp-0.23.2  fastqc/fastqc_v0.10.1  nodes/fastdnaml-1.2.2
bcl2fast2/bcl2fastq2-2.18.12   fastp/fastp-0.23.4  fastqc/fastqc_v0.11.5  singularity/fastqc

-C filters the output to only those entries containing the given search term; -i makes the search case-insensitive.

Task 10: Load FastQC and run fastqc --version to see if it worked

/!\ Warning: Most software is not accessible on the master node, as it’s not made for running scripts and programmes, so make sure you connect to a slave node first!

Click to see answer.

module load fastqc/fastqc_v0.11.5 won’t raise an error if you run it from the master node. However, if you then try running fastqc, it won’t work. We therefore first have to connect to a slave node.

  1. Create an interactive job:
john.doe@slurmlogin:/home/john.doe$ srun --pty bash
john.doe@node06:/home/john.doe$
  2. Load the module:
john.doe@node06:/home/john.doe$ module load fastqc/fastqc_v0.11.5
john.doe@node06:/home/john.doe$ fastqc --version
FastQC v0.11.5
  3. Exit the interactive job:
john.doe@node06:/home/john.doe$ exit 0
john.doe@slurmlogin:/home/john.doe$

If your software isn’t on the cluster, contact the SICS: they’ll install it for you quickly.

⁕ ⁕ ⁕ ⁕ ⁕ ⁕

Interactive session

Task 11: Connect, then disconnect from a node of the cluster.

If you want to use some resources, you have to ask the scheduler using a specific syntax: the Slurm language (this may differ on other clusters, but Slurm is quite widely used).

Hints:
  1. use srun --pty bash to open an interactive session on a node
  2. type exit (or exit 0) to leave the session and return to the master node

Output of the above commands
john.doe@slurmlogin:~$ srun --pty bash
john.doe@node06:~$ exit 0
exit
john.doe@slurmlogin:~$

As you can see, the terminal prompt changed: John switched from slurmlogin (=master node) to node06 (=slave node) and then back to slurmlogin.

In summary: srun --pty bash asks the scheduler for an interactive session on a slave node, and exit (or exit 0) closes that session and brings you back to the master node.
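If the default resources aren’t enough, srun accepts the usual Slurm resource options; a sketch (the values and partition name are just examples):

srun --partition=common --cpus-per-task=4 --mem=8G --pty bash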

⁕ ⁕ ⁕ ⁕ ⁕ ⁕

Take home message