ba$h0

Introduction to Bash for beginners

ba$h0 training session

2024-04-24

Teachers:

  • Emilie Drouineau, I2BC/CEA
  • Fadwa El-Khaddar, BIOI2 - I2BC/Univ. Paris-Saclay
  • Chloe Quignot, BIOI2 - I2BC/CEA
  • Claire Toffano-Nioche, I2BC/CNRS



Source: EBAII training material from IFB

Material under CC-BY-SA licence

How do I chat with the computer?

Open a terminal and write things in it!

Can I speak with the computer in human language?

Use a language it understands, e.g. BASH (Bourne Again SHell) 1:

  • the BASH language is one of many similar Shell dialects (sh, ksh, csh, zsh, …)

  • BASH is based on a set of modular commands, which perform specific tasks

  • Commands are written just after the prompt. Here, the $ character symbolizes that the computer is waiting/ready for your commands.

__

(1) A pun on the name of the Bourne shell (sh), an earlier Shell language written by Stephen Bourne

Prototype of a command

  • a command performs a task (sorting, selecting, opening, aligning reads, etc.)
  • a terminal instruction generally begins with the name of a command
  • it has a certain number of arguments*, which may be optional, and which can modify its mode of operation
# below, the format [xxx] indicates that xxx is an optional element
command_name [argument_name [argument_value]] [file]
  • argument names are not standardized
  • arguments may or may not take values
  • arguments may have short and/or long forms
  • /!\ spaces between the command and its arguments are seen as separators



*there’s no standard term for ‘arguments’, you may also come across the terms ‘flags’ or ‘options’

Example using the cal command (part 1)

cal (short for calendar) is a handy command to view a certain date, month or year:

⮕ without arguments

c.toffano-nioche@SSFA-18:$ cal
     Avril 2024       
di lu ma me je ve sa  
    1  2  3  4  5  6  
 7  8  9 10 11 12 13  
14 15 16 17 18 19 20  
21 22 23 24 25 26 27  
28 29 30  

⮕ with a single argument with no value

cal -3 displays 3 months, centered on the current month:

c.toffano-nioche@SSFA-18:$ cal -3
     Mars 2024             Avril 2024             Mai 2024        
di lu ma me je ve sa  di lu ma me je ve sa  di lu ma me je ve sa  
                1  2      1  2  3  4  5  6            1  2  3  4  
 3  4  5  6  7  8  9   7  8  9 10 11 12 13   5  6  7  8  9 10 11  
10 11 12 13 14 15 16  14 15 16 17 18 19 20  12 13 14 15 16 17 18  
17 18 19 20 21 22 23  21 22 23 24 25 26 27  19 20 21 22 23 24 25  
24 25 26 27 28 29 30  28 29 30              26 27 28 29 30 31     
31      

Example using the cal command (part 2)

⮕ with a single argument which requires a value

c.toffano-nioche@SSFA-18:$ cal -m may
      Mai 2024        
di lu ma me je ve sa  
          1  2  3  4  
 5  6  7  8  9 10 11  
12 13 14 15 16 17 18  
19 20 21 22 23 24 25  
26 27 28 29 30 31     

⮕ short vs long forms of an argument

  • often, arguments have short & long forms (more explicit/readable but longer to type…)
  • long forms are generally preceded by two dashes

e.g. instead of cal -3, we can use cal --three

c.toffano-nioche@SSFA-18:$ cal --three
     Mars 2024             Avril 2024             Mai 2024        
di lu ma me je ve sa  di lu ma me je ve sa  di lu ma me je ve sa  
                1  2      1  2  3  4  5  6            1  2  3  4  
 3  4  5  6  7  8  9   7  8  9 10 11 12 13   5  6  7  8  9 10 11  
10 11 12 13 14 15 16  14 15 16 17 18 19 20  12 13 14 15 16 17 18  
17 18 19 20 21 22 23  21 22 23 24 25 26 27  19 20 21 22 23 24 25  
24 25 26 27 28 29 30  28 29 30              26 27 28 29 30 31     
31      

How to get help?

Call the police, call your colleagues, search the Internet… or use the man command (manual)

c.toffano-nioche@SSFA-18:~$ man cal
CAL(1)                                     User Commands                       CAL(1)

NAME
       cal - display a calendar

SYNOPSIS
       cal [options] [[[day] month] year]

DESCRIPTION
       cal  displays  a  simple calendar.  If no arguments are specified, the current
       month is displayed.

OPTIONS
       -1, --one
              Display single month output.  (This is the default.)

       -3, --three
              Display prev/current/next month output.

...

SYNOPSIS explains how to write the command line; optional elements are written between [..]
DESCRIPTION describes the result of the command
OPTIONS lists the available arguments, with their short and long forms

Shortcuts for the man interface:

  • navigate with your keyboard arrows (↑ & ↓)
  • /color: to search for the term color
  • n: (next) to search for the next occurrence of the term searched for
  • N (Shift+n): (previous) to search for the previous occurrence of the term searched for
  • q: to quit help

__

*Custom programmes/commands or scripts often have the -h or --help arguments to print help messages on how to use them.

Focus on the ls command and its arguments

The ls command lists directory contents and can take a number of arguments.

Among the main arguments:

  • -l (long/lots) gives a lot of information about files
  • -a (--all) shows all files, including hidden ones
  • -t (time) sorts by modification date
  • -h (--human-readable) displays file sizes in readable units
  • -r (--reverse) reverses sort order

Notes:

  • names of hidden files & folders begin with a dot (e.g. .git)
  • . and .. directories are special (detailed later)
  • pay attention to the spaces between the command and its arguments. The ls-l command does not exist!
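A small sketch to see hidden files in action. It assumes you may create a throwaway directory with mktemp -d and empty files with touch (two commands not covered in these slides); the file names are made up for the example.

```shell
# Work in a throwaway directory so nothing of yours is touched
workdir=$(mktemp -d)
cd "$workdir"

# Create one visible file and one hidden file (its name starts with a dot)
touch visible.txt .hidden.txt

# Without -a, only visible.txt is listed
ls

# With -a, .hidden.txt appears too, along with . and ..
ls -a
```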

Focus on the ls command and its arguments (page 2)

⮕ Arguments can be combined: ls -l --all

c.toffano-nioche@SSFA-18:~/formation_bash0$ ls -l --all
total 56
drwxr-xr-x 4 c.toffano-nioche tous 4096 avril 18 11:18 .
drwxr-xr-x 4 c.toffano-nioche tous 4096 avril 17 18:51 ..
-rw-r--r-- 1 c.toffano-nioche tous  587 avril 16 16:06 bash0_chatTerm.md
-rw-r--r-- 1 c.toffano-nioche tous 1131 avril 17 19:02 bash0_FindingHelp.md
-rw-r--r-- 1 c.toffano-nioche tous  833 avril 18 11:18 bash0_zoomLS.md
drwxr-xr-x 8 c.toffano-nioche tous 4096 avril 18 11:08 .git
drwxr-xr-x 2 c.toffano-nioche tous 4096 avril 16 15:14 images

⮕ Arguments can be merged (in short format): ls -lahtr

for a complete (-a) and detailed view (-l), sizes in KB,MB,GB,TB… i.e. human readable (-h), sorted by date/time (-t) from oldest to most recent (-r):

c.toffano-nioche@SSFA-18:~/formation_bash0$ ls -lahtr
total 56K
drwxr-xr-x 2 c.toffano-nioche tous 4.0K avril 16 15:14 images
-rw-r--r-- 1 c.toffano-nioche tous  587 avril 16 16:06 bash0_chatTerm.md
drwxr-xr-x 4 c.toffano-nioche tous 4.0K avril 17 18:51 ..
-rw-r--r-- 1 c.toffano-nioche tous 1.2K avril 17 19:02 bash0_FindingHelp.md
drwxr-xr-x 8 c.toffano-nioche tous 4.0K avril 18 11:08 .git
drwxr-xr-x 4 c.toffano-nioche tous 4.0K avril 18 11:18 .
-rw-r--r-- 1 c.toffano-nioche tous  833 avril 18 11:18 bash0_zoomLS.md

Filesystems are like trees

General information

  • The filesystem can be compared to a tree whose leaves are directories and files. We can go through it by following the branches.
  • The tree is anchored by the root: the / directory

General information (part 2)

When we go up in the tree (=down in the hierarchy) by following the branches, we can see that the / (root) contains multiple directories (e.g. shared)


General information (part 3)

The shared directory contains bank


General information (part 4)

The bank directory contains homo_sapiens


General information (part 5)

Thus, the path to the homo_sapiens directory from the root is: /shared/bank/homo_sapiens


1. Specify the path from the root directory (/): absolute path

# To change directory, separate all the directory names with /
cd /shared/bank/homo_sapiens
pwd

2. Specify the path from the current directory: relative path

  • The current directory is the directory where the user is working
  • The path is relative to the current directory (.)
  • To go to the parent directory: cd .. or to the grandparent directory: cd ../..
# To stay in the same directory
cd . # same as cd ./
# If the working directory is star-2.7.5a, you stay in star-2.7.5a
pwd
# To go up two directories
cd ../.. # same as cd ../../
# To check the path
pwd

Take home message

Relative and absolute paths give the same result. If the working directory is shared and we have to access homo_sapiens:

  • Absolute path (/!\ absolute paths always start with the root /)
cd /shared/bank/homo_sapiens
pwd
  • Relative path:
cd bank/homo_sapiens
pwd
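To try this without access to /shared, here is a minimal sketch that builds a practice tree in a temporary directory. The tree only mimics the one in the slides; mktemp -d and the -p option of mkdir (create parent directories as needed) are additions not covered in this session.

```shell
# Build a small practice tree (hypothetical, mimicking the slides)
root=$(mktemp -d)
mkdir -p "$root/shared/bank/homo_sapiens"

# Absolute path: the full path, starting from /
cd "$root/shared/bank/homo_sapiens"
abs_result=$(pwd)

# Relative path: start in shared, then descend from there
cd "$root/shared"
cd bank/homo_sapiens
rel_result=$(pwd)

# Both lines show the same directory
echo "$abs_result"
echo "$rel_result"
```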

Other useful command - tree:

emilie.drouineau@cluster-i2bc:~$ tree -d
.
├── GRCh38
│   ├── fasta
│   └── gff3
├── hg19
│   ├── bwa
│   ├── fasta
│   ├── gtf
│   └── star-2.7.5a
├── hg38
│   ├── fasta
│   └── star-2.7.5a
└── latest_genome -> GRCh38

Home sweet home

There’s no better place than home!

  • It’s the user’s directory & it stores all of the user’s documents
  • It’s symbolized by ~ (tilde)*
  • Most of the time it’s /home/userName (but it may vary according to the infrastructure you’re on e.g. on the IFB cluster, it’s: /shared/home/userName)
# absolute path
cd /home/userName
# shorter way to get the same result
cd ~
# or
cd



__

*~ for Mac users: option + N or Alt + N
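A quick way to check where your home is on the machine you are using. This sketch assumes the HOME variable is set (it normally is in a login session) and that /tmp exists.

```shell
# The shell replaces an unquoted ~ by the path of your home directory
echo ~

# The same path is stored in the HOME variable
echo "$HOME"

# cd without argument brings you back home from anywhere
cd /tmp
cd
pwd
```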

Autocompletion (<TAB><TAB>) - your new best friend!

If you want to shine in society or with your family by giving the impression of typing quickly, use auto-completion!

More seriously:

  • it’s essential for typing a path without making mistakes
  • it also saves time because you won’t need to type every single letter

E.g. try moving into the directory: /usr/local/bin using <TAB><TAB>

Where can I find the tab key?

Rest assured, you haven’t heard the last of <TAB>!

How to create, copy, remove files or directories

It’s important to organise files and directories so that data is easy to find. Four commands are useful:

  • mkdir (make directories): to create a directory
# read the documentation
man mkdir
# create a directory named my_new_dir
mkdir my_new_dir
# check if the directory was created
ls
  • cp (copy): to copy files and directories (/!\ to copy a directory, you need to add the --recursive option, or -r for short)
# read the documentation
man cp
# create a copy of a file
cp gameshell.sh copy_gameshell.sh
# create a copy of a directory
cp --recursive dir0/ dir1/

How to create, copy, remove files or directories (part 2)

  • mv (move): to move a file to another directory or rename it
# read the documentation
man mv
# rename a file
mv my_file_with_a_long_useless_name_i_want_to_change.txt my_file.txt
# put my_file.txt in the directory called my_dir
mv my_file.txt my_dir/
  • rm (remove): to remove files or directories. Warning: removal is permanent (there is no recycle bin) and it’s easy to remove more files/directories than planned. To be safe, first run ls with the same arguments to check what would be removed.
# read the documentation
man rm
# remove a file
rm my_dir/my_file.txt
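The four commands can be tried together in a safe practice area. The sketch below assumes a GNU/Linux system (for the long --recursive option); all file and directory names are invented for the exercise, and mktemp -d and touch are helper commands not covered in these slides.

```shell
# Practice area: a temporary directory that can be thrown away
sandbox=$(mktemp -d)
cd "$sandbox"

# Create a directory and an empty file to play with
mkdir my_dir
touch my_file.txt

# Copy the file, then move the copy into my_dir
cp my_file.txt copy_my_file.txt
mv copy_my_file.txt my_dir/

# Copy the whole directory (an option is needed for directories)
cp --recursive my_dir my_dir_backup

# Check what would be removed before removing it
ls my_dir_backup

# Remove a file, then a directory and everything inside it
rm my_file.txt
rm --recursive my_dir_backup

# Only my_dir is left
ls
```

Note that rm also needs the --recursive option (or -r) to remove a directory and its contents.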

Know your rights!

Sometimes an error message may appear saying that you are “not authorised” to perform an action…

In Linux, your rights are dictated by three letters r, w and x:

  • r: read, right to read the file and open it
  • w: write, right to write and modify a file
  • x: execute, right to execute the file (a script, for example)

How do I know the rights of a file?

Remember ls -l? => ls to list the files of a folder, -l argument to get more information on the files

c.toffano-nioche@SSFA-18:~/formation_bash0$ ls -l
total 56
-rw-r--r-- 1 c.toffano-nioche tous  587 avril 16 16:06 bash0_chatTerm.md
-rw-r--r-- 1 c.toffano-nioche tous 1131 avril 17 19:02 bash0_FindingHelp.md
-rw-r--r-- 1 c.toffano-nioche tous  833 avril 18 11:18 bash0_zoomLS.md
drwxr-xr-x 2 c.toffano-nioche tous 4096 avril 16 15:14 images

When you type the ls -l command, you may notice that some lines start with a d (=directory) and others with a - (=file).

Note also that the rwx letters come as three triplets, where a - in place of a letter means that the right is not granted. The first triplet corresponds to the rights held by the owner of the file, the second to the rights of users belonging to the file’s group, and the last to the rights of all other users.
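To practise reading the rights, the sketch below isolates the first 10 characters of an ls -l line and splits them into the three triplets. It relies on cut -c and on the pipe (|), both presented further on in this session, and on the -d option of ls, which is not covered here; the file names are invented.

```shell
# Practice area
workdir=$(mktemp -d)
cd "$workdir"
mkdir images
touch notes.md

# -d lists the directory itself instead of its contents
# cut -c 1-10 keeps the first 10 characters: type + three triplets
dir_rights=$(ls -ld images | cut -c 1-10)
file_rights=$(ls -l notes.md | cut -c 1-10)

echo "$dir_rights"   # starts with d (directory)
echo "$file_rights"  # starts with - (regular file)

# Split the rights of the file into owner / group / others
echo "owner:  $(echo "$file_rights" | cut -c 2-4)"
echo "group:  $(echo "$file_rights" | cut -c 5-7)"
echo "others: $(echo "$file_rights" | cut -c 8-10)"
```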

Accessing file contents

We’re often interested in the content of files: reading files, counting the number of lines, extracting a part (lines, columns), sorting lines, etc. Warning: some commands cannot access compressed files.

Counting the number of lines

  • wc (word count): count the number of lines, words and bytes of file(s)
# -l : count the number of lines in my_file.txt
wc -l my_file.txt
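If you have no file at hand, here is a minimal sketch that creates one first. The content is made up, and printf and the > redirection are used as helpers (redirections are explained later in this session).

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Create a three-line practice file (each \n ends a line)
printf 'first line\nsecond line\nthird line\n' > my_file.txt

# Number of lines (3 here)
wc -l my_file.txt

# Number of words (6 here)
wc -w my_file.txt
```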

Read a file

  • With less or more you can read a file line by line (pager tool)
  • With head or tail you can visualize the first n lines or the last n lines of a file
  • With cat you print the full contents of the file
  • To navigate in the file when using less, use the same keys as in the man interface:
keys      results
↑ or ↓    go up or down in the file
< or >    go to the first or last line
/chr      then press Enter to search for the term chr
n or N    go to the next or previous occurrence of chr
q         quit
# read a file 
# -N : display the line number
# -S : don't wrap lines even if they are longer than the screen (arrows to navigate)
less -S -N my_file.txt
# print the first five lines
head -n 5 my_file.txt
# print the last eleven lines
tail -n 11 my_file.txt
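A small sketch to compare head, tail and cat on a file whose content is easy to check. It assumes the seq command is available (it prints a sequence of numbers and is not covered in these slides); the file name is invented.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# seq 1 20 prints the numbers 1 to 20, one per line
seq 1 20 > numbers.txt

# First three lines: 1, 2, 3
head -n 3 numbers.txt

# Last two lines: 19, 20
tail -n 2 numbers.txt

# Whole file at once
cat numbers.txt
```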

Select/remove a column in a file

If you have a tabular file format (e.g. .csv, .tsv, …) as below:

# file with chromosome, start, stop, name, score, strand
chr1    7517    7517    NM_023732__Abcb6    .   -
chr10   1826    1826    NM_019487__Hebp2    .   -
chrX    1494    1494    NONE    .   -
chrY    3470    3470    NA  .   +
chr11   3054    3054    NA  .   -
chr2    1929    1929    SITE    .   +
  • cut: selects columns from each line of files (the other columns are dropped)

E.g. if you want to remove the name and the score columns of the file above:

# keep columns 1 to 3 and 6
cut -f 1-3,6 myf.tsv
chr1    7517    7517    -
chr10   1826    1826    -
chrX    1494    1494    -
chrY    3470    3470    +
chr11   3054    3054    -
chr2    1929    1929    +

If the separator is not a tab (\t), you can change it with the -d option.
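For instance, with a comma-separated file, a minimal sketch could look like this (the file name and its content are invented for the example):

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A small comma-separated file: chromosome,start,stop,strand
printf 'chr1,7517,7517,-\nchr2,1929,1929,+\n' > my_file.csv

# -d ',' sets the comma as separator, -f 1,4 keeps columns 1 and 4
cut -d ',' -f 1,4 my_file.csv
```

The selected columns are printed with the same separator, so the first output line is chr1,-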

Sort a file

  • sort : sort lines of text files
# for help on sort usage
man sort

E.g. if you want to sort the previous file by chromosome (first column in myf.tsv):

# numeric               |  # alphanumeric         |   # version
# sort -k1,1n myf.tsv   |  # sort -k1,1d myf.tsv  |   # sort -k1,1V myf.tsv
chr10    1826  1826 -   |  chr1   7517  7517  -   |   chr1   7517  7517  -
chr11    3054  3054 -   |  chr10  1826  1826  -   |   chr2   1929  1929  +
chr1     7517  7517 -   |  chr11  3054  3054  -   |   chr10  1826  1826  -
chr2     1929  1929 +   |  chr2   1929  1929  +   |   chr11  3054  3054  -
chrX     1494  1494 -   |  chrX   1494  1494  -   |   chrX   1494  1494  -
chrY     3470  3470 +   |  chrY   3470  3470  +   |   chrY   3470  3470  +
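A reduced version to try at home, on an invented two-column file. In the numeric variant above, the key chr… is not a number, so all keys count as 0 and the order does not reflect the chromosome numbers; the version sort (-V) is usually what you want for chromosome names. The sketch assumes GNU sort, since -V may be missing on other systems.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Tab-separated practice file: chromosome, start
printf 'chr10\t1826\nchr2\t1929\nchr1\t7517\nchrX\t1494\n' > myf.tsv

# Version sort on column 1: chr1, chr2, chr10, chrX
# (-V is a GNU sort option and may be missing elsewhere)
sort -k1,1V myf.tsv

# Sort on column 2 as numbers instead: 1494, 1826, 1929, 7517
sort -k2,2n myf.tsv
```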

Input, Output, and Error Streams

Some commands work on data either typed in by the user or read from a file. This incoming data is referred to as the standard input stream, or stdin.

Similarly, some commands produce data that is either displayed on screen (e.g. the ls command) or written to a file. This outgoing data is referred to as the standard output stream, or stdout.

There is a 3rd standard stream: the standard error stream, or stderr, which carries error messages. Under Unix, a command that ends without an error returns the exit code 0; otherwise, it returns a non-zero number indicating the error code, and usually writes an explanatory sentence to stderr. By default, the error stream is also displayed on the screen (like the output stream).

Standard streams (arrows)

Stream redirections

When the result of a command is of interest to a subsequent question/command, it can be sent directly to the next command, or stored in a file, rather than displayed on the screen.
This is known as stream redirection.

E.g. below, the output stream of command 1 becomes the input stream of command 2, thanks to the pipe redirection operator indicated in red (it will be the | character in the command line*):

redirection with pipe
  • Some other redirection operators:
    • > myfile.txt: stores the stdout stream by creating (or overwriting) the myfile.txt file
    • >> myfile.txt: stores the stdout stream by adding lines to the myfile.txt file

*|: Shift+\ on Mac keyboards, AltGr+6 on Windows
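The sketch below tries both operators and then looks at the error stream and the exit code. Two things in it go beyond the slides and should be read as additions: 2> stores stderr in a file, and $? holds the exit code of the last command. All file names are invented.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch a.txt b.txt

# > creates (or overwrites) the file with the stdout of ls
ls > listing.txt

# >> appends to the end of the same file
echo "end of listing" >> listing.txt

# 2> stores stderr instead of stdout (not covered in the slides).
# The directory does not exist, so ls fails; || keeps its exit code.
status=0
ls no_such_dir > out.txt 2> err.txt || status=$?

echo "exit code: $status"   # non-zero, because ls failed
cat err.txt                 # the error message written by ls
```

Here out.txt stays empty, because ls had nothing to write on stdout.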

Redirection examples

Example 1 - Count the number of files

For example, to count the number of files (assuming the filenames all have an extension) you can use the ls command followed by the wc command:

ls *.* | wc -l

There’s no limit (apart from human understanding) to the number of pipe redirections you can chain.

Example 2 - Count number of files per user

Here is how to count the number of files owned by each user in a shared project: from the list of items (with ls), you can extract (with cut) the user column (that starts at around the 14th character), sort them (with sort) and then count them with the uniq command and its -c option (uniq merges identical adjacent lines, hence the sort beforehand):

ls -lah *.* | cut -c 14-20 | sort | uniq -c
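Since the character positions of the user column vary from one system to another, a possible variant selects the column by field rather than by character. This is only a sketch: it assumes the owner is the 3rd field of ls -l and that user names contain no spaces, and it uses tr -s (squeeze repeated characters), a command not covered in these slides.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch a.txt b.txt c.txt

# tr -s ' ' squeezes runs of spaces into one, so that
# cut can use the space as separator; the owner is field 3 of ls -l
ls -l *.txt | tr -s ' ' | cut -d ' ' -f 3 | sort | uniq -c
```

In this practice directory all three files belong to you, so a single line with the count 3 is printed.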

Example 3 - File inventory

It is possible to create a new file (named my_txt_files.txt) containing a list of all files with a special filter, for example, the extension .txt :

ls *.txt > my_txt_files.txt

Best practices

File naming!

  1. Avoid spaces (it’s possible but it’s encouraged to use “_” instead)
  2. Keep concise (<30 characters if possible)
  3. Use ISO 8601 formatted dates (YYYYMMDD or YYYY-MM-DD)
  4. Avoid special characters, such as: é è ç ~ ! @ # $ % ^ & * ( ) ; : < > ? . , { } ’ ” |

Example dealing with file names with spaces

emiliedrouineau@is152868-2:~$ ls
'file name.txt'
emiliedrouineau@is152868-2:~$ cat file name.txt
cat: file: No such file or directory
cat: name.txt: No such file or directory

The terminal raised a cat error because it didn’t understand the fact that “file name.txt” was a unique argument. Spaces are commonly argument separators.
Thus, it reads your input as 2 separate files: file and name.txt, which don’t exist.

It’s possible to use spaces by escaping the character with a backslash (file\ name.txt) or using quotes ("file name.txt") but it’s messy and can be a source of future errors.
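Both ways of protecting the space can be tried in a throwaway directory (a minimal sketch; the file content is invented):

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Create a file whose name contains a space (quotes keep it as one argument)
echo "hello" > "file name.txt"

# Both forms below pass a single argument to cat
cat "file name.txt"
cat file\ name.txt
```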

Warning with Windows/MacOS files

Watch out for hidden characters when you edit files with Windows or MacOS. Linux can be picky…

For example, although my script looks ok, I get an error when I run it:

# run the script on linux
emiliedrouineau@is152868-2:~$ ./my_script_windows.sh
bash: ./my_script_windows.sh: /bin/bash^M: bad interpreter: No such file or directory

What is /bin/bash^M? I don’t see it in my file??!

The -A option of cat will help you see all invisible characters:

emiliedrouineau@is152868-2:~$ cat -A ./my_script_windows.sh
#!/bin/bash^M$
# removed score column from my file^M$
cut -f 1-3,6 exoBed.bed > filter_exoBed.bed^M$
# sorted file by chromosome name and start coordinates^M$
sort -k1,1V -k2,2n filter_exoBed.bed > sort_filter_exoBed.bed^M$

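Here, each line ends with ^M, a carriage return character: the file uses Windows line endings (carriage return + line feed), whereas Linux expects a line feed only.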
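One possible fix, shown here as a sketch rather than the only way to do it, is to delete the carriage returns with tr (a command not covered in these slides). The example first fabricates a tiny script with Windows line endings; the file names are invented and cat -A assumes GNU cat.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Simulate a script saved with Windows line endings (\r\n)
printf '#!/bin/bash\r\necho "hello"\r\n' > my_script_windows.sh

# tr -d '\r' deletes every carriage return; the result goes to a new file
tr -d '\r' < my_script_windows.sh > my_script_linux.sh

# cat -A now shows plain $ line ends, without ^M
cat -A my_script_linux.sh

# The cleaned copy runs normally
bash my_script_linux.sh
```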

/!\ No Excel or Word formats!!!!!!! Stick to simple text formats
/!\ Avoid copy-pasting code from the web (hidden characters = danger)

Conclusion

Now you know:

  • how to navigate the file system in the Unix world
  • several commands to access file contents
  • that a succession of bash commands can be used to compose more complex tasks

It’s a good start.
But there’s more to the Unix world than that!

  • we’ve only mentioned the most common bash commands, but there are many more (a useful cheat sheet of basic Unix commands)
  • you can also design your own commands: this is called programming. Bash is itself a programming language, but there are many others (Python, R, C, etc.)
  • you can also install programs written by others. These are often referred to as packages

The door is open: welcome!