1 Introduction

This webpage contains several tutorials introducing different facets of the Linux operating system, with a focus on the Ubuntu distribution. The information presented here is especially relevant for Chapter 4 and the Lab assignment.

2 Learning outcomes

Learn the procedure to remotely access Linux computers using the ssh protocol.
Learn the procedure to safely run computer jobs using the tmux terminal multiplexer.
Learn Linux commands to conduct genomic analyses.

3 Your computer accounts

Details on your Linux computer accounts and their IP addresses are summarized in this Google document. Your assigned user IDs are provided in this spreadsheet.

4 Remotely access Linux computers

To remotely access your assigned Linux computer account using the ssh protocol, you need to know:

Your user ID and password (see Google document).
The IP address of the computer you want to connect to. If you want to remotely access the lab computers from a Windows machine, please read section 4.4 (Install Putty on Windows) before going any further into this document.

Warning: If you are accessing the Linux computers from outside the BSU campus, you will first need to connect through the VPN.

4.1 What is an IP address?

An Internet Protocol (IP) address is a numerical label, such as 192.0.2.1, assigned to each device connected to a computer network that uses the Internet Protocol for communication. An IP address serves two main functions:

Network interface identification
Location addressing
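To make the format concrete: an IPv4 address such as 192.0.2.1 is made of four numbers between 0 and 255 separated by dots. The bash function below is a minimal sketch (the name `is_ipv4` is hypothetical) that checks whether a string has that shape, for example before pasting it into an ssh command. It only checks the format; it does not tell you whether the computer is reachable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if $1 looks like an IPv4
# address, i.e. four numbers from 0 to 255 separated by dots.
is_ipv4() {
  local ip=$1 octet
  local -a octets
  # Shape check: four groups of 1 to 3 digits separated by dots
  [[ $ip =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}$ ]] || return 1
  # Split on the dots and check the range of each number
  IFS=. read -r -a octets <<< "$ip"
  for octet in "${octets[@]}"; do
    # 10# forces base 10, so a value such as 08 is not read as octal
    (( 10#$octet <= 255 )) || return 1
  done
  return 0
}

is_ipv4 192.0.2.1 && echo "192.0.2.1 is a valid IPv4 address"
is_ipv4 300.0.2.1 || echo "300.0.2.1 is not (300 is larger than 255)"
```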

4.2 How to find the IP address?

The following procedure retrieves the computer's IP address on the Ubuntu operating system (16.04 LTS; Figure 4.1):

Open the System Settings app located on the left side bar of the desktop.
Select the Network tab located under the Hardware section.
Select the second Wired tab (related to WiFi connection) located on the left side of the window.
The computer IP address is shown in the IPv4 Address field located on the left side of the window.

Figure 4.1: Screenshot of Ubuntu desktop showing how to retrieve IP address.

4.2.1 Command lines to get IP address

On a Linux OS, you can get the IP address of the computer by typing the following commands in a Terminal:

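# Get the IP address
# Option 1: print only the first address reported by hostname -I (Linux only)
hostname -I | awk '{print $1}'
# Option 2: show all network interfaces and their addresses (look for "inet")
ip a
# Option 3 (also works on Mac; on recent Ubuntu releases it may require the net-tools package)
ifconfig -a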
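The output of `ip a` is long. A sketch, assuming the iproute2 `ip` command available on Ubuntu, that keeps only the interface name and its IPv4 address: `ip -4 -o addr show` prints one line per IPv4 address, where the 2nd field is the interface and the 4th is the address followed by its prefix length (e.g., 192.168.1.10/24). The function name `list_ipv4` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the one-line-per-address output of
# `ip -4 -o addr show` (read on stdin) into "interface address" pairs.
# The loopback interface (lo, 127.0.0.1) is skipped because other
# computers cannot use it to reach this machine.
list_ipv4() {
  awk '$2 != "lo" { split($4, parts, "/"); print $2, parts[1] }'
}

# Usage on a Linux machine:
#   ip -4 -o addr show | list_ipv4
# which prints lines such as:  eth0 192.168.1.10
```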

4.3 The `ssh` protocol

ssh (Secure Shell) is a protocol for secure remote login from one computer to another (Figure 4.2). It supports several strong authentication methods (e.g., passwords and cryptographic keys), and it protects the confidentiality and integrity of communications with strong encryption.

Figure 4.2: Overview of ssh protocol. Credit: https://www.wallarm.com/what/what-is-ssh-protocol

Once you have gathered all required information (i.e., IP address and user ID), open a Terminal (Linux or Mac) and type the following command to remotely access your account using the ssh protocol (see Figure 4.3). On Windows, connect with Putty instead (see section 4.4).

# General command
$ ssh USER_ID@IP

# Remotely access computer from Group Z 
# You will also have to enter your password when prompted
$ ssh bioinformatics@132.178.143.53
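If you connect often, OpenSSH lets you store the user ID and IP address under a short alias in `~/.ssh/config`, so that `ssh bsu-lab` can replace the full `ssh bioinformatics@132.178.143.53`. A minimal sketch for Linux or Mac that reuses the example account above; the alias `bsu-lab` is hypothetical, so substitute your own user ID and IP address. Run it only once, because `>>` appends a new copy of the entry each time.

```shell
#!/usr/bin/env bash
# Create the ssh configuration directory if needed (private to you: mode 700)
mkdir -p ~/.ssh
chmod 700 ~/.ssh

# Append a host alias. "bsu-lab" is a hypothetical name;
# Host, HostName, User and Port are standard ssh_config keywords.
cat >> ~/.ssh/config <<'EOF'
Host bsu-lab
    HostName 132.178.143.53
    User bioinformatics
    Port 22
EOF

# From now on, this is equivalent to: ssh bioinformatics@132.178.143.53
#   ssh bsu-lab
```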

Figure 4.3: Screenshot of Terminal showing commands to remotely access computer of group 3.

4.4 Install Putty on Windows

Putty is one of the most widely used programs for remotely accessing computers from Windows. Download the program at this URL http://www.putty.org and follow the instructions to install it on your computer.

When you launch Putty, a configuration window will pop up in which you i) enter your credentials (e.g., user@IP) in the Host Name (or IP address) box, ii) set the Port to 22, and iii) make sure the connection type is set to SSH (Figure 4.4). The full documentation for this software is available here.

Figure 4.4: Screenshot of Putty Configuration window to establish ssh connection.

4.5 Exercises

Retrieve your user ID and IP address
Remotely connect to your account using the ssh protocol
Exit your remote terminal session using exit

5 Safely run computer jobs

Some of the analyses that we will be conducting in this class will take several days to complete. In this case, it is paramount to run those analyses in a safe environment. By safe, we mean making sure that analyses will not be inadvertently interrupted, either by yourself or by other users (and trust me, it happens!). For instance, a job started directly in an ssh session is normally terminated when that connection is closed or drops. In this section, we review how to use tmux to run and access multiple parallel terminal sessions.

5.1 The `tmux` protocol

tmux is an open-source terminal multiplexer for Unix-like operating systems. It allows multiple terminal sessions to be accessed simultaneously in a single window, which is useful for running more than one command-line program at the same time. A tmux session keeps running after you close your ssh connection, and anyone logged into the same account who knows the session's name (or ID) can reattach to it. For those reasons, tmux is very useful when multiple users are sharing the same account and/or want to work remotely.

tmux is available on most Linux distributions (on Ubuntu, it can be installed with sudo apt install tmux if it is missing), and it can also be installed on Mac OS with Homebrew using the following command:

$ brew install tmux

5.2 How to operate `tmux` sessions

In this section, we will learn how to create, detach from, rename, list, access, and terminate tmux sessions.

5.2.1 Create a new session

To create a new tmux session, type the following in the Terminal:

$ tmux

5.2.2 Exit a new session

Once you have started your job, you can safely detach from (i.e., leave) the tmux session without killing your job by pressing Ctrl+b, releasing both keys, and then pressing d:

Ctrl+b, then d

At this point, you can safely log off from your remote session (by typing exit in the Terminal), knowing that your analysis will keep running inside the tmux session.
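If you already know which command to run, tmux can also create a session, start the command inside it, and leave it detached in one step, so no keystrokes are needed. A minimal sketch assuming tmux is installed: `start_job` is a hypothetical helper name, and `sleep 600` stands in for a real analysis. Be aware that a session started this way closes as soon as its command finishes, so for real analyses it is safer to send the output to a file (e.g., `'my_analysis > log.txt 2>&1'`, where `my_analysis` is a placeholder).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run COMMAND in a new tmux session called NAME,
# without attaching to it, so the job starts in the background.
#   -d  create the session detached
#   -s  set the session name
start_job() {
  local name=$1 command=$2
  tmux new-session -d -s "$name" "$command"
}

# Example ("sleep 600" is a placeholder for a long analysis):
#   start_job JOB1 'sleep 600'
#   tmux attach -t JOB1    # check on it later, then detach with Ctrl+b, then d
```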

5.2.3 Access your session

To reattach to a tmux session and check the status of your analysis, type:

$ tmux attach

5.2.4 Name your session

Running tmux attach with no target is only unambiguous when a single tmux session is running on your computer. If you intend to run multiple side-by-side sessions, you should give each session a name as follows:

# Create the session
$ tmux 

# Rename the session for future access (type this combination of keystrokes)
Ctrl+b, then $

Once you have pressed these keys, a prompt appears at the bottom of the window: type the new name of your session and press Return or Enter to complete the task (Figure 5.1).

Figure 5.1: Screenshot of Terminal showing procedure to rename a tmux session (here JOB1).

5.2.5 Create and rename a session

You can also create a new session and directly name it as follows:

#General command
$ tmux new -s session_name

#Example 
$ tmux new -s JOB2

5.2.6 List all sessions

You can get a list of all running tmux sessions by using the following command (Figure 5.2):

$ tmux list-sessions
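When writing scripts, two variations can help: list-sessions -F prints only the fields you ask for, and has-session reports through its exit status (0 means the session exists) whether a session is running. A sketch with hypothetical helper names; the `=` in front of the name requests an exact match, since tmux otherwise also accepts the beginning of a session name (so `-t JOB` could match `JOB1`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of all running sessions, one per line.
# '#{session_name}' is a tmux format string.
session_names() {
  tmux list-sessions -F '#{session_name}' 2>/dev/null
}

# Hypothetical helper: succeed only if a session named exactly $1 exists.
session_exists() {
  tmux has-session -t "=$1" 2>/dev/null
}

# Example:
#   session_names
#   session_exists JOB1 && echo "JOB1 is running"
```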

Figure 5.2: Screenshot of Terminal showing tmux command to access a specific session, here JOB1.

5.2.7 Access a specific session

To access a specific tmux session, type the following command (Figure 5.2):

#General command
$ tmux attach -t ID_Session 
#To access JOB1 do
$ tmux attach -t JOB1

5.2.8 Terminate a session

You can terminate a tmux session by attaching to it with the commands provided above and then typing exit at its command prompt. This closes the session and brings you back to the main Terminal, with a message confirming that the tmux session was terminated (Figure 5.3).

Figure 5.3: Screenshot of Terminal showing tmux command to access a specific session (here JOB1) and the result after terminating the session (using the exit command).

5.2.9 Kill a specific session

Finally, you can also kill a session without accessing it as follows:

#General command
$ tmux kill-session -t ID_Session

#To kill JOB2 do
$ tmux kill-session -t JOB2
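To confirm that a kill worked (as asked in the exercises below), you can combine kill-session with has-session. A minimal sketch; `kill_and_check` is a hypothetical helper name, and the `=` prefix asks tmux for an exact name match so that a similarly named session (e.g., JOB20) is not killed by mistake.

```shell
#!/usr/bin/env bash
# Hypothetical helper: kill the session named exactly $1, then check it is gone.
kill_and_check() {
  local name=$1
  tmux kill-session -t "=$name" 2>/dev/null || { echo "No session named $name" >&2; return 1; }
  if tmux has-session -t "=$name" 2>/dev/null; then
    echo "Session $name is still running" >&2
    return 1
  fi
  echo "Session $name has been terminated"
}

# Example:
#   kill_and_check JOB2
```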

Warning: Apply caution when using this command: it kills the tmux session together with every job running inside it, and any results those jobs had not yet saved are likely to be lost.

5.3 Exercises

Please do the following short exercises on your assigned computer account (see section 3) to get accustomed to the material presented in this document.

Retrieve your user ID and IP address
Remotely connect to your account using the ssh protocol
Create a new tmux session and rename it Bio1
Access the Bio1 tmux session
Terminate the Bio1 tmux session and confirm that it has been properly terminated (using a learned tmux command)

6 Linux commands

Please find below common Linux commands. These commands will help you navigate in the Terminal and support your genomics analyses.

6.1 File System

ls — list items in current directory.
ls -l — list items in current directory and show in long format to see permissions, size, and modification date.
ls -a — list all items in current directory, including hidden files.
ls -F — list items in current directory, marking directories with a trailing slash and executables with a star.
ls dir — list all items in directory dir.
cd dir — change directory to dir.
cd .. — go up one directory.
cd / — go to the root directory.
cd ~ — go to your home directory.
cd - — go to the last directory you were just in.
pwd — show present working directory.
mkdir dir — make new directory called dir.
rm file — remove file.
rm -r dir — remove directory dir recursively.
cp file1 file2 — copy file1 to file2.
cp -r dir1 dir2 — copy directory dir1 to dir2 recursively.
mv file1 file2 — move (rename) file1 to file2.
mv file1 ~/file1 — move file1 from the current directory to user’s home directory.
cat file — output the contents of file.
less file — view file with page navigation.
head file — output the first 10 lines of file.
tail file — output the last 10 lines of file.
tail -f file — output the contents of file as it grows, starting with the last 10 lines.
vim file — edit file using vim text editor.
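The commands above are often used together. A sketch that strings several of them together in a throw-away directory created with `mktemp -d`, so no real files are touched; `seq` (not in the list) only generates a small test file, and `-n` sets how many lines head and tail print instead of the default 10.

```shell
#!/usr/bin/env bash
# Work in a throw-away directory so no real files are touched
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir data                         # new directory
seq 1 12 > data/numbers.txt        # test file: the numbers 1 to 12, one per line
cp data/numbers.txt data/copy.txt  # copy
mv data/copy.txt data/backup.txt   # rename the copy
ls -F                              # prints: data/

head -n 3 data/numbers.txt         # prints 1, 2 and 3, one per line
tail -n 2 data/numbers.txt         # prints 11 and 12

cd - > /dev/null                   # return to the previous directory
rm -r "$workdir"                   # remove the scratch directory and its contents
```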

6.2 System

shutdown — shut down machine.
reboot — restart machine.
exit — exit terminal session.
date — show the current date and time.
whoami — show the user name you are logged in as.
man command — show the manual for command.
df — show disk space usage of the mounted file systems. Use df -h for human-readable sizes.
du — show directory space usage.
free — show memory and swap usage.
whereis app — show possible locations of app.
which app — show which app will be run by default.

6.3 Process Management

ps — display your currently active processes.
top — display all running processes.
htop — display all running processes and CPU usage.
kill PID — kill process id PID. Use top to identify PID.
kill -9 PID — force kill process id PID.
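A sketch of the usual cycle for one of your own processes: start it, check it, and stop it. Here `sleep 300` is a placeholder for a real analysis; because the job is started from the shell with `&`, its PID can be read from `$!` instead of being looked up in top.

```shell
#!/usr/bin/env bash
# Start a dummy long-running job in the background ("sleep 300" stands in
# for a real analysis); $! holds the PID of the most recent background job.
sleep 300 &
pid=$!
echo "Started job with PID $pid"

# Confirm it is running: -p selects a single PID, -o chooses the columns
ps -p "$pid" -o pid,comm

# Ask the process to stop (kill sends SIGTERM by default);
# use kill -9 only if it ignores this request.
kill "$pid"

# Wait for the job to end; the status is non-zero because it was killed
wait "$pid" 2>/dev/null || true
```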

6.4 Permissions

ls -l — list items in current directory and show permissions.
chmod ugo file — change permissions of file to ugo, where u is the user’s (owner’s) permissions, g is the group’s permissions, and o is everyone else’s permissions. The values of u, g, and o can be any number between 0 and 7.

7 — full permissions.
6 — read and write only.
5 — read and execute only.
4 — read only.
3 — write and execute only.
2 — write only.
1 — execute only.
0 — no permissions.

Examples of chmod settings:

chmod 600 file — only you can read and write - good for files.
chmod 700 file — only you can read, write, and execute - good for scripts.
chmod 644 file — you can read and write, and everyone else can only read - good for web pages.
chmod 755 file — you can read, write, and execute, and everyone else can read and execute - good for programs that you want to share.
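Each digit in the chmod values above is the sum of read (4), write (2), and execute (1); for instance 6 = 4 + 2 (read and write) and 5 = 4 + 1 (read and execute). The sketch below (`digit_to_rwx` is a hypothetical helper) spells out that arithmetic and then compares it with the letters ls -l shows for a temporary file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand one permission digit (0-7) into rwx letters.
# Each digit is a sum of read (4), write (2) and execute (1), e.g. 5 = 4 + 1.
digit_to_rwx() {
  local d=$1 out=""
  if (( d & 4 )); then out+="r"; else out+="-"; fi
  if (( d & 2 )); then out+="w"; else out+="-"; fi
  if (( d & 1 )); then out+="x"; else out+="-"; fi
  echo "$out"
}

# 755 = owner rwx, group r-x, others r-x
echo "755 -> $(digit_to_rwx 7)$(digit_to_rwx 5)$(digit_to_rwx 5)"   # prints: 755 -> rwxr-xr-x

# Compare with what chmod actually does, on a temporary file
f=$(mktemp)
chmod 755 "$f"
ls -l "$f"    # the line starts with -rwxr-xr-x
rm "$f"
```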

6.5 Networking

wget url — download the file at url (e.g., a file deposited on a website) into the current directory.
curl url — fetch url and print its content in the Terminal; use curl -O url to save it as a file.
scp user@host:file dir — secure copy a file from remote server to the dir directory on your machine.
scp file user@host:dir — secure copy a file from your machine to the dir directory on a remote server.
scp -r user@host:dir dir — secure copy the directory dir from remote server to the directory dir on your machine.
ssh user@host — connect to host (IP address) as user.
ping host — ping host and output results.
whois domain — get information for domain.
lsof -i tcp:1337 — list all processes using TCP port 1337.

6.6 Searching

grep pattern files — search for pattern in files.
grep -r pattern dir — search recursively for pattern in dir.
grep -rn pattern dir — search recursively for pattern in dir and show the line number found.
grep -r pattern dir --include='*.ext' — search recursively for pattern in dir and only search in files with .ext extension.
find dir -name file — find all instances of file under dir by searching the file system directly.
locate file — find all instances of file using indexed database built from the updatedb command. Much faster than find.
sed -i 's/day/night/g' file — find all occurrences of day in a file and replace them with night - s means substitute and g means global - sed also supports regular expressions.
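A small, self-contained sketch of these search commands on a tiny made-up FASTA file (header lines start with `>`) and a text file, all in a temporary directory. Note that `sed -i` as written is the GNU (Linux) form; on Mac OS, the BSD sed requires an explicit suffix argument, e.g. `sed -i '' 's/day/night/g' file`.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)   # scratch directory
# A tiny made-up FASTA file: header lines start with ">"
printf '>seq1\nACGTACGT\n>seq2\nTTGACCA\n' > "$dir/sample.fasta"
printf 'day one\nday two\n' > "$dir/notes.txt"

# Search one file; ">" is quoted so the shell does not treat it as a redirection
grep '>' "$dir/sample.fasta"               # prints: >seq1 and >seq2

# Recursive search with line numbers, limited to .txt files
grep -rn --include='*.txt' 'day' "$dir"

# Replace every "day" with "night" directly in the file (GNU sed)
sed -i 's/day/night/g' "$dir/notes.txt"
cat "$dir/notes.txt"                       # prints: night one, then night two

rm -r "$dir"
```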

6.7 Compression

tar cf file.tar files — create a tar named file.tar containing files.
tar xf file.tar — extract the files from file.tar.
tar czf file.tar.gz files — create a tar with Gzip compression.
tar xzf file.tar.gz — extract a tar using Gzip.
gzip file — compresses file and renames it to file.gz.
gzip -d file.gz — decompresses file.gz back to file.
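A sketch of a round trip with these commands in a temporary directory: archive a folder of results, list the archive with `t` (not in the list above), extract it elsewhere with `-C`, and compress and decompress a single file. File and folder names are made up for the example.

```shell
#!/usr/bin/env bash
work=$(mktemp -d)
cd "$work" || exit 1

mkdir results
echo "sample A" > results/a.txt
echo "sample B" > results/b.txt

tar czf results.tar.gz results       # c: create, z: gzip, f: archive file name
tar tzf results.tar.gz               # t: list the archive contents without extracting

mkdir restored
tar xzf results.tar.gz -C restored   # x: extract; -C: into the restored/ directory
cat restored/results/a.txt           # prints: sample A

gzip results/b.txt                   # replaces b.txt with b.txt.gz
gzip -d results/b.txt.gz             # restores b.txt
ls results                           # lists a.txt and b.txt

cd - > /dev/null
rm -r "$work"
```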

6.8 Exercises

To practice the material learned so far and gain confidence before pursuing our genomic journey, please do the following exercises:

Remotely connect to your account using ssh
List the content of the home directory (~) using ls
Navigate to Documents/ using cd
Create a new folder entitled My_jobs using mkdir and use ls to check that it was created.
Navigate in My_jobs/
Create a new text file entitled README.txt using vim. This will open a new window to edit your file
Edit your text file as follows:
- Type Shift and i to go into insert mode
- Type the following text: “Hello!”
- Exit the insert mode by hitting Esc
- Save and exit text file by typing this command: :wq!
- This brings you back to your folder
Check that your file exists and that it has a size (= not empty) by using ls -alh
- What is the size of README.txt in kilobytes?
Delete README.txt using rm
Check that README.txt has been deleted
Go back to the parent directory (Documents/) using cd ..
Remove My_jobs/ using rm -r
Check that My_jobs/ has been deleted
Exit your remote terminal session using exit

Tutorials

Bioinformatics toolkit for genomics

2024-03-06

1 Introduction

2 Learning outcomes

3 Your computer accounts

4 Remotely access Linux computers

4.1 What is an IP address?

4.2 How to find the IP address?

4.2.1 Command lines to get IP address

4.3 The `ssh` protocol

4.4 Install Putty on Windows

4.5 Exercises

5 Safely run computer jobs

5.1 The `tmux` protocol

5.2 How to operate `tmux` sessions

5.2.1 Create a new session

5.2.2 Exit a new session

5.2.3 Access your session

5.2.4 Name your session

5.2.5 Create and rename a session

5.2.6 List all sessions

5.2.7 Access a specific session

5.2.8 Terminate a session

5.2.9 Kill a specific session

5.3 Exercises

6 Linux commands

6.1 File System

6.2 System

6.3 Process Management

6.4 Permissions

6.5 Networking

6.6 Searching

6.7 Compression

6.8 Exercises
