1 Introduction

This webpage provides tutorials introducing remote computing and command-line tools commonly used in genomics and bioinformatics. The tutorials focus on practical workflows using Linux computers running the Ubuntu distribution.

The material presented here is especially relevant for:

These tutorials will help you develop the computational skills required to connect to remote computers, manage long analyses, and navigate the command-line environment used in genomics research.

2 Learning Outcomes

After completing these tutorials, you will be able to:

  1. Remotely access Linux computers using the ssh protocol.
  2. Safely run long analyses (also called computer jobs) using tmux sessions.
  3. Use common Bash command-line tools to organize files and perform genomics analyses.

3 Structure

These tutorials are organized to guide you through the essential steps required to work on remote Linux computers and conduct genomics analyses using the command line.

The material is structured into the following sections:

  1. Remotely access Linux computers:
    In this section, you will learn how to connect to remote computers using the ssh protocol and understand the basic concepts required for secure remote access.

  2. Safely run computer jobs:
    This section introduces the tmux tool, which allows you to run and manage long computational jobs in persistent terminal sessions.

  3. Bash command-line reference:
    This section provides a practical overview of commonly used Bash commands to navigate the file system, manage files, and support genomics analyses.

4 Your Computer Accounts

Details on your Linux computer accounts and their IP addresses are summarized in this Google document.

Your assigned user IDs for each group are provided in this spreadsheet.

5 Remotely Access Computers

To remotely access your assigned Linux computer accounts using the ssh protocol, you need the following information:

  1. Your user ID and password (see the Google document).
  2. The IP address of the computer you want to connect to.

If you want to remotely access the lab computers from a Windows machine, please read Section 5.4 (Install Putty on Windows) before continuing.

Important: If you are connecting to the Linux computers from outside the BSU campus network, you must first connect to the BSU VPN.

5.1 What is an IP Address?

An Internet Protocol (IP) address is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication.

An IP address serves two main functions:

  1. Network interface identification
  2. Location addressing

5.2 How to Find the IP Address?

The procedure summarized here shows how to retrieve a computer’s IP address on the Ubuntu operating system (Figure 5.1):

  1. Open the System Settings application from the desktop sidebar.
  2. Select the Network tab located under the Hardware section.
  3. Select the Wired tab.
  4. The computer’s IP address appears under the IPv4 Address field.

Figure 5.1: Screenshot of Ubuntu desktop showing how to retrieve the IP address.

5.2.1 Command Lines to Get the IP Address

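You can also retrieve the IP address from the command line. Each of the following commands, typed in a Terminal, reports it at a different level of detail:

# Print only the first IP address assigned to this computer
hostname -I | awk '{print $1}'
# List all network interfaces with their addresses
ip a
# Older alternative; requires the net-tools package on recent Ubuntu releases
ifconfig -a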
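
When a computer has several network interfaces, the full output of ip a can be hard to scan. The sketch below is one possible way to reduce it to one line per interface. The helper name extract_ipv4 is hypothetical, and the parsing assumes the one-line-per-address layout printed by ip -o -4 addr show (interface name in the second field, address/prefix in the fourth).

```shell
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and
# print "<interface> <IPv4 address>" for each line.
extract_ipv4() {
    awk '{ split($4, addr, "/"); print $2, addr[1] }'
}

# Run it against the real interfaces only if the `ip` tool is available.
if command -v ip >/dev/null 2>&1; then
    ip -o -4 addr show | extract_ipv4
fi
```

The prefix length (the part after the slash) is dropped because ssh only needs the address itself.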

5.3 The ssh Protocol

The Secure Shell (ssh) protocol is a method used for secure remote login from one computer to another (Figure 5.2). It protects communication using strong encryption and allows users to remotely execute commands on another machine.

Figure 5.2: Overview of ssh protocol.

Once you have gathered the required information (IP address and username), open a Terminal and type:

# General command
ssh USER_ID@IP

# Example
ssh bioinformatics@132.178.143.53

Figure 5.3: Example of an ssh connection from a Terminal.
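
If you connect often, it can help to preview the exact command before running it. The following is a minimal sketch, not part of the lab setup: the function name connect_remote and the DRY_RUN variable are hypothetical, and the user ID and IP address are the example values used in this section.

```shell
# Hypothetical wrapper around the ssh command shown in this section.
# With DRY_RUN=1 it only prints the command instead of connecting.
connect_remote() {
    if [ "$#" -ne 2 ]; then
        echo "usage: connect_remote USER_ID IP" >&2
        return 1
    fi
    local target="$1@$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ssh $target"
    else
        ssh "$target"
    fi
}

# Preview the command for the example account used in this tutorial.
DRY_RUN=1 connect_remote bioinformatics 132.178.143.53
```

Calling connect_remote without DRY_RUN=1 opens the connection and prompts for your password as usual.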

5.4 Install PuTTY on Windows

If you are using Windows, you can use PuTTY to establish ssh connections.

Download the software here: http://www.putty.org

When launching PuTTY:

  1. Enter USER_ID@IP in the Host Name field.
  2. Set Port to 22.
  3. Select SSH as the connection type.

Figure 5.4: PuTTY configuration window.

5.5 Exercises

  1. Retrieve your user ID and IP address.
  2. Connect to your account using ssh.
  3. Exit the remote session using exit.

6 Safely Run Computer Jobs

Some analyses performed in genomics can take several hours or even days to complete. To avoid interrupting these analyses, it is important to run them in a persistent terminal environment.

The tool we will use for this purpose is tmux, which keeps terminal sessions running on the remote computer even after you disconnect.

6.1 The tmux Tool

tmux is a terminal multiplexer for Unix-like systems. It allows users to create multiple independent terminal sessions within a single window.

Key advantages:

  • Run long analyses safely
  • Disconnect from a remote computer without stopping jobs
  • Manage multiple command-line tasks simultaneously

6.1.1 Install tmux on Mac

brew install tmux
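
Before installing anything, you may want to check whether the tools used in these tutorials are already available on your machine. The sketch below assumes nothing beyond a standard shell; the helper name check_tool is hypothetical.

```shell
# Hypothetical helper: report whether a command-line tool is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: not found"
    fi
}

check_tool tmux
check_tool ssh
```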

6.2 How to Operate tmux Sessions

6.2.1 Create a New Session

tmux

6.2.2 Detach from a Session

Press:

Ctrl+b then d

6.2.3 Reattach to a Session

tmux attach

6.2.4 Create and Name a Session

tmux new -s JOB1

6.2.5 List Sessions

tmux list-sessions

6.2.6 Attach to a Specific Session

tmux attach -t JOB1

6.2.7 Kill a Session

tmux kill-session -t JOB1

Warning: Killing a session will terminate any running analysis.
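
The commands in this section can be combined to launch an analysis in a named session without attaching to it. This is a minimal sketch under stated assumptions: the helper name start_job is hypothetical, and the job in the example call is a placeholder rather than a real analysis. It relies on tmux has-session to test whether a session exists and on tmux new-session -d to start one in the background.

```shell
# Hypothetical helper: run a command in a new detached tmux session,
# refusing to reuse a session name that already exists.
start_job() {
    local session="$1"
    shift
    if tmux has-session -t "$session" 2>/dev/null; then
        echo "Session '$session' already exists" >&2
        return 1
    fi
    # -d starts the session detached; the remaining arguments form the command.
    tmux new-session -d -s "$session" "$*" || return 1
    echo "Started '$session'; reattach with: tmux attach -t $session"
}

# Example call (the job here is a placeholder, not a real analysis):
# start_job JOB1 "sleep 600"
```

A session started this way closes as soon as its command finishes, so redirect the output of your analysis to a log file if you need to inspect it afterwards.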

7 Bash Command-Line Reference

Below is a collection of common Bash commands used to navigate the file system and manage computational analyses.

7.1 File System

  • ls — list files in a directory
  • cd — change directory
  • pwd — print working directory
  • mkdir — create a directory
  • rm — remove a file
  • cp — copy files
  • mv — move or rename files
  • cat — display file contents
  • less — view files page by page
  • head — show the first lines of a file
  • tail — show the last lines of a file
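
The short session below strings several of these commands together. It is only an illustration: the file and directory names are made up, and everything happens inside a temporary directory so that nothing in your home directory is touched.

```shell
# Work in a throwaway directory created by mktemp.
WORKDIR=$(mktemp -d)
cd "$WORKDIR"
pwd                                           # confirm where we are

mkdir project                                 # create a directory
printf 'line1\nline2\nline3\n' > notes.txt    # create a small text file
cp notes.txt project/notes_copy.txt           # copy it into the directory
mv notes.txt project/notes.txt                # move the original there too
ls project                                    # list the directory contents

FIRST_LINE=$(head -n 1 project/notes.txt)     # first line of the file
LAST_LINE=$(tail -n 1 project/notes.txt)      # last line of the file
echo "$FIRST_LINE / $LAST_LINE"

rm project/notes_copy.txt                     # remove the copy
```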

7.2 System

  • whoami — display current user
  • date — display date and time
  • exit — exit terminal
  • df -h — show used and available disk space for each file system
  • free — memory usage

7.3 Process Management

  • ps — list the processes running in the current shell session
  • top — show running processes, updated in real time
  • htop — interactive process viewer
  • kill PID — terminate process
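
As an illustration of how these commands fit together, the sketch below starts a placeholder job in the background, checks that it is running, and then terminates it. The sleep command stands in for a real analysis, and the variable names are arbitrary.

```shell
sleep 300 &          # placeholder for a long-running job, started in the background
JOB_PID=$!           # $! holds the PID of the most recent background process

# ps -p reports on a single process; its exit status tells us whether it exists.
if ps -p "$JOB_PID" >/dev/null 2>&1; then
    echo "Process $JOB_PID is running"
fi

kill "$JOB_PID"                          # send the default TERM signal
wait "$JOB_PID" 2>/dev/null || true      # collect the terminated job

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$JOB_PID" 2>/dev/null; then
    JOB_STATE="running"
else
    JOB_STATE="stopped"
fi
echo "Process $JOB_PID is now $JOB_STATE"
```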

7.4 Networking

  • ssh user@host — connect to remote host
  • scp — transfer files between machines
  • wget — download files
  • curl — download files

7.5 Searching

  • grep — search text patterns
  • find — locate files
  • sed — text substitution
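
In genomics, these tools are often applied to sequence files. The example below is a sketch on a made-up two-record FASTA file (the sequences and header names are invented for illustration); it counts sequences with grep, locates FASTA files with find, and rewrites headers with sed.

```shell
WORKDIR=$(mktemp -d)
FASTA="$WORKDIR/example.fasta"

# A made-up FASTA file: each record starts with a ">" header line.
printf '>seq1 sample_A\nACGTACGT\n>seq2 sample_B\nGGCCTTAA\n' > "$FASTA"

# Count header lines, which equals the number of sequences.
N_SEQS=$(grep -c '^>' "$FASTA")

# Count the FASTA files found under the working directory.
N_FILES=$(find "$WORKDIR" -name '*.fasta' | wc -l)

# Rename the headers on the fly (the file itself is not modified).
RENAMED=$(sed 's/^>seq/>contig/' "$FASTA" | grep '^>')

echo "Sequences: $N_SEQS"
echo "FASTA files: $N_FILES"
echo "$RENAMED"
```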

7.6 Compression

  • tar — archive files
  • gzip — compress files
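
A common use of these two tools is to bundle a results directory before transferring it. The sketch below uses invented file names and a temporary directory; it archives a directory with tar, lists the archive without extracting it, and compresses a single file with gzip while keeping the original.

```shell
WORKDIR=$(mktemp -d)
cd "$WORKDIR"

# Made-up results directory containing one small table.
mkdir results
printf 'gene\tcount\ngeneA\t10\n' > results/counts.tsv

tar -czf results.tar.gz results              # archive and gzip-compress the directory
ARCHIVE_LIST=$(tar -tzf results.tar.gz)      # list the contents without extracting
echo "$ARCHIVE_LIST"

gzip -c results/counts.tsv > counts.tsv.gz   # compress one file, keeping the original
RESTORED=$(gzip -dc counts.tsv.gz | head -n 1)   # decompress to standard output
echo "$RESTORED"
```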

8 Exercises

To practice the skills introduced in this tutorial:

  1. Connect to your account using ssh.
  2. List the contents of your home directory using ls.
  3. Navigate to Documents/ using cd.
  4. Create a directory named My_jobs.
  5. Enter the directory.
  6. Create a file named README.txt using vim.
  7. Add the text Hello! to the file.
  8. Verify the file exists using ls -alh.
  9. Delete the file using rm.
  10. Remove the directory using rm -r.
  11. Exit your remote session using exit.