This webpage provides tutorials introducing remote computing and command-line tools commonly used in genomics and bioinformatics. The tutorials focus on practical workflows using Linux computers running the Ubuntu distribution.
The material presented here is especially relevant for:
These tutorials will help you develop the computational skills required to connect to remote computers, manage long analyses, and navigate the command-line environment used in genomics research.
After completing these tutorials, you will be able to:
- Connect to remote Linux computers using the ssh protocol.
- Run and manage long analyses in persistent tmux sessions.
These tutorials are organized to guide you through the essential steps required to work on remote Linux computers and conduct genomics analyses using the command line.
The material is structured into the following sections:
Remotely access Linux computers:
In this section, you will learn how to connect to remote computers using the ssh protocol and understand the basic concepts required for secure remote access.
Safely run computer jobs:
This section introduces the tmux tool, which allows you to run and manage long computational jobs in persistent terminal sessions.
Bash command-line reference:
This section provides a practical overview of commonly used Bash commands to navigate the file system, manage files, and support genomics analyses.
Details on your Linux computer accounts and their IP addresses are summarized in this Google document.
Your assigned user IDs for each group are provided in this spreadsheet.
To remotely access your assigned Linux computer accounts using the ssh protocol, you need the following information:
- The IP address of the computer.
- Your username (user ID).
If you want to remotely access the lab computers from a Windows machine, please read the protocol described below before continuing.
Important: If you are accessing the Linux computers outside of the BSU campus network, you will need to connect through the BSU VPN.
An Internet Protocol (IP) address is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication.
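An IP address serves two main functions:
- Identifying the host or network interface.
- Providing the location of the host in the network (location addressing).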
The procedure summarized here shows how to retrieve a computer’s IP address on the Ubuntu operating system (Figure 5.1):
1. Open the System Settings application from the desktop sidebar.
2. Select the Network tab located under the Hardware section.
3. Select the Wired tab.
Figure 5.1: Screenshot of Ubuntu desktop showing how to retrieve the IP address.
You can also retrieve the IP address from the command line by typing the following commands in a Terminal:
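# Print the first IP address assigned to this computer
hostname -I | awk '{print $1}'
# Show all network interfaces and their addresses
ip a
# Older alternative (requires the net-tools package on recent Ubuntu releases)
ifconfig -a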
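On a computer with several network interfaces, `hostname -I` prints every address on a single line, and IPv6 addresses may appear alongside IPv4 ones. As a minimal sketch, the helper below picks the first IPv4-looking address from such a line. The function name `first_ipv4` is hypothetical and not part of the tutorial, and the pattern only checks the dotted-quad shape, not that each number is in the 0 to 255 range.

```shell
# first_ipv4 (hypothetical helper): read whitespace-separated addresses on
# standard input and print the first one shaped like a dotted-quad IPv4 address.
first_ipv4() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) { print $i; exit }
    }'
}

# Usage on an Ubuntu machine (not run here):
#   hostname -I | first_ipv4
```

The printed value is the IP you would pass to ssh in the next section.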
ssh Protocol
The Secure Shell (ssh) protocol is a method used for secure remote login from one computer to another (Figure 5.2).
It protects communication using strong encryption and allows users to remotely execute commands on another machine.
Figure 5.2: Overview of ssh protocol.
Once you have gathered the required information (IP address and username), open a Terminal and type:
# General command
ssh USER_ID@IP
# Example
ssh bioinformatics@132.178.143.53
Figure 5.3: Example of an ssh connection from a Terminal.
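Typing the full `USER_ID@IP` string for every connection becomes tedious. The OpenSSH client reads per-host settings from `~/.ssh/config`, so a short alias can stand in for the address. The sketch below only prints such an entry. The helper name `ssh_config_entry` and the alias `lab` are assumptions made for illustration, and the IP address and username are the example values used above.

```shell
# ssh_config_entry (hypothetical helper): print an OpenSSH client config stanza.
# Arguments: ALIAS (a short name of your choice), IP, USER_ID
ssh_config_entry() {
    printf 'Host %s\n    HostName %s\n    User %s\n    Port 22\n' "$1" "$2" "$3"
}

# Preview the stanza for the example account used above
ssh_config_entry lab 132.178.143.53 bioinformatics

# To use it, append the output to your config file (not done here):
#   ssh_config_entry lab 132.178.143.53 bioinformatics >> ~/.ssh/config
# Afterwards "ssh lab" connects as bioinformatics@132.178.143.53
```

Because ssh can also run a single command and return, a quick connection check such as `ssh USER_ID@IP hostname` prints the name of the remote computer without opening an interactive session.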
If you are using Windows, you can use PuTTY to establish ssh connections.
Download the software here: http://www.putty.org
When launching PuTTY:
1. Enter user@IP in the Host Name field.
2. Set the port to 22.
Figure 5.4: PuTTY configuration window.
Connect to the remote computer with ssh, and close the connection by typing exit.
Some analyses performed in genomics can take several hours or even days to complete. To avoid interrupting these analyses, it is important to run them in a persistent terminal environment.
The tool we will use for this purpose is tmux, which allows multiple terminal sessions to run simultaneously.
tmux protocol
tmux is a terminal multiplexer for Unix-like systems.
It allows users to create multiple independent terminal sessions within a single window.
Key advantages:
tmux on Mac
brew install tmux
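tmux Sessions
# Start a new session
tmux
To detach from the session while leaving it running in the background, press:
Ctrl+b then d
# Reattach to the most recent session
tmux attach
# Start a new session named JOB1
tmux new -s JOB1
# List all sessions
tmux list-sessions
# Attach to the session named JOB1
tmux attach -t JOB1
# Kill the session named JOB1
tmux kill-session -t JOB1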
Warning: Killing a session will terminate any running analysis.
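The session commands above can be combined so that a long analysis starts directly in its own detached session. The following is a minimal sketch, assuming tmux is installed. The function name `run_in_tmux` and the placeholder job `sleep 600` are hypothetical and stand in for whatever analysis you need to run.

```shell
# run_in_tmux (hypothetical helper): start COMMAND in a new detached tmux
# session called NAME, unless a session with that name already exists.
# Usage: run_in_tmux NAME COMMAND...
run_in_tmux() {
    name=$1
    shift
    if ! command -v tmux >/dev/null 2>&1; then
        echo "tmux is not installed" >&2
        return 1
    fi
    if tmux has-session -t "$name" 2>/dev/null; then
        echo "session $name already exists; attach with: tmux attach -t $name" >&2
        return 1
    fi
    # -d: create the session without attaching to it
    # -s: session name; the final string is the command run inside the session
    tmux new-session -d -s "$name" "$*"
}

# Example (not run here): start a placeholder job, then check on it later
#   run_in_tmux JOB1 "sleep 600"
#   tmux attach -t JOB1
```

A session started this way ends when its command finishes, so redirect the output of the analysis to a log file if you need to inspect it afterwards.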
Below is a collection of common Bash commands used to navigate the file system and manage computational analyses.
- ls: list files in a directory
- cd: change directory
- pwd: print working directory
- mkdir: create a directory
- rm: remove a file
- cp: copy files
- mv: move or rename files
- cat: display file contents
- less: view files page by page
- head: show the first lines of a file
- tail: show the last lines of a file
- whoami: display current user
- date: display date and time
- exit: exit terminal
- df -h: disk usage
- free: memory usage
- ps: list active processes
- top: show running processes
- htop: interactive process viewer
- kill PID: terminate process
- ssh user@host: connect to remote host
- scp: transfer files between machines
- wget: download files
- curl: download files
- grep: search text patterns
- find: locate files
- sed: text substitution
- tar: archive files
- gzip: compress files
To practice the skills introduced in this tutorial:
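1. Connect to your Linux computer account using ssh.
2. List the files in your home directory using ls.
3. Move into Documents/ using cd.
4. Create a directory named My_jobs.
5. Create a file named README.txt using vim.
6. List the files with their details using ls -alh.
7. Delete the file using rm.
8. Delete the directory using rm -r.
9. Close the connection using exit.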
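Several commands from the reference above are often chained in a single working session. The sketch below runs them on a tiny FASTA file inside a temporary directory. The file name `example.fasta` and both sequences are made up for illustration and do not come from the course material.

```shell
# Work in a temporary directory so nothing in your account is touched
workdir=$(mktemp -d)
cd "$workdir"

# mkdir + cd: create a project folder and move into it
mkdir My_jobs
cd My_jobs

# Create a tiny FASTA file (made-up sequences, for illustration only)
printf '>seq1\nACGTACGT\n>seq2\nGGCCTTAA\n' > example.fasta

# ls: confirm the file exists; head/tail: peek at its first and last lines
ls -alh
head -n 2 example.fasta
tail -n 1 example.fasta

# grep -c: count FASTA records (each header line starts with ">")
n_records=$(grep -c '^>' example.fasta)
echo "records: $n_records"   # prints "records: 2"

# Clean up: leave the directory, then remove it and its contents
cd /
rm -r "$workdir"
```

Counting header lines with `grep -c '^>'` is a quick sanity check that a sequence file contains the number of records you expect before a long analysis is launched.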