The readings below are assigned through Canvas:
Linux Shell Skills from Prof. Shedden’s 2016 Course notes
A tmux Primer by Daniel Miessler
Statistics and Computation Service by UM ITS
If your computer runs macOS or Linux, you already have access to a Unix terminal; on a Mac it is the application called Terminal.
If your computer runs Windows, you will need to install a terminal program that can act as an ssh client. Most people use PuTTY, but PowerShell may also be an option.
Experienced users may be interested in zsh, but we will not use it in this class.
In order to connect to university Linux servers, you need to have an AFS home directory. If you do not have one, you can set it up by visiting http://mfile.umich.edu/ and selecting the ‘AFS Self-Provisioning Tool’.
You can connect to a UM Linux server using ssh as follows:
ssh uniqname@login.itd.umich.edu
ssh uniqname@scs.dsc.umich.edu
ssh uniqname@mario.dsc.umich.edu
ssh uniqname@luigi.dsc.umich.edu
Replace uniqname with your UM unique name, which is the same as the first part of your UM email address.
If you have trouble connecting to the SCS servers please visit this help page.
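After logging in, you can check which machine you actually landed on; this matters because a name like login.itd.umich.edu refers to a pool of machines rather than a single host. A minimal sketch using standard commands (hostname prints the same name on most Linux systems):

```shell
# Run these after logging in, e.g. with: ssh uniqname@login.itd.umich.edu
uname -n    # network name of the machine this shell is running on
whoami      # the user you are logged in as; should be your uniqname
```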
Connect to scs.dsc.umich.edu using ssh. Which host were you connected to? Log out. Connect to login.itd.umich.edu using ssh. Which host were you connected to? Log out and connect again. Were you connected to the same host?
In Linux essentially everything is a file: this includes program executables and system configuration files, as well as your own data and source files.
Files are organized hierarchically into directories beginning with the root directory /. Directories can contain files and sub-directories, with locations in the directory hierarchy separated by a /. This collection of directories and files is called a file tree.
Use the following commands to navigate and interact with the file tree:
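ls (list files); ls -a also lists hidden files, and ls -l gives a long listing
cd (change directories)
pwd (print the current or working directory)
mkdir (make directory); mkdir -p also creates any missing parent directories
rmdir (remove an empty directory)
rm (remove a file); rm -r removes a directory and everything in it
mv (move or rename a file or directory)
find (find a file)
In working with files, it is helpful to know: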
. refers to the current directory.
.. refers to the parent directory, one step up the file tree.
cd invoked with no arguments will return you to your home directory.
Files whose names begin with . are hidden. To see these files, use ls -a.
The wildcard * matches any sequence of characters.
The wildcard ? matches any single character.
Environment variables determine certain aspects of how the OS behaves and responds to your instructions. Here are a couple of important ones:
$HOME - your home directory
$PATH - the list of directories searched for executable commands
$SHELL - your login shell
In the Bash shell, use $ to access the value of an environment variable. The echo command can be used to print these values to the screen:
echo $SHELL
Use which to search your $PATH for an executable command.
A tilde ~ will often be expanded as $HOME.
For more, see the GNU Coreutils documentation.
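The navigation commands, special directories, wildcards, and environment variables above can be tried together in a short practice session. This is a minimal sketch, assuming a Bash shell on any Linux machine; it works in a throwaway directory from mktemp, and the names project, data, raw, and notes.md are hypothetical.

```shell
# Practice in a throwaway directory so nothing in your home directory is touched.
scratch=$(mktemp -d)
cd "$scratch"
pwd                            # print the working directory

mkdir -p project/data/raw      # -p also creates the missing parent directories
touch project/data/raw/a.txt project/data/raw/ab.txt project/data/raw/b.txt
touch project/data/raw/notes.md
touch project/.hidden          # names beginning with . are hidden

ls project                     # lists only: data
ls -a project                  # also lists ., .., and .hidden
ls -l project/data/raw         # long listing: permissions, owner, size, date

cd project/data/raw
ls *.txt                       # * matches any sequence: a.txt ab.txt b.txt
ls ?.txt                       # ? matches one character: a.txt b.txt
cd ..                          # .. is the parent directory (project/data)
mv raw/notes.md .              # . is the current directory
find "$scratch" -name '*.md'   # quotes stop the shell expanding the pattern

rm raw/b.txt                   # remove a single file
rm -r raw                      # remove a directory and everything in it
mkdir empty_dir
rmdir empty_dir                # rmdir only removes empty directories

# Environment variables are read with $; a leading ~ expands to $HOME.
echo "$HOME"
echo "$PATH"
which ls                       # the first ls found on your $PATH
cd ~ && pwd                    # back in your home directory
```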
In order to edit files in the shell, you will need to use a text editor. Some popular choices are:
You can find links to tutorials in Prof. Shedden’s notes assigned as reading. I personally use emacs, and it is the editor I will use in examples presented to the class.
If you do not already have a preferred text editor, please pick one to learn for the course, find a tutorial on it, and work through that tutorial.
A terminal multiplexer allows you to run multiple shells from the same terminal connection and to keep these sessions running after you log off. The two most common are screen and tmux, with the latter being the preferred option for this course.
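A minimal tmux sketch, assuming tmux is installed on the server you log in to. The session name ps1 is hypothetical, and Ctrl-b is tmux's default prefix key.

```shell
# Start a detached session named "ps1"; it keeps running on the server
# after you log out.
tmux new-session -d -s ps1
tmux ls                      # list sessions; ps1 should appear

# Reattach from a later login:      tmux attach -t ps1
# Inside tmux, detach again with Ctrl-b then d.
# When finished with the session:   tmux kill-session -t ps1
```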
When using a terminal multiplexer with a networked file system such as AFS, be aware that your credentials or “ticket” for accessing the networked files will typically expire after a fixed amount of time (e.g. 24 hours). You can get a fresh ticket, valid for a fixed amount of time, using kinit, and then refresh your AFS access with aklog:
kinit -l 4d
aklog
There are many ways to transfer data to a remote server using the shell.
Three common ways to do this from the command line are:
- scp to copy to/from your local computer,
- wget to download directly from the web,
- sftp, or ‘secure file transfer protocol’, for transferring large volumes of data.
To transfer a single smallish file from the working directory on your local machine to your AFS space:
scp ./local_file.ext uniqname@login.itd.umich.edu:~/remote_directory/
To transfer a file from the remote directory to your local computer, reverse the arguments:
scp uniqname@login.itd.umich.edu:~/remote_directory/remote_file.ext ./
For larger transfers you should use sftp for efficiency. Transfer data using the login pool to avoid adding strain to the computation servers.
To download data directly from a website to a remote server, use a web browser to find the URL of the file and then use wget:
wget https://remote.url.edu/path/to/file/data.txt
Make sure you only download from trusted sources!
Use sftp for interactive sessions; read more using man sftp on one of the University servers.
Use one of the methods above to transfer the RECS data to your AFS space.
We will review some of the examples from sections 2.3.4 and 2.3.5 of Data Science at the Command Line. You may wish to read all of section 2.3.
Large files often contain redundant data and can be stored using less space on disk in a compressed format. Depending on the system and the file, compression can also make reading from or writing to a file more efficient: reading bits off disk is “I/O bound” while decompressing them is “CPU bound”, so when the disk is the bottleneck it can be faster to read fewer compressed bytes and spend CPU time decoding them. This is particularly useful on shared systems with I/O bottlenecks.
The du, or disk usage, utility can be used to see the space on disk used by a file or set of files. Use the -h option to print values in human-readable units. Use -s to get sum totals for a directory.
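A small sketch of the du options just described, assuming a Linux shell with the standard seq utility; the directory and file names are hypothetical and the numbers you see will vary by system.

```shell
# Make a scratch directory holding two copies of a generated text file.
tmp=$(mktemp -d)
mkdir -p "$tmp/data"
seq 1 200000 > "$tmp/data/numbers.txt"
cp "$tmp/data/numbers.txt" "$tmp/data/copy.txt"

du -h "$tmp/data/numbers.txt"   # space used by one file, in K/M/G units
du -h "$tmp/data"               # the directory and any subdirectories
du -sh "$tmp/data"              # a single summed total for the directory
```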
There are many compression tools; one of the most popular is gzip. The command
gzip file.txt
compresses file.txt into file.txt.gz, replacing the original file.
The file can be uncompressed using
gunzip file.txt.gz
or
gzip -d file.txt.gz
gzip also records the original file name inside the compressed file, but by default gunzip simply removes the .gz suffix, restoring file.txt.
You can keep the compressed file and decompress directly to standard output using the -c option:
gunzip -c file.txt.gz > file.txt
zcat is a shortcut that does the same thing.
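A round trip through gzip can be sketched as follows, assuming GNU gzip on Linux (where gzip file.txt writes file.txt.gz). The file names are hypothetical, and the generated numbers compress well, so the size difference should be easy to see.

```shell
tmp=$(mktemp -d)
cd "$tmp"
seq 1 200000 > file.txt
cp file.txt original.txt          # untouched copy for comparison

gzip file.txt                     # replaces file.txt with file.txt.gz
ls                                # file.txt.gz  original.txt
du -h original.txt file.txt.gz    # compare sizes on disk

zcat file.txt.gz | head -n 3      # peek at the contents without a temp file
gunzip -c file.txt.gz > file.txt  # decompress while keeping file.txt.gz
cmp file.txt original.txt && echo "identical"
```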
A tarball is an archive of a file tree, often compressed. This can be useful for transferring directories between machines manually. It is also a way to cleanly archive files from projects you would like to retain but no longer need to use frequently. Many programs have the ability to work directly with archived and/or compressed data.
The two most common use cases are creating an archive,
tar cvfz name.tgz ./parent_folder
and extracting the archive,
tar xvfz name.tgz
The extension .tgz is short for .tar.gz, indicating that the archive has been compressed using gzip.
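A minimal tar round trip, assuming GNU or BSD tar; the folder and file names are hypothetical. In the option letters, c creates an archive, x extracts, t lists the contents, z filters through gzip, v prints each file name, and f names the archive file.

```shell
# Build a small file tree in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p parent_folder/sub
echo "alpha" > parent_folder/a.txt
echo "beta" > parent_folder/sub/b.txt

# Create a gzip-compressed archive of the whole tree.
tar cvfz name.tgz ./parent_folder

# List the archive contents without extracting anything.
tar tzf name.tgz

# Extract into a different directory, as you would on another machine.
mkdir restore
cd restore
tar xvfz ../name.tgz
cat parent_folder/sub/b.txt      # prints: beta
```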
You may at times find the following command line tools useful:
head - read the first n lines of a file
tail - read the last n lines of a file
wc - count words, or use wc -l to count lines
grep - find lines in files that match string patterns
sort - sort a file on one or more fields
cut - extract selected columns from a delimited file
paste - merge corresponding lines of files side by side
join - merge two files based on a common field; both files must be sorted on that field.
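The tools above can be tried on a tiny comma-delimited file. This is a sketch with made-up data, assuming GNU coreutils; the file names and the majors column are hypothetical.

```shell
tmp=$(mktemp -d)
cd "$tmp"
cat > scores.csv <<'EOF'
id,name,score
3,carol,88
1,alice,92
2,bob,75
EOF

head -n 2 scores.csv              # header plus the first data row
tail -n 2 scores.csv              # the last two rows
wc -l scores.csv                  # 4 lines, including the header
grep alice scores.csv             # rows containing the string "alice"
cut -d, -f2,3 scores.csv          # keep columns 2 and 3 (name, score)
tail -n +2 scores.csv | sort -t, -k3,3nr   # drop header; sort by score, highest first

# paste places corresponding lines of two files side by side.
printf '1\n2\n3\n' > ids.txt
printf 'stats\nmath\necon\n' > majors.txt
paste -d, ids.txt majors.txt > majors.csv  # 1,stats / 2,math / 3,econ

# join merges on a common field (column 1 by default); inputs must be sorted on it.
tail -n +2 scores.csv | sort -t, -k1,1 > scores_sorted.csv
join -t, scores_sorted.csv majors.csv      # 1,alice,92,stats / ...
```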
We will look at examples in class as time permits. You may wish to read chapter 5 of Data Science at the Command Line.