In this exercise you will work through a brief introduction to the UNIX/Linux system. You are all expected to have some prior programming experience, so the introduction is very short. In the first part of the exercise you will set up your account and copy some essential files needed to write the programs during the course. Next, you will go through some small exercises giving you a more detailed introduction to Unix/Linux.
Open a terminal window under Linux.
Make a course directory (say Algo) somewhere convenient. All course exercises will be done in this directory, and all course-related files will be stored there.
Now, download the file
Unpack the file (using tar -xzvf data.tar.gz) and place the created data directory in the "Algo" directory.
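As a rough sketch of this step, the commands below build a tiny stand-in archive and unpack it inside a course directory. The scratch directory and the one-line file data/Intro/test.dat are placeholders made up for this illustration; in the exercise you would use the downloaded data.tar.gz instead.

```shell
# Work in a throw-away directory so nothing real is touched (assumption for this sketch).
scratch=$(mktemp -d)
cd "$scratch"

# Build a stand-in archive; the real data.tar.gz comes from the course page.
mkdir -p data/Intro
echo "placeholder line" > data/Intro/test.dat
tar -czf data.tar.gz data      # c = create, z = gzip, f = archive file
rm -r data

# Make the course directory and unpack the archive inside it.
mkdir -p Algo
mv data.tar.gz Algo/
cd Algo
tar -xzvf data.tar.gz          # x = extract, z = gunzip, v = verbose, f = archive file
ls data/Intro
```

Unpacking inside Algo puts the data directory where the later examples expect it.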
Now we are ready to start. Do the following exercises in a terminal window (MobaXterm or similar if you are a Windows user).
Basic commands
Where am I? - pwd
pwd      This command returns the path to your current location (the current directory)
(this is also the command that is used to construct your prompt)
Make a new directory - mkdir
Examples:
mkdir test      make a directory called test
mkdir -p data      make a directory called data; the -p option gives no error if the directory already exists
What is in this directory? - ls
Examples:
ls      short listing of current directory (a directory is often called a folder in windows)
ls ..      short listing of directory above (parent to) current directory - ".." means one directory up "../.." is two directories up
ls data      short listing of the data directory (equivalent to ls ./data - "." means here)
ls -l data      detailed listing of the data directory
ls -ltr data      long listing sorted by time (t) and reversed (r): newest files last (essential for old bioinformaticians who can not remember what they just did)
ls /usr/bin/      list programs in "/usr/bin/" directory.
Paths starting with "/" are absolute addresses starting at the root folder (normally called C:\ in Windows),
as opposed to relative addresses (addresses relative to where you are in the folder hierarchy)
I want to go to? - cd
The cd      command is used to move around in the file system.
Examples:
cd ..      up one level
cd /usr/local/bin/      go to absolute (not relative) address
cd      go to my home directory
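To see how pwd, mkdir, ls and cd fit together, here is a small sketch that can be pasted into a terminal. It works in a temporary scratch directory (an assumption of this sketch, so that your own files are left alone); the directory names Algo and test are simply the ones used in this exercise.

```shell
# Scratch area made up for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

mkdir -p Algo/test          # -p also creates the missing parent directory Algo
cd Algo/test                # relative path: resolved from the current directory
here=$(pwd)                 # absolute path of the current directory
echo "$here"

cd ..                       # one level up, now in Algo
ls                          # lists: test
cd "$here"                  # absolute path: works from anywhere
cd ../..                    # two levels up, back in the scratch directory
pwd
```

The absolute path stored in "here" keeps working no matter where you are, while "Algo/test" only works from the directory that contains Algo.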
Copying (more) files - cp
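Examples:
cd test      go to the test directory
cp ../data/Intro/test.dat .      copy test.dat to the current directory (".")
cp ../data/Intro/* .      copy all files in Intro to the current directory
cp -R ../data/Intro .      copy the Intro directory itself, including everything in it
man ls      get help on the ls command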
Go to the test directory (skip this step if you are already there)
cd test
Make a new directory called mydirectory
mkdir mydirectory
Make an empty file (or update time stamp on existing one)
touch myfile
This makes a file called "myfile" (verify that it has been created with ls -l).
Moving files - mv
Examples:
mv myfile mynewfile      move myfile to mynewfile
Removing (deleting) files - rm
rm mynewfile      remove mynewfile
rmdir mydirectory      remove an empty directory
rm -rf mydirectory      remove mydirectory, including all files and subdirectories, with no questions asked. Make sure this is what you want to do: there is no recycle bin on UNIX, so once it is gone, it is gone
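The commands for creating, moving and removing files can be tried safely in a scratch directory. The sketch below is one possible run-through; the scratch directory is an assumption of this sketch, and the file names are the ones used in the examples above.

```shell
# Scratch area made up for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

mkdir mydirectory
touch myfile                    # create an empty file
mv myfile mynewfile             # rename it
cp mynewfile mydirectory/       # put a copy inside the directory

# rmdir only removes empty directories, so this first attempt is refused.
rmdir mydirectory 2>/dev/null || echo "rmdir refused: directory is not empty"
rm mydirectory/mynewfile        # empty the directory first ...
rmdir mydirectory               # ... and now rmdir succeeds
rm mynewfile
ls -a                           # only "." and ".." remain
```

Emptying a directory and then using rmdir is slower than rm -rf, but it cannot delete more than you intended.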
Viewing files - cat/more/less/head/tail
Examples:
cat test.dat      write contents of file to screen
head test.dat      write top of file (default 10 lines)
head -30 test.dat      write top 30 lines of file
tail test.dat      write end of file (default 10 lines)
more test.dat      page through test.dat; press "space" to go one page down, "q" to quit
less test.dat      page through test.dat; press "space" to go one page down, "j" to go one line down, "k" to go one line up, "q" to quit
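The difference between head and tail is easiest to see on a file where every line is numbered. The sketch below generates such a file; sample.dat and its 25 lines are made up for this illustration and stand in for test.dat.

```shell
# Scratch area made up for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

# Made-up stand-in for test.dat: 25 numbered lines.
seq 1 25 | awk '{print "line", $1}' > sample.dat

head sample.dat             # lines 1-10
head -n 3 sample.dat        # lines 1-3 (head -3 is an older spelling of the same thing)
tail -n 2 sample.dat        # lines 24-25
first=$(head -n 1 sample.dat)
last=$(tail -n 1 sample.dat)
echo "$first ... $last"
```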
Editing files - gedit/vi
The gedit      command is used to launch the gedit editor.
Examples:
gedit test.dat      edit the file test.dat with gedit
vi is a nerdy editor.
Type:
vi test.dat      to edit the file test.dat.
/RLM to search for "RLM"
x to delete a letter
dd to delete a line
5dd to delete 5 lines
:q! to get out without changing anything, or
"ZZ" to save changes and quit.
To insert text press "i" to get into insert mode, and press "Esc" to get out of insert mode
(in all "normal" editors you are automatically in insert mode). You can use "R" and "Esc" to get in and out of replace (overwrite) mode.
You may not want to use the vi editor unless you have to, e.g. if you cannot run X Windows, or if you are editing via a noisy telephone line from Mars.
Moving data around
Redirecting: |><      
Use | to "pipe" data from one program to another. Example:
cat test.dat | wc      pipe the contents of test.dat into the program wc (word count), which counts the number of lines, words and bytes in test.dat
Use > to direct data to a file (and overwrite it). Example:
head test.dat > tmp.dat     Put first ten lines of test.dat into tmp.dat
Use >> to direct data to a file and append the data to the contents of the file. Example:
head test.dat >> tmp.dat     Append the first ten lines of test.dat to tmp.dat (it should now contain 20 lines)
Use < to get data from a file to a program. Example:
head < test.dat      Let head read test.dat from standard input instead of opening the file itself
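A quick way to convince yourself of the difference between > and >> is to count lines after each step. The sketch below does that; sample.dat is a made-up stand-in for test.dat, and the scratch directory is an assumption of this sketch.

```shell
# Scratch area made up for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

# Made-up stand-in for test.dat: the numbers 1 to 25, one per line.
seq 1 25 > sample.dat

head sample.dat > tmp.dat       # tmp.dat is created (or overwritten): 10 lines
head sample.dat >> tmp.dat      # appended to the existing content: 20 lines

# wc -l counts lines; reading via < keeps the file name out of the output.
n_tmp=$(wc -l < tmp.dat)
n_all=$(cat sample.dat | wc -l)
echo "tmp.dat:" $n_tmp "lines, sample.dat:" $n_all "lines"
```

Running the first head line again would shrink tmp.dat back to 10 lines, since > always starts from an empty file.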
Sort file - sort
Example of using sort:
sort -n test.dat      sort file numerically
sort -n -k2 test.dat      sort file numerically (big numbers last) by 2nd column
sort -r -n -k2 test.dat      sort file reverse numerically by 2nd column
sort -u test.dat      Keep only one copy of each unique line
sort test.dat | uniq -c      Keep only one copy of each unique line and count number of duplicates for each entry
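Why the -n option matters shows up as soon as the numbers have different lengths. The following sketch uses a made-up four-line file (sample.dat, not part of the course data) to compare the variants listed above.

```shell
# Scratch area made up for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

# Made-up two-column data: a name and a number, with one duplicated line.
printf 'b 10\na 9\nc 100\na 9\n' > sample.dat

sort sample.dat                 # plain text order on the whole line
sort -n -k2 sample.dat          # numeric on column 2: 9, 9, 10, 100
sort -k2 sample.dat             # without -n column 2 is compared as text: 10, 100, 9, 9
sort -u sample.dat              # the duplicate line "a 9" is kept once
sort sample.dat | uniq -c       # each distinct line with its count

largest=$(sort -n -k2 sample.dat | tail -n 1)
distinct=$(sort -u sample.dat | wc -l)
echo "largest:" $largest "- distinct lines:" $distinct
```

Note that uniq only merges neighbouring lines, which is why the input is sorted first.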
Concatenate side by side - paste
Example:
paste test.dat test.dat      put each line of test.dat next to a copy of itself, separated by a tab
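paste is more interesting with two different files. In this sketch left.dat and right.dat are made-up three-line files, and the scratch directory is an assumption of the sketch.

```shell
# Scratch area made up for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

printf 'A\nB\nC\n' > left.dat
printf '1\n2\n3\n' > right.dat

paste left.dat right.dat            # line 1 of each file joined by a tab, then line 2, ...
paste -d ',' left.dat right.dat     # same, with a comma as the separator
joined=$(paste -d ',' left.dat right.dat | head -n 1)
echo "$joined"
```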
Get lines matching a pattern - grep/egrep
Example:
grep AAA test.dat      Get lines with "AAA"
grep -v AAA test.dat      Get lines that do not contain "AAA"
grep ^AAA test.dat      Get lines starting with "AAA"
grep ".L......V" test.dat      Get lines matching any character ("." is a wildcard), "L", any six characters and "V"
grep ".[LVI]......[VL]" test.dat      Get lines matching any character, "L, V or I", any six characters and "V or L" (do not put commas inside the brackets: a comma there would itself be matched)
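The patterns above can be checked on a few short lines where it is easy to see by eye which ones should match. The peptide-like strings in sample.dat below are made up for this sketch and are not taken from the course data; the -c option, which prints the number of matching lines instead of the lines themselves, is used here for counting.

```shell
# Scratch area made up for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

# Made-up peptide-like lines.
printf 'AAAGLKKKKKKV\nGGAAAGG\nKIMMMMMMLK\nQQQQ\n' > sample.dat

grep AAA sample.dat                     # AAAGLKKKKKKV and GGAAAGG
grep -v AAA sample.dat                  # KIMMMMMMLK and QQQQ
grep '^AAA' sample.dat                  # AAAGLKKKKKKV only
grep '.[LVI]......[VL]' sample.dat      # AAAGLKKKKKKV and KIMMMMMMLK

n_start=$(grep -c '^AAA' sample.dat)
n_motif=$(grep -c '.[LVI]......[VL]' sample.dat)
echo "start with AAA:" $n_start "- motif hits:" $n_motif
```

Single quotes around the pattern keep the shell from interpreting characters such as "^", "[" and "*" before grep sees them.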
Awksome programming languages (awk, nawk & gawk)
awk, nawk & gawk are different versions of the same programming language, and are very similar. It is recommended to use gawk or nawk rather than the original version, awk, since they are more stable and have more features!
Basically gawk will read a file and do something with each line.
Examples of using gawk:
gawk '{print $1}' test.dat       Print first field in file
gawk '{print $1, $2}' test.dat       Print first and second field in file
gawk '{print $0}' test.dat       Print entire line
gawk '{print substr($1,2,5),$0}' test.dat       Print characters 2-6 from first field and complete line in file
echo "Mary had a little lamb" |gawk '{line = $0; gsub (" ","",line);print line}'      Remove all spaces in all lines
gawk -v name=Mary -v animal=lamb '{print name,$1,animal}' test.dat      Passing variables to gawk
echo "THIS+IS+A+SENTENCE+SPLIT+BY+PLUS" | gawk -F "+" '{print $1,$2,$3,$4,$5,$6,$7}'      Split input only on "+" (rather than on any whitespace, as is the default)
echo "THIS+IS+A+SENTENCE+SPLIT+BY+PLUS" | gawk -F "+" '{for ( i=1;i<NF;i++ ) { printf( "%s ", $i)}printf( "%s\n", $NF)}'
A more elegant way of doing the same.
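The sketch below collects a few of these ideas in one runnable piece. It uses plain awk so that it also runs where gawk is not installed; gawk accepts the same programs. The file sample.dat and the variable name tag are made up for this illustration, and NR (line number), NF (number of fields) and the END block are standard awk features that go slightly beyond the examples above.

```shell
# Scratch area made up for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

# Made-up whitespace-separated data.
printf 'alpha 1 x\nbeta 22 y\n' > sample.dat

awk '{print $2, $1}' sample.dat                      # swap the first two fields
awk '{print NF, substr($1,2,3)}' sample.dat          # field count, then characters 2-4 of field 1
awk -v tag=row '{print tag NR ":", $0}' sample.dat   # -v passes a value in; NR is the line number
awk '$2 > 10 {print $1}' sample.dat                  # a condition in front of {} selects lines

second=$(echo "THIS+IS+A+SENTENCE" | awk -F "+" '{print $2}')
total=$(awk '{sum += $2} END {print sum}' sample.dat)   # END runs once after the last line
echo "second field: $second, column total: $total"
```

A condition in front of the braces, as in the fourth command, is often all that is needed to pick out the relevant lines when converting between file formats.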
What did I do
history      The history command lists the commands you have previously entered
Getting help - man
The man      command gives help on most Unix commands.
Examples:
man ls      get help on the ls command
Other useful commands
which cat      Find out where a program (the cat program in this case) is installed. Often, when you edit a program and nothing changes, it is because you are editing a different copy of the program than the one you are running
tar      Pack and unpack files
gunzip      Unzip a zipped file (.gz files)
diff/gdiff      compare two files
chmod      Change permissions
chown      Change ownership
command line options      Most Unix programs take options in the form "program -option". For example, head -5 will print out the first 5 lines of a file
autocompletion      Press TAB to let the unix system complete a file/program name
"arrow up"      press arrow up to get old commands
CTRL a      Go to start of line
CTRL e      Go to end of line
If you have more time, you can play a bit with GAWK. Most of the time doing research in bioinformatics is spent transforming data from one output format into another. For doing this, GAWK is a very powerful tool. Here is one example of such a task.
Go to the test directory. Here you will have a file called 1A68_HUMAN.sprot. This file contains a protein sequence in the Swissprot format. You can see the content by typing
cat 1A68_HUMAN.sprot | gawk '{print $0}' | more
Note, that the gawk command prints everything out, i.e., it does nothing.
Your job is to rewrite the gawk command so that it writes out the SWISSPROT entry in FASTA format, i.e. a format like
>1A68_HUMAN
MAVMAPRTLVLLLSGALALTQTWAGSHSMRYFYTSVSRPGRGEPRFIAVGYVDDTQFVRF
DSDAASQRMEPRAPWIEQEGPEYWDRNTRNVKAQSQTDRVDLGTLRGYYNQSEAGSHTIQ
MMYGCDVGSDGRFLRGYRQDAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQW
RAYLEGTCVEWLRRYLENGKETLQRTDAPKTHMTHHAVSDHEATLRCWALSFYPAEITLT
WQRDGEDQTQDTELVETRPAGDGTFQKWVAVVVPSGQEQRYTCHVQHEGLPKPLTLRWEP
SSQPTIPIVGIIAGLVLFGAVITGAVVAAVMWRRKSSDRKGGSYSQAASSDSAQGSDVSL
TACKV
This is all for now