Unix
You did do the course preparation, right? Otherwise all of this matters not.
Required course material for the lesson
Powerpoint: Introduction to Unix
The videos are not entirely in sync with the powerpoint, as it has been updated.
Video: Unix intro and navigation Monday
Video: Copying, Moving, Renaming files. Changing permissions Monday
Video: Using shortcuts in Unix - making it easier Monday
Video: File inspection and editor in unix. 1½ min silence at the end Monday
Video: Manipulating files: wc, paste, cut & sort Monday
Video: Manipulating files: grep and pipelines Monday
Video: Touching upon various relevant subjects Monday
Resource: The resource on Unix for many courses at Health Tech
Resource: Biological knowledge needed in the course
Resource: UNIX Tutorial for Beginners from University of Surrey
Resource: UNIX/LINUX Tutorial from TutorialsPoint
Subjects covered
- Basic file handling in UNIX.
- Understanding and navigating the file system.
- Many different Unix commands.
- Scripting in bash.
Exercises to be handed in
Use a text editor to create a file mycommands.txt where you write all the commands you run and the observations you make in the following exercises. Use copy/paste to copy the commands.
Note: Make sure that we can easily see which exercise you attempt to solve.
Note: You should work from your home directory, or a work directory that you specify explicitly in the top of your hand-in.
Note: Data files can be found in the collection of files.
Exercises to do after Monday
It is assumed that you have downloaded the 3 files (ex1.dat, ex1.acc and orphans.sp) before you start the exercises.
- First list the files in the directory.
- Copy ex1.acc to myfile.acc.
- Look at the content of both files to ensure they are identical.
- Copy ex1.dat to myfile.acc.
- Check that the content of myfile.acc changed.
- Delete myfile.acc.
- Make a directory test and move the three files to it.
- Make a directory data and move the three files to that instead.
- Remove the test directory.
- Change directory to data and confirm that you succeeded. Go back to the home/work directory afterwards.
- Make three new directories newtest - one inside the other, like a Russian doll.
- Move the data directory to the innermost newtest directory.
- Confirm that the three files are moved along with the data directory.
- Copy the three files to your home (your top directory).
- Remove all newtest directories and the data in them with a single command.
- Count the lines in ex1.acc and ex1.dat.
- Concatenate ex1.acc and ex1.dat in the file ex1.tot, i.e. copy the content of the two files into one new file. Verify that all gene IDs come first, followed by the numerical data.
- Merge (paste) ex1.acc and ex1.dat side by side into ex1.tot, thus overwriting the old file. Verify that each gene ID ends up on the same line as its numerical data.
- Extract (cut) the SwissProt ID and the 3rd numerical data column (columns 1 and 5) from ex1.tot. Put the results into a file ex1.res.
- Find the 3 SwissProt IDs in ex1.res which have the largest numbers in column 2, i.e. the top 3 entries. Display only the IDs.
- Find the lines (using grep) in orphans.sp which contain a GenBank accession number. There are 85; verify this. Note: An accession number starts with one or two capital letters followed by some digits and looks like this: 'AB000114.CDS.1'. The .CDS. part is optional.
- How many human genes with SwissProt IDs exist in orphans.sp? How many of those are hypothetical? (11)
- How many genes belong to the rat, and how many of those are precursors? (9) Note: A SwissProt ID looks like 'PARG_HUMAN' or 'TF1A_MOUSE', with the gene before the underscore and the organism after it.
- From the file ex1.res find the lines with positive numbers and put them into ex1.pos. The lines with negative numbers go into ex1.neg.
- Calculate ((356+51)*123-12765)/56 on the command line.
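The commands practised in the list above can be tried out safely on throwaway files first. The following is a minimal sketch, not a solution to the exercises: it works in a temporary directory on made-up files (demo.ids and demo.num, with invented content), and all directory and variable names are assumptions chosen for illustration only.

```shell
#!/usr/bin/env bash
# Practice sketch on throwaway files. Every file and directory name is made up.

start_dir=$PWD
scratch=$(mktemp -d)     # private sandbox, removed at the end
cd "$scratch" || exit 1
tab=$(printf '\t')       # a literal tab, used as field separator below

# Two small demo files: IDs in one, tab-separated numbers in the other.
printf 'ID_A\nID_B\nID_C\n' > demo.ids
printf '1.5\t-2\n-0.3\t7\n4.0\t0.1\n' > demo.num

# Copy, build nested directories, move, then remove the whole tree.
cp demo.ids copy.ids
mkdir -p outer/middle/inner      # -p creates the full chain in one go
mv copy.ids outer/middle/inner/
ls -R outer                      # confirm where the file ended up
rm -r outer                      # -r removes directories and their content

# Count lines, concatenate (end to end), paste (side by side).
wc -l demo.ids demo.num
cat demo.ids demo.num > demo.cat
paste demo.ids demo.num > demo.tab

# Keep columns 1 and 3, sort on column 2 (numeric, descending), show top 2 IDs.
cut -f1,3 demo.tab > demo.sel
top_ids=$(sort -t "$tab" -k2,2nr demo.sel | head -n 2 | cut -f1)
echo "$top_ids"

# Count the lines matching a pattern.
n_match=$(grep -c -E '^ID_[BC]$' demo.ids)
echo "Matching IDs: $n_match"

# Shell arithmetic is integer only, so 20/8 is truncated to 2.
result=$(( (2 + 3) * 4 / 8 ))
echo "Result: $result"

cd "$start_dir" || exit 1
rm -r "$scratch"
```

Because shell arithmetic truncates, a calculator such as bc (where installed) is one option when decimals are needed.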
Exercises to do after Thursday
- Go back to exercises 2 & 3. Check if the files are changed, using diff this time.
- In myfile.acc (a copy of ex1.acc) change all occurrences of SPC to BLNK, using sed. Check if the files are changed, using diff.
- What is the path to the bash command?
- Write a bash shell script that solves exercises 19-24, with the exercises clearly separated and explained in both the script and the output. A bare "42" is unclear, but "Number of genes: 42" is clear. This should be straightforward (but long), especially since you wrote down what you did.
- Write a bash shell script that puts all the positive numbers in the file ex1.dat into a file ex1.pos2, and all the negative numbers into a file ex1.neg2. Column position does not matter. The script must clean up after itself, so if any temporary files are used, they must be deleted as the last action. Put the date and a description of the files in the first lines of the resulting output files.
- Write a bash shell script that calculates the total number of lines for all files in the directory given as argument on the command line. No argument means the current directory. Misleading hint: cut can split on delimiters other than tab.
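Several of the script exercises above share the same building blocks: an argument with a default value, a temporary file that is removed again, a dated and described output file, labelled output, and a sed/diff check. The following is a rough skeleton under stated assumptions, not a solution: the function list_entries, the file names and the demo content are all hypothetical, and the task it performs (listing directory entries) is deliberately different from the exercises.

```shell
#!/usr/bin/env bash
# Skeleton with made-up names; replace the commands inside with your own.

# list_entries DIR OUT: write a dated, described listing of DIR to OUT.
# DIR defaults to the current directory when the argument is missing.
list_entries() {
    local dir="${1:-.}"
    local out="${2:-entries.txt}"
    local tmp n
    tmp=$(mktemp)                        # temporary work file

    ls "$dir" > "$tmp"
    n=$(wc -l < "$tmp" | tr -d ' ')      # strip padding some wc versions add

    {
        date                                     # first line: when it was made
        echo "Entries found in directory: $dir"  # second line: what it holds
        cat "$tmp"
    } > "$out"

    echo "Number of entries in $dir: $n"  # labelled, not a bare number
    rm -f "$tmp"                          # clean up as the last action
}

# Replace every X with Y in a copy, then let diff show what changed.
demo_dir=$(mktemp -d)
printf 'aXb\nplain\ncXd\n' > "$demo_dir/before.txt"
sed 's/X/Y/g' "$demo_dir/before.txt" > "$demo_dir/after.txt"
diff "$demo_dir/before.txt" "$demo_dir/after.txt"   # exit status 1: files differ
n_changed=$(diff "$demo_dir/before.txt" "$demo_dir/after.txt" | grep -c '^>')
echo "Number of changed lines: $n_changed"

# Use the function on the demo directory and inspect the result.
summary=$(list_entries "$demo_dir" "$demo_dir/listing.txt")
echo "$summary"
second_line=$(sed -n '2p' "$demo_dir/listing.txt")
echo "$second_line"

rm -r "$demo_dir"
```

Saved as a script, the function could be called with "$1" as its first argument, so that running the script without arguments falls back to the current directory.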