Unix Exercises
You did complete the Course preparation, right? Otherwise, all of this will fail.
Required course material for the lesson
- PowerPoint: Introduction to Unix. Note that the videos are not entirely in sync with the updated PowerPoint.
- Video: Unix intro and navigation Monday
- Video: Copying, Moving, Renaming files. Changing permissions Monday
- Video: Using shortcuts in Unix - making it easier Monday
- Video: File inspection and editor in Unix Monday
- Video: Manipulating files: wc, paste, cut & sort Monday
- Video: Manipulating files: grep and pipelines Monday
- Video: Touching upon various relevant subjects Monday
- Resource: The UNIX resource used across Health Tech courses
- Resource: Biological knowledge needed in the course
- Resource: UNIX Tutorial for Beginners
- Resource: UNIX/LINUX Tutorial
Subjects covered
- Basic file handling in UNIX
- Understanding and navigating the file system
- Many different Unix commands
- Scripting in bash
Exercises to be handed in
Use a text editor to create a file mycommands.txt where you write all commands and observations from the exercises below. Use copy/paste to insert the commands you ran.
Notes:
- Make sure it is clear which exercise you are solving.
- Work from your home directory unless you explicitly state otherwise.
- Data files are available in the collection of files.
Exercises to do after Monday
It is assumed that you have downloaded the three files: ex1.dat, ex1.acc and orphans.sp before starting.
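Before starting, it can help to confirm that the files really are in the directory you are working in. The following is a minimal sketch, not part of the course material; the function name `check_files` is made up for illustration.

```shell
# Report which of the given files exist in the current directory.
# check_files is a hypothetical helper name, not a standard command.
check_files() {
    status=0
    for f in "$@"; do
        if [ -f "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            status=1          # remember that at least one file is absent
        fi
    done
    return "$status"
}

check_files ex1.dat ex1.acc orphans.sp || echo "Download the missing files before continuing."
```

If any file is reported as missing, check that you are in the right directory with `pwd` before downloading it again.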
- List the files in the current directory.
- Copy ex1.acc to myfile.acc.
- View the contents of both files to ensure they are identical.
- Copy ex1.dat to myfile.acc (overwriting it).
- Check that the contents of myfile.acc have changed.
- Delete myfile.acc.
- Create a directory called test and move the three files into it.
- Create a directory data and move the three files into data instead.
- Remove the now-empty test directory.
- Change directory to data and confirm that the three files are there. Return to your home/work directory afterwards.
- Create three new directories, each inside the previous one, like Russian dolls: newtest/one/two.
- Move the data directory into the innermost two directory.
- Confirm that the three files moved along with the data directory.
- Copy the three files from data back into your home directory.
- Remove all newtest directories and data with a single command.
Warning: Make sure you are in the correct parent directory before running a recursive remove.
- Count the lines in ex1.acc and ex1.dat.
- Concatenate ex1.acc and ex1.dat into ex1.tot. Ensure gene IDs appear first and numerical data after.
- Merge/paste ex1.acc and ex1.dat together into ex1.tot, overwriting the previous file. Verify corresponding lines appear together.
- Extract (cut) SwissProt ID and the 3rd numerical field (columns 1 and 5) from ex1.tot and save into ex1.res.
- Find the three SwissProt IDs in ex1.res with the largest numbers in column 2. Display only the IDs.
- Find the lines in orphans.sp that contain a GenBank accession number using grep. There are 85; verify this.
- How many human genes with SwissProt IDs exist in orphans.sp? How many are hypothetical? (11)
- How many rat genes exist, and how many of those are precursors? (9)
- From ex1.res, send positive numbers to ex1.pos and negative numbers to ex1.neg.
- Calculate ((356+51)*123-12765)/56 directly on the command line.
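The difference between concatenating and pasting, and the way small commands chain into a pipeline, may be easier to see on toy data. The sketch below is only an illustration under stated assumptions: `demo.acc` and `demo.dat` are made-up stand-ins with an assumed tab-separated layout, not the course files, and `awk` is just one of several tools that could do the splitting step.

```shell
#!/usr/bin/env bash
# Sketch only: demo.acc and demo.dat are invented stand-ins for illustration.
set -eu

workdir=$(mktemp -d)                 # scratch directory, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
tab=$(printf '\t')                   # a literal tab, used as field separator

# Two small tab-separated files with the same number of lines.
printf 'ID_A\tX001\nID_B\tX002\nID_C\tX003\nID_D\tX004\n' > demo.acc
printf '1.5\t-2.0\t7.1\n0.3\t4.4\t-3.2\n2.2\t1.1\t9.8\n-0.7\t0.0\t-5.5\n' > demo.dat

# cat stacks the files on top of each other (4 + 4 = 8 lines) ...
cat demo.acc demo.dat > demo.stacked
# ... while paste joins them side by side, line by line (still 4 lines).
paste demo.acc demo.dat > demo.tot

# Keep column 1 (the ID) and column 5 (the third number).
cut -f 1,5 demo.tot > demo.res

# Sort numerically on column 2, largest first, and keep the top three IDs.
sort -t "$tab" -k2,2nr demo.res | head -n 3 | cut -f 1 > demo.top3
cat demo.top3                        # prints ID_C, ID_A, ID_B (one per line)

# Split the rows by the sign of the number in column 2.
awk -F '\t' '$2 > 0' demo.res > demo.pos
awk -F '\t' '$2 < 0' demo.res > demo.neg

# Integer arithmetic directly in the shell.
echo $(( (2 + 3) * 4 - 6 ))          # prints 14
```

Everything happens in a temporary directory, so the sketch can be run without touching the exercise files.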
Exercises to do after Thursday
- Repeat exercises 2 and 3 (copy ex1.acc to myfile.acc and check that the two files are identical), but this time use diff to check for differences.
- In myfile.acc (a copy of ex1.acc), change all occurrences of SPC to BLNK using sed. Verify changes using diff.
- Find the path to the bash executable.
- Write a bash script that solves exercises 19–24 (from extracting columns into ex1.res through splitting ex1.res into ex1.pos and ex1.neg), with each exercise clearly separated and explained in both the script and its output.
- Write a bash script that puts all positive numbers from ex1.dat into ex1.pos2 and negative numbers into ex1.neg2. The script must clean up temporary files and include a header with the date and a short description.
- Write a bash script that calculates the total number of lines across all files in a directory given as a command-line argument. If no argument is given, use the current directory. Hint: cut can split on delimiters other than tabs (see its -d option).
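As a rough illustration of how a script can take an optional directory argument, the sketch below sums line counts with a loop. It is one possible approach under stated assumptions, not the expected solution: the file name `total_lines.sh` and the function name `total_lines` are hypothetical, hidden files are ignored, and it deliberately avoids the `cut` route hinted at above.

```shell
#!/usr/bin/env bash
# total_lines.sh (hypothetical name)
# Description: print the total number of lines in all regular files
#              directly inside a directory.
# Usage:       total_lines.sh [directory]   (default: current directory)

total_lines() {
    local dir=${1:-.}            # no argument means the current directory
    local total=0 n f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue  # skip subdirectories and an unmatched glob
        n=$(wc -l < "$f")        # reading from stdin makes wc print only the count
        total=$(( total + n ))
    done
    echo "$total"
}

total_lines "$@"
```

Saved under the assumed name, it could be run as `bash total_lines.sh data`, or without an argument to count the files in the current directory.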