Unix answers

1. Use a text editor (nedit/gedit/komodo/textwrangler) to create a file mycommands.txt where you write all the commands you run and the observations you make in the following exercises. Use copy/paste to copy the commands. Note: There are more standard text editors than nedit, etc. Examples are emacs, xemacs, vi, vim, and pico. Make sure that we can easily see which exercise you attempt to solve.

2. First list the files in the directory.

ls

3. Copy ex1.acc to myfile.acc.

cp ex1.acc myfile.acc

4. Look at the content of both files to ensure they are identical.

cat ex1.acc 
cat myfile.acc 
paste  ex1.acc myfile.acc 
diff  ex1.acc myfile.acc 
md5sum  ex1.acc myfile.acc 
sha256sum  ex1.acc myfile.acc
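
Comparing two files by eye only works while they are short. A minimal sketch of an automated check follows. It runs in a scratch directory and uses made-up file content as a stand-in for ex1.acc (an assumption, not the course data). cmp -s prints nothing and reports the result through its exit status.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"
# Hypothetical stand-in for ex1.acc; the real file has different content.
printf 'ID_ONE\nID_TWO\n' > ex1.acc
cp ex1.acc myfile.acc

# cmp -s is silent; its exit status is 0 only when the files are byte-identical.
if cmp -s ex1.acc myfile.acc; then
    result="identical"
else
    result="different"
fi
echo "ex1.acc and myfile.acc are $result"
```

The same test works with diff -q, which also signals a difference through a non-zero exit status.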

5. Copy ex1.dat to myfile.acc.

cp ex1.dat myfile.acc

6. Check that the content of myfile.acc changed.

Same commands as in exercise 4. This time diff reports differences and the checksums no longer match.

7. Delete myfile.acc.

rm myfile.acc

8. Make a directory test and move the three files to it.

mkdir test

mv * test/
#mv complains that it cannot move test into itself; the files are moved anyway
#or
mv ex1.acc test/
mv ex1.dat test/
mv orphans.sp test/

9. Make a directory data and move the three files to that instead.

mkdir data/
mv test/* data/

10. Remove test directory.

rmdir test/

11. Change directory to data and confirm that you succeeded. Go back to the home directory or work directory afterwards.

cd data/
pwd

cd -
#or
cd ~

12. Make three new directories named newtest, one inside the other, like a Russian doll.

mkdir newtest
cd newtest
mkdir newtest
cd newtest
mkdir newtest
cd newtest
#to visualize:
pwd
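
The three mkdir/cd pairs above can also be collapsed into one command. This is a sketch of that alternative, run in a scratch directory so it does not interfere with the exercise files; mkdir -p creates every missing directory along the given path.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"
# -p creates all missing parent directories in one go.
mkdir -p newtest/newtest/newtest
# List every directory below (and including) the outermost newtest.
find newtest -type d
```

Unlike the step-by-step version, this leaves you in the starting directory, so no cd ../../.. is needed afterwards.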

13. Move the data directory to the innermost newtest directory.

cd ..
cd ..
cd ..

#or
cd ../../..

mv data/ newtest/newtest/newtest/

14. Confirm that the three files are moved along with the data directory.

ls newtest/newtest/newtest/data/

15. Copy the three files to your home (your top directory).

cp newtest/newtest/newtest/data/* .

16. Remove all newtest directories and the data in them with a single command.

rm -vr newtest/
#v for verbose, fun to see what happens
#r for recursive

17. Count the lines in ex1.acc and ex1.dat.

wc -l ex1.*
#or
wc -l ex1.acc
wc -l ex1.dat

18. Concatenate ex1.acc and ex1.dat in the file ex1.tot, i.e. copy the content of two files into one new file. Verify that all gene IDs come first, followed by the numerical data.

cat ex1.acc ex1.dat > ex1.tot

19. Merge/Paste ex1.acc and ex1.dat together in ex1.tot, thus destroying the old file. Verify that corresponding gene IDs and numerical data are put on the same line.

paste ex1.acc ex1.dat > ex1.tot

head ex1.acc
head ex1.dat
head ex1.tot


tail ex1.acc
tail ex1.dat
tail ex1.tot
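
The difference between exercises 18 and 19 is easiest to see on a tiny example. The sketch below uses two made-up two-line files (acc.txt and dat.txt are hypothetical stand-ins for ex1.acc and ex1.dat) in a scratch directory.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"
# Hypothetical two-line stand-ins for ex1.acc and ex1.dat.
printf 'ID_ONE\nID_TWO\n' > acc.txt
printf '1.5\t-2.0\n0.3\t4.1\n' > dat.txt

# cat puts dat.txt below acc.txt: 2 + 2 = 4 lines.
cat acc.txt dat.txt > tot_cat.txt
# paste joins line N of each file with a tab: still 2 lines.
paste acc.txt dat.txt > tot_paste.txt

wc -l tot_cat.txt tot_paste.txt
cat tot_paste.txt
```

With cat the line count is the sum of the inputs; with paste it stays the same and each line gets wider.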

Note: Some versions of MobaXterm have an unfortunate bug in the command needed. You still need to do the exercise, but you can get the right result here for use in the following exercises.

20. Extract (cut) SwissProt ID and 3rd numerical data (column 1 and 5) from ex1.tot. Put results into a file ex1.res.

cut -f 1,5 ex1.tot > ex1.res

21. Find the 3 SwissProt IDs in ex1.res which have the largest number(s) in column 2, i.e. the top 3 entries.

sort -k2gr,2 ex1.res|head -3
#or 
sort -k2nr,2 ex1.res|head -3

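Be wary of the difference between -g and -n: -n compares plain decimal numbers, while -g also understands forms such as scientific notation but is slower. See https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html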
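
A small sketch of where -n and -g disagree, assuming GNU sort (-g is not part of POSIX). The two input values are made up: 1e3 is 1000 in scientific notation. With -n only the leading "1" is read as a number, so 1e3 sorts before 20; with -g the whole value is parsed and it sorts after 20.

```shell
# Hypothetical input: 1e3 means 1000 in scientific notation.
# tr turns the sorted lines into one space-separated line for printing.
n_order=$(printf '20\n1e3\n' | sort -n | tr '\n' ' ')
g_order=$(printf '20\n1e3\n' | sort -g | tr '\n' ' ')
echo "sort -n order: $n_order"
echo "sort -g order: $g_order"
```

If the numbers in ex1.res are plain decimals, both options give the same order.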

21. Find the lines (using grep) in orphans.sp which contain a GenBank accession number. There are 85, verify this. Note: An accession number starts with one or two capital letters followed by digits and looks like this: 'AB000114.CDS.1'. The .CDS. part is optional.

grep -c -E  "[A-Z]{2}[0-9]{5,6}" orphans.sp
#or
grep -c     "[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]" orphans.sp
#or
grep      "[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]" orphans.sp|wc -l
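
The patterns above require two capital letters in front of the digits. The sketch below shows, on three made-up lines (sample.sp is hypothetical, not orphans.sp), how that pattern behaves next to a variant that also accepts a single leading letter. Whether the variant changes the count on the real file is not checked here.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"
# Hypothetical sample lines; not taken from orphans.sp.
printf '%s\n' \
    'AB000114.CDS.1 two letters, six digits' \
    'X12345 one letter, five digits' \
    'PARG_HUMAN no accession number' > sample.sp

# Pattern from the answer above: exactly two capital letters before the digits.
two=$(grep -c -E "[A-Z]{2}[0-9]{5,6}" sample.sp)
# Variant that also accepts a single leading capital letter.
one_or_two=$(grep -c -E "[A-Z]{1,2}[0-9]{5,6}" sample.sp)
echo "Lines matching two letters: $two"
echo "Lines matching one or two letters: $one_or_two"
```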

22. How many human genes with SwissProt IDs in orphans.sp exist? How many of those are hypothetical? (11) Note: A SwissProt ID looks like 'PARG_HUMAN' or 'TF1A_MOUSE', with the gene being before the underscore and the organism after the underscore.

grep -c  "_HUMAN" orphans.sp
#207
grep   "_HUMAN" orphans.sp|grep -c HYPOTHETICAL 
#11

23. How many genes belong to the rat, and how many of those are precursors?

grep    "_RAT" orphans.sp|wc -l 
#51
grep    "_RAT" orphans.sp |grep PRECURSOR |wc -l 
#9

24. From the file ex1.res find the lines with positive numbers and put them into ex1.pos. The lines with negative numbers go into ex1.neg.

grep -e "-" ex1.res > ex1.neg
grep -v -e "-" ex1.res > ex1.pos
#-e marks "-" as the pattern, so it cannot be mistaken for an option
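
Searching for a "-" character works as long as no ID contains a hyphen. A sketch of an alternative that compares column 2 as a number follows. It uses awk, which is not used elsewhere in these answers, and made-up data (res.txt is a hypothetical stand-in for ex1.res, assumed to be tab-separated).

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"
# Hypothetical stand-in for ex1.res: ID, tab, number.
printf 'ID_ONE\t1.5\nID_TWO\t-2.0\nID_THREE\t0.3\n' > res.txt

# Print the lines whose second column is numerically below / above zero.
awk -F '\t' '$2 < 0' res.txt > neg.txt
awk -F '\t' '$2 > 0' res.txt > pos.txt

echo "Negative lines:"; cat neg.txt
echo "Positive lines:"; cat pos.txt
```

Note that a value of exactly 0 ends up in neither file with this version, whereas the grep version puts it in the positive file.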

25. Write a shell script that solves exercises 19-24, with the exercises clearly separated in both the script and the output. The output should be explained. "42" is unclear, but "Number of genes: 42" is clear. This should be straightforward (but long), especially since you took notes (exercise 1).
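
A sketch of what "clearly separated and explained" output could look like, shown for exercise 22 only. It runs on three made-up lines (sample.sp is a hypothetical stand-in for orphans.sp) in a scratch directory; a full script would repeat the pattern for each exercise and use the real files.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"
# Hypothetical stand-in for orphans.sp.
printf '%s\n' \
    'PARG_HUMAN HYPOTHETICAL PROTEIN' \
    'TF1A_MOUSE SOME PROTEIN' \
    'ABCD_HUMAN SOME KINASE' > sample.sp

# A header line separates the exercises in the output.
echo "=== Exercise 22 ==="
human=$(grep -c "_HUMAN" sample.sp)
echo "Number of human genes: $human"
hypo=$(grep "_HUMAN" sample.sp | grep -c "HYPOTHETICAL")
echo "Number of hypothetical human genes: $hypo"
```

Storing each count in a variable keeps the command and its explanation on separate lines, which makes the script easier to read.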

26. Write a shell script (which is simply just a list of unix commands in a file) that puts all the positive numbers in the file ex1.dat into a file ex1.pos2, and all the negative numbers into a file ex1.neg2. Column position does not matter. The script must clean up after itself, so if any temporary files are used, they must be deleted as the last action. Remember to put the date and a description of the files in the first lines of the resulting output files.
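
One possible shape for such a script, as a sketch under stated assumptions: dat.txt is a made-up stand-in for ex1.dat, the columns are assumed to be tab-separated, and the output names pos2.txt and neg2.txt are illustrative. A number is treated as negative when its line starts with "-", so 0 counts as positive here.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)"
# Hypothetical stand-in for ex1.dat: tab-separated numbers.
printf '1.5\t-2.0\t0.3\n-4.1\t7\t-0.5\n' > dat.txt

# Temporary file with one number per line (column position does not matter).
tmp=$(mktemp)
tr '\t' '\n' < dat.txt > "$tmp"

# The first two lines of each output file are the date and a description.
{ date; echo "Positive numbers from dat.txt"; grep -v '^-' "$tmp"; } > pos2.txt
{ date; echo "Negative numbers from dat.txt"; grep '^-' "$tmp"; } > neg2.txt

# Clean up the temporary file as the last action.
rm "$tmp"
```

Grouping the commands in braces lets a single redirection write the header and the data, so the output file is never opened twice.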
