Unix answers
1. Use a text editor (nedit/gedit/komodo/textwrangler) to create a file mycommands.txt in which you record all the commands you use and the observations you make in the following exercises. Use copy/paste to copy the commands. Note: There are more standard text editors than nedit, etc. Examples are emacs, xemacs, vi, vim, and pico. Make sure that we can easily see which exercise you attempt to solve.
2. First list the files in the directory.
ls
3. Copy ex1.acc to myfile.acc.
cp ex1.acc myfile.acc
4. Look at the content of both files to ensure they are identical.
cat ex1.acc
cat myfile.acc
paste ex1.acc myfile.acc
diff ex1.acc myfile.acc
md5sum ex1.acc myfile.acc
sha256sum ex1.acc myfile.acc
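If the comparison has to be done inside a script rather than by eye, the exit status of cmp can be used. The following is only a minimal sketch: it works on throwaway stand-in files in a temporary directory, and the file contents are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)
printf 'P12345\nQ67890\n' > "$tmpdir/ex1.acc"   # made-up stand-in content
cp "$tmpdir/ex1.acc" "$tmpdir/myfile.acc"

# cmp -s prints nothing; it exits with status 0 only if the files are byte-identical.
if cmp -s "$tmpdir/ex1.acc" "$tmpdir/myfile.acc"; then
    result="identical"
else
    result="different"
fi
echo "Files are $result"

rm -r "$tmpdir"
```

diff behaves the same way with respect to its exit status, but it also prints the differing lines.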
5. Copy ex1.dat to myfile.acc.
cp ex1.dat myfile.acc
6. Check that the content of myfile.acc changed.
same as above
7. Delete myfile.acc.
rm myfile.acc
8. Make a directory test and move the three files to it.
mkdir test
mv * test/
#or
mv ex1.acc test/
mv ex1.dat test/
mv orphans.sp test/
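Note that the wildcard in mv * test/ also matches the test directory itself (and any other file in the directory, such as mycommands.txt), so mv will typically complain that it cannot move a directory into itself while still moving the rest. One possible way around this is to move regular files only. This is a sketch on empty stand-in files in a temporary directory, not the required answer:

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/ex1.acc" "$tmpdir/ex1.dat" "$tmpdir/orphans.sp"   # empty stand-ins
mkdir "$tmpdir/test"

# Move only regular files, so the target directory itself is skipped.
for f in "$tmpdir"/*; do
    if [ -f "$f" ]; then
        mv "$f" "$tmpdir/test/"
    fi
done

moved=$(ls "$tmpdir/test" | wc -l)
echo "Files moved: $moved"

rm -r "$tmpdir"
```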
9. Make a directory data and move the three files to that instead.
mkdir data/
mv test/* data/
10. Remove test directory.
rmdir test/
11. Change directory to data and confirm that you succeeded. Go back to the home directory or work directory afterwards.
cd data/
pwd
cd -
#or
cd ~
12. Make three new directories named newtest, one inside the other, like a Russian doll.
mkdir newtest
cd newtest
mkdir newtest
cd newtest
mkdir newtest
cd newtest
#to visualize:
pwd
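The same nesting can also be created in one step with the -p option of mkdir, which creates missing parent directories. A small sketch, run inside a temporary directory so it does not interfere with the exercise files:

```shell
tmpdir=$(mktemp -d)

# -p creates missing parent directories, so all three levels appear in one call.
mkdir -p "$tmpdir/newtest/newtest/newtest"

if [ -d "$tmpdir/newtest/newtest/newtest" ]; then
    nested="yes"
else
    nested="no"
fi
echo "Innermost directory created: $nested"

rm -r "$tmpdir"
```

Unlike the step-by-step version, this leaves you in the directory where you started.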
13. Move the data directory to the innermost newtest directory.
cd ..
cd ..
cd ..
#or cd ../../..
mv data/ newtest/newtest/newtest/
14. Confirm that the three files are moved along with the data directory.
ls newtest/newtest/newtest/data/
15. Copy the three files to your home (your top directory).
cp newtest/newtest/newtest/data/* .
16. Remove all newtest directories and the data in them with a single command.
rm -vr newtest/
#v for verbose, fun to see what happens
#r for recursive
17. Count the lines in ex1.acc and ex1.dat.
wc -l ex1.*
#or
wc -l ex1.acc
wc -l ex1.dat
18. Concatenate ex1.acc and ex1.dat in the file ex1.tot, i.e. copy the content of two files into one new file. Verify that all gene IDs come first, followed by the numerical data.
cat ex1.acc ex1.dat > ex1.tot
19. Merge/Paste ex1.acc and ex1.dat together in ex1.tot, thus destroying the old file. Verify that corresponding gene IDs and numerical data are put on the same line.
paste ex1.acc ex1.dat > ex1.tot
head ex1.acc
head ex1.dat
head ex1.tot
tail ex1.acc
tail ex1.dat
tail ex1.tot
Note: Some versions of MobaXterm have an unfortunate bug in the command needed. You still need to do the exercise, but you can get the right result here for use in the following exercises.
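The difference between cat (exercise 18) and paste (exercise 19) is easiest to see on a tiny example. The files and values below are made up for illustration and are not the real ex1 data:

```shell
tmpdir=$(mktemp -d)
printf 'ID1\nID2\n' > "$tmpdir/ids.txt"              # made-up IDs
printf '1.5\t-2.0\n3.0\t4.5\n' > "$tmpdir/nums.txt"  # made-up numbers

# cat stacks the files: all IDs first, then all the numbers.
cat "$tmpdir/ids.txt" "$tmpdir/nums.txt" > "$tmpdir/stacked.txt"

# paste joins line N of each file with a tab, keeping IDs next to their numbers.
paste "$tmpdir/ids.txt" "$tmpdir/nums.txt" > "$tmpdir/side_by_side.txt"

stacked_lines=$(wc -l < "$tmpdir/stacked.txt")
pasted_lines=$(wc -l < "$tmpdir/side_by_side.txt")
first_pasted=$(head -n 1 "$tmpdir/side_by_side.txt")

echo "Lines after cat: $stacked_lines"
echo "Lines after paste: $pasted_lines"
echo "First pasted line: $first_pasted"

rm -r "$tmpdir"
```

The concatenated file has as many lines as both inputs together, while the pasted file has as many lines as one input.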
20. Extract (cut) SwissProt ID and 3rd numerical data (column 1 and 5) from ex1.tot. Put results into a file ex1.res.
cut -f 1,5 ex1.tot > ex1.res
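cut splits on tabs by default, which matches the tab that paste puts between columns. A short sketch on a made-up five-column table (not the real ex1.tot) shows which fields -f 1,5 keeps:

```shell
tmpdir=$(mktemp -d)
# Made-up table: one ID column followed by four numeric columns, tab-separated.
printf 'ID1\t10\t20\t30\t40\nID2\t50\t60\t70\t80\n' > "$tmpdir/sample.tot"

# Keep only field 1 and field 5 of every line.
cut -f 1,5 "$tmpdir/sample.tot" > "$tmpdir/sample.res"

first_row=$(head -n 1 "$tmpdir/sample.res")
echo "First row: $first_row"

rm -r "$tmpdir"
```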
21. Find the 3 SwissProt IDs in ex1.res which have the largest number(s) in column 2, i.e. the top 3 entries.
sort -k2gr,2 ex1.res | head -3
#or
sort -k2nr,2 ex1.res | head -3
- Be wary of the difference between -g and -n: -n compares plain decimal numbers, while -g (general numeric) also understands scientific notation such as 1e3. See https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html
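To see how -n and -g can disagree, consider a value written in scientific notation. The sketch below uses made-up gene names and numbers, and it assumes a sort that supports -g (GNU sort does). Whether ex1.res contains such values is not something this example checks.

```shell
tmpdir=$(mktemp -d)
# Made-up data: 1e3 (= 1000) is the largest value, but only -g reads it that way.
printf 'GENE_A\t1e3\nGENE_B\t200\nGENE_C\t5\n' > "$tmpdir/values.txt"

# -n stops reading at the 'e', so 1e3 is treated as 1.
top_n=$(sort -k2,2nr "$tmpdir/values.txt" | head -n 1 | cut -f 1)

# -g parses the full floating-point notation, so 1e3 is treated as 1000.
top_g=$(sort -k2,2gr "$tmpdir/values.txt" | head -n 1 | cut -f 1)

echo "Top entry with -n: $top_n"
echo "Top entry with -g: $top_g"

rm -r "$tmpdir"
```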
21. Find the lines (using grep) in orphans.sp which contain a GenBank accession number. There are 85; verify this. Note: An accession number starts with one or two capital letters followed by digits and looks like 'AB000114.CDS.1'; the .CDS. part is optional.
grep -c -E "[A-Z]{2}[0-9]{5,6}" orphans.sp
#or
grep -c "[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]" orphans.sp
#or
grep "[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]" orphans.sp | wc -l
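The extended pattern (with -E and repetition counts) and the spelled-out basic pattern select the same lines. The following sketch checks this on three made-up lines, not on the real orphans.sp:

```shell
tmpdir=$(mktemp -d)
# Made-up lines: two contain something shaped like an accession number, one does not.
printf 'AB000114.CDS.1\tfirst made-up entry\n'   >  "$tmpdir/sample.sp"
printf 'PARG_HUMAN\tno accession number here\n'  >> "$tmpdir/sample.sp"
printf 'XY12345\tsecond made-up entry\n'         >> "$tmpdir/sample.sp"

# Extended regex: exactly two capitals followed by five or six digits.
count_ere=$(grep -c -E '[A-Z]{2}[0-9]{5,6}' "$tmpdir/sample.sp")

# Basic regex: two capitals followed by (at least) five digits, spelled out.
count_bre=$(grep -c '[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]' "$tmpdir/sample.sp")

echo "Matching lines (extended pattern): $count_ere"
echo "Matching lines (basic pattern): $count_bre"

rm -r "$tmpdir"
```

Because grep matches anywhere in the line, requiring five digits already covers the six-digit case.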
22. How many human genes with SwissProt IDs exist in orphans.sp? How many of those are hypothetical? (11) Note: A SwissProt ID looks like 'PARG_HUMAN' or 'TF1A_MOUSE', with the gene being before the underscore and the organism after the underscore.
grep -c "_HUMAN" orphans.sp #207
grep "_HUMAN" orphans.sp | grep -c HYPOTHETICAL #11
23. How many genes belong to the rat, and how many of those are precursors?
grep "_RAT" orphans.sp | wc -l #51
grep "_RAT" orphans.sp | grep PRECURSOR | wc -l #9
24. From the file ex1.res find the lines with positive numbers and put them into ex1.pos. The lines with negative numbers go into ex1.neg.
grep -e "-" ex1.res > ex1.neg
grep -v -e "-" ex1.res > ex1.pos
#-e marks "-" explicitly as the pattern, so it cannot be mistaken for an option
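Searching for a minus sign works as long as a minus only ever appears as the sign of the number. A number such as 2.5e-3 is positive but contains a minus. An alternative that compares the value itself is awk, sketched here on made-up data (it is not known whether ex1.res contains such values). Zero is counted as positive in this sketch.

```shell
tmpdir=$(mktemp -d)
# Made-up data: GENE_C is positive, but its exponent contains a minus sign.
printf 'GENE_A\t-1.2\nGENE_B\t3.4\nGENE_C\t2.5e-3\n' > "$tmpdir/sample.res"

# Compare column 2 numerically instead of searching for a '-' character.
awk -F '\t' '$2 < 0'  "$tmpdir/sample.res" > "$tmpdir/sample.neg"
awk -F '\t' '$2 >= 0' "$tmpdir/sample.res" > "$tmpdir/sample.pos"

awk_neg=$(wc -l < "$tmpdir/sample.neg")
awk_pos=$(wc -l < "$tmpdir/sample.pos")
# For comparison: the number of lines the plain minus-sign search would select.
grep_neg=$(grep -c -e '-' "$tmpdir/sample.res")

echo "Negative lines according to awk: $awk_neg"
echo "Positive lines according to awk: $awk_pos"
echo "Lines containing a minus sign: $grep_neg"

rm -r "$tmpdir"
```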
25. Write a shell script that solves exercises 19-24, with the exercises clearly separated in both the script and the output. The output should be explained. "42" is unclear, but "Number of genes: 42" is clear. This should be straightforward (but long), especially since you took notes (exercise 1).
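One possible way to get explained output is a small helper that prints a label in front of the result of a command. The function name report and the sample file are made up for this sketch; a real script would call it once per exercise on the real files.

```shell
# report LABEL COMMAND [ARGS...]: run the command and print "LABEL: result".
# (report is a hypothetical helper name, not a standard command.)
report() {
    label=$1
    shift
    printf '%s: %s\n' "$label" "$("$@")"
}

tmpdir=$(mktemp -d)
# Made-up stand-in for orphans.sp with two human entries and one mouse entry.
printf 'PARG_HUMAN HYPOTHETICAL PROTEIN\n'  >  "$tmpdir/sample.sp"
printf 'TF1A_MOUSE SOME FACTOR\n'           >> "$tmpdir/sample.sp"
printf 'ABCD_HUMAN SOME KINASE\n'           >> "$tmpdir/sample.sp"

echo "=== Exercise 22 ==="
line=$(report "Number of human genes" grep -c "_HUMAN" "$tmpdir/sample.sp")
echo "$line"

rm -r "$tmpdir"
```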
26. Write a shell script (which is simply a list of Unix commands in a file) that puts all the positive numbers in the file ex1.dat into a file ex1.pos2, and all the negative numbers into a file ex1.neg2. Column position does not matter. The script must clean up after itself, so if any temporary files are used, they must be deleted as the last action. Remember to put the date and a description of the files in the first lines of the resulting output files.
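The following is one possible outline, not the only valid solution. It assumes that the data file is tab-separated with only numbers in it, and it runs on a made-up sample.dat in a temporary directory instead of the real ex1.dat. Zero would end up among the positive numbers here.

```shell
workdir=$(mktemp -d)
# Made-up stand-in for ex1.dat: two lines of tab-separated numbers.
printf '1.5\t-2.0\t3.25\n-4.0\t0.5\t-6.5\n' > "$workdir/sample.dat"

# Temporary file holding one number per line, since column position does not matter.
tmpfile=$(mktemp)
tr '\t' '\n' < "$workdir/sample.dat" > "$tmpfile"

# Each output file starts with the date and a description, then the numbers.
{
    date
    echo "Positive numbers from sample.dat, one per line"
    grep -v '^-' "$tmpfile"
} > "$workdir/sample.pos2"

{
    date
    echo "Negative numbers from sample.dat, one per line"
    grep '^-' "$tmpfile"
} > "$workdir/sample.neg2"

# Clean up the temporary file once both outputs are written.
rm -f "$tmpfile"

# Count the data lines (everything after the two header lines) for a quick check.
pos_count=$(tail -n +3 "$workdir/sample.pos2" | wc -l)
neg_count=$(tail -n +3 "$workdir/sample.neg2" | wc -l)
echo "Positive numbers written: $pos_count"
echo "Negative numbers written: $neg_count"

rm -r "$workdir"   # only needed because this sketch works on throwaway files
```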