In this exercise you shall write a few simple Python programs to get introduced to some of the essential functions used in the programs of the course.
If you have NOT done the Unix/Linux recap exercise, you first need to download some data files and set up your course directory. Open a terminal window under Linux. Make a course directory (say Algo) somewhere convenient. All course exercises and course-related files will be stored in this directory. Now, download the file
Unpack the file (using tar -xzvf data.tar.gz) and place the created data directory in the "Algo" directory.
Now create a directory called "code" in the course directory (i.e. Algo or whatever name you have selected).
Download the file Intro.tar.gz and place it in the code directory. Remove the Intro directory if present (rm -rf Intro). Unpack the Intro.tar.gz file using tar -xzvf Intro.tar.gz
Now go to the Intro directory. Here you will find two Python Jupyter notebook files:
Python_Intro.ipynb seq2scoremat.ipynb
Open the file Python_Intro.ipynb with jupyter-notebook, and respond to each of the exercises described.
When completed, download the notebook as Python code, and add a command-line parser that allows the program to accept three options
-t THRESHOLD              Target value filtering threshold (default: 0.5)
-f PEPTIDES_TARGETS_FILE  Peptides-Targets file
-l PEPLEN_THRESHOLD       Peptide length (default: 9)
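One possible way to define these three options is with the argparse module from the Python standard library. The sketch below is only an illustration: the function name build_parser, the attribute names chosen with dest, and the example file name peptides.txt are assumptions and are not taken from the course notebook.

```python
import argparse


def build_parser():
    """Build a parser for the three options listed above (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Filter peptides by target value and peptide length")
    # The dest names are assumptions; argparse upper-cases them for the help text
    parser.add_argument("-t", dest="threshold", type=float, default=0.5,
                        help="Target value filtering threshold (default: 0.5)")
    parser.add_argument("-f", dest="peptides_targets_file", type=str,
                        help="Peptides-Targets file")
    parser.add_argument("-l", dest="peplen_threshold", type=int, default=9,
                        help="Peptide length (default: 9)")
    return parser


# An explicit argument list is passed here only to make the example reproducible.
# In the real program, call parse_args() without arguments to read sys.argv.
args = build_parser().parse_args(["-t", "0.7", "-f", "peptides.txt"])
print(args.threshold, args.peptides_targets_file, args.peplen_threshold)
```

The parsed values can then replace the hard-coded threshold, file name and peptide length in the code exported from the notebook.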
Next, you shall make a small program to plot the scoring matrix between two protein sequences. The program seq2scoremat.ipynb contains an almost complete implementation of this.
The program reads two sequence files and a BLOSUM substitution scoring matrix, and then calculates the amino acid scoring matrix between the two sequences.
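To make the idea of a scoring matrix between two sequences concrete, here is a minimal sketch. It is not the notebook's implementation: the function name score_matrix, the four-letter alphabet and the toy match/mismatch scores are assumptions made for illustration, and the toy scores are NOT BLOSUM values.

```python
def score_matrix(query, database, subst):
    """Return a matrix where entry [i][j] is the substitution score
    between query[i] and database[j] (hypothetical helper)."""
    return [[subst[a][b] for b in database] for a in query]


# Toy substitution scheme (an assumption, NOT real BLOSUM scores):
# +2 for identical residues, -1 otherwise, over a reduced alphabet.
alphabet = "ACDE"
toy_scores = {a: {b: (2 if a == b else -1) for b in alphabet} for a in alphabet}

matrix = score_matrix("ACD", "CDEA", toy_scores)
for row in matrix:
    print(row)
```

Each row corresponds to one residue of the first sequence and each column to one residue of the second, so the matrix has len(query) rows and len(database) columns. With a real BLOSUM matrix loaded into the same nested-dictionary layout, the lookup would work the same way.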
Open the program with jupyter-notebook. You shall fill in the blanked out parts of the program so that you get a figure of the scoring matrix identical to the one included here
This is all for now