# A very first Python program

In this program we will:

1.   Load a peptide-target list using NumPy.
2.   Keep only 9mer peptides.
3.   Discard peptides using a target value threshold.
4.   Print the filtered result.
5.   Download the notebook as a Python file (.py).
6.   Implement a command line parser in the .py file.

In the code below, some parts have been blanked out with XX. Fill in the code in these places.
Note that if you are using Python 2, you might want to remove the "()" parentheses in the print commands.

## Import NumPy

In [1]:
import numpy as np

## DEFINE THE PATH TO YOUR COURSE DATA DIRECTORY

In [2]:
data_dir = "/Users/mniel/Courses/Algorithms_in_Bioinf/ipython/data/"

## Load peptides-targets data

**Specify file name**

In [3]:
peptides_targets_file = data_dir + "Intro/test.dat"

**Load the file with the numpy text parser *np.loadtxt* and reshape it into a numpy array of shape (-1, 2)** 

*This means "give me all the rows you want, but I demand only 2 columns", and ensures the [PEPTIDE, TARGET_VALUE] structure we want*

In [4]:
##peptides_targets = XX
peptides_targets = np.loadtxt(peptides_targets_file, dtype=str).reshape(-1,2)
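
The effect of `reshape(-1, 2)` is easiest to see on a file with a single line, where *np.loadtxt* returns a flat array by default. The sketch below is only an illustration: it uses `io.StringIO` as a stand-in for the course file, and the names `two_rows`, `one_row`, `demo_many` and `demo_single` are made up for this example.

```python
import io
import numpy as np

# Two rows copied from the final output of this notebook; io.StringIO
# stands in for the real file so the example runs without the course data.
two_rows = io.StringIO("ILYQVPFSV 0.8532\nVVMGTLVAL 0.5891\n")
one_row = io.StringIO("ILYQVPFSV 0.8532\n")

# With several rows, loadtxt already returns a 2-D array
demo_many = np.loadtxt(two_rows, dtype=str).reshape(-1, 2)

# With a single row, loadtxt returns a 1-D array; reshape restores the 2 columns
demo_single_raw = np.loadtxt(one_row, dtype=str)
demo_single = demo_single_raw.reshape(-1, 2)

print(demo_many.shape)        # (2, 2)
print(demo_single_raw.shape)  # (2,)
print(demo_single.shape)      # (1, 2)
```

With the reshape in place, `[:, 0]` and `[:, 1]` index the two columns no matter how many lines the file has.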

**Check the shape of your newly created numpy array**

In [5]:
print(peptides_targets.shape)

(86, 2)


## Store peptides in vector

**Fish out the peptides from the loaded file using array indexing and slicing**

In [6]:
peptides = peptides_targets[:, 0]

**Check that *peptides* has the same data type as *peptides_targets***

In [7]:
print(type(peptides), type(peptides_targets))

(<type 'numpy.ndarray'>, <type 'numpy.ndarray'>)


## Store targets in vector

**Fish out the target values from the loaded file using array indexing and slicing**

*Remember that we used a text parser to load the data. So, we need to cast this value to float somehow*

In [8]:
##targets = XXX
targets = peptides_targets[:, 1].astype(float)
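
As a small illustration of why the cast is needed, the sketch below builds a tiny string array by hand (the name `pairs_demo` and the two rows are chosen only for this example) and compares the column before and after `astype(float)`.

```python
import numpy as np

# A hand-made stand-in for peptides_targets: every entry is text
pairs_demo = np.array([["ILYQVPFSV", "0.8532"],
                       ["VVMGTLVAL", "0.5891"]])

raw_column = pairs_demo[:, 1]            # still strings
values_demo = raw_column.astype(float)   # now real numbers

print(raw_column.dtype.kind)   # 'U' means unicode text
print(values_demo.dtype)       # float64
print(values_demo.max())       # numeric operations now work
```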

## Keep 9mers only

**Declare two Python lists to store peptides and targets**


In [9]:
peptides_9mer = []
targets_9mer = []

**Iterate over the elements of the peptides array and keep peptides with length == 9**

In [10]:
for i in range(0, len(peptides)):
    
    if len(peptides[i]) == 9:
        
        ##peptides_9mer.XX
        
        ##targets_9mer.XX
        peptides_9mer.append(peptides[i])
        
        targets_9mer.append(targets[i])
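
The loop above is the approach used in this exercise. As an alternative sketch, the same selection can be written with a boolean mask. The arrays below are toy data: the 10-residue and 5-residue entries and their target values are invented for illustration, as are all names ending in `_demo`.

```python
import numpy as np

# Toy data: two 9mers from this notebook plus two invented non-9mers
peptides_demo = np.array(["ILYQVPFSV", "VVMGTLVAL", "ALAKAAAAAA", "KILSV"])
targets_demo = np.array([0.8532, 0.5891, 0.30, 0.10])

# Boolean mask: True where the peptide has exactly 9 residues
is_9mer = np.array([len(p) == 9 for p in peptides_demo])

# Indexing with the mask keeps peptides and targets aligned
peptides_9mer_demo = peptides_demo[is_9mer]
targets_9mer_demo = targets_demo[is_9mer]

print(peptides_9mer_demo)
print(targets_9mer_demo)
```

Note that the results here are NumPy arrays, whereas the loop version fills plain Python lists.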

## Remove peptides with target value < threshold

**Declare a threshold variable**

In [11]:
threshold = 0.5

**Declare a Python list to store the indexes of the elements to be removed**

In [12]:
to_remove = []

**Iterate over the 9mer peptides, check which target values are < threshold, and store their indexes in the previously declared list**

In [13]:
for i in range(0, len(peptides_9mer)):

    if targets_9mer[i] < threshold:

        ##to_remove.XX
        to_remove.append(i)

**Use the *delete* NumPy function to remove the peptides**

In [14]:
peptides_9mer_t = np.delete(peptides_9mer, to_remove)
targets_9mer_t = np.delete(targets_9mer, to_remove)
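
A short sketch of what *np.delete* does here: it accepts a list (or array) plus the indexes to drop, and returns a new NumPy array while leaving the input untouched. The data below are toy values (the `AAAAAAAAA` entry and its 0.2 target are invented), and the `_demo` names are hypothetical.

```python
import numpy as np

threshold_demo = 0.5
peptides_demo = ["ILYQVPFSV", "AAAAAAAAA", "VVMGTLVAL"]
targets_demo = [0.8532, 0.2, 0.5891]

# Collect the indexes whose target falls below the threshold
to_remove_demo = [i for i, t in enumerate(targets_demo) if t < threshold_demo]

# np.delete returns new arrays; the original lists are not modified
kept_peptides = np.delete(peptides_demo, to_remove_demo)
kept_targets = np.delete(targets_demo, to_remove_demo)

# One possible alternative: keep entries with a boolean mask instead
targets_arr = np.asarray(targets_demo)
keep_mask = targets_arr >= threshold_demo

print(to_remove_demo)
print(kept_peptides, kept_targets)
print(np.array_equal(kept_targets, targets_arr[keep_mask]))
```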

**Check that no elements with target < threshold are present in the target values array**

In [15]:
error = False

for i in range(0, len(peptides_9mer_t)):

    if targets_9mer_t[i] < threshold:

        error = True

        break

if error:

    print("Something went wrong")
    
else:
    
    print("Success")

Success
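
The same check can also be expressed without an explicit loop. The sketch below assumes the targets are held in a NumPy array and uses three values from the final output as toy data; `targets_demo`, `threshold_demo` and `all_ok` are names made up for this example.

```python
import numpy as np

threshold_demo = 0.5
targets_demo = np.array([0.8532, 0.5891, 0.5386])

# np.all is True only if every element satisfies the comparison
all_ok = bool(np.all(targets_demo >= threshold_demo))

print("Success" if all_ok else "Something went wrong")
```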


## Print the final, filtered peptide-target pairs

**Ensure that this output is consistent with the data filtering steps you have made!**

In [16]:
for i in range(0, len(peptides_9mer_t)):
    
    print(peptides_9mer_t[i], targets_9mer_t[i])

ILYQVPFSV 0.8532
VVMGTLVAL 0.5891
KILSVFFLA 0.8512
HLYQGCQVV 0.5386
YLDLALMSV 0.8425
ALAKAAAAA 0.5631
MALLRLPLV 0.6337
FLLTRILTI 0.8027
ILSSLGLPV 0.6384
RMYGVLPWI 0.6889
YLEPGPVTV 0.6472
FLPWHRLFL 0.5637
LLPSLFLLL 0.5537
MLQDMAILT 0.5269
GLMTAVYLV 0.798
GLYSSTVPV 0.6972
SLYFGGICV 0.7819
GLYYLTTEV 0.7195
ALYGALLLA 0.8176
IMPGQEAGL 0.6144
WLSLLVPFV 0.8221
YLVAYQATV 0.6391
WLDQVPFSV 0.7742
KTWGQYWQV 0.778
GLLGWSPQA 0.7929
YMLDLQPET 0.6538
HLAVIGALL 0.5714
MMWYWGPSL 0.7704
FLLRWEQEI 0.7004
IIDQVPFSV 0.6591
SVYVDAKLV 0.5725
RLLDDTPEV 0.578
IAATYNFAV 0.5812
YLVSFGVWI 0.9406
ILLLCLIFL 0.5414
LLLCLIFLL 0.6989
GLQDCTMLV 0.7101
FTDQVPFSV 0.6195
YLAPGPVTA 0.794
GLLGNVSTV 0.7063
GTLGIVCPI 0.5033
YLEPGPVTI 0.6142
LLFLGVVFL 0.6384
SLAGFVRML 0.5646
GLYLSQIAV 0.578
KLTPLCVTL 0.5725
YLYPGPVTA 0.7387
TVLRFVPPL 0.5986
ILSPFMPLL 0.6482
FVWLHYYSV 0.7491
ILDQVPFSV 0.6348

# Now download this as a Python file (File -> Save as .py), and continue reading offline

## Adding a command line parser

In [0]:
################################
# ADDING A COMMAND LINE PARSER #
################################

# For this step, we need first to import an argument parser
# to do this, add the following line just below the numpy import:

# from argparse import ArgumentParser

# We will now create an argument parser that will receive as arguments two values
# 1) the peptides-targets file to open (-f option)
# 2) the threshold to be applied in the target value filtering step (-t option)
# To achieve this, add the following lines below the ArgumentParser import line:


# parser = ArgumentParser(description="A very first Python program")
# parser.add_argument("-t", action="store", dest="threshold", type=float, default=0.5, help="Target value filtering threshold (default: 0.5)")
# parser.add_argument("-f", action="store", dest="peptides_targets_file", type=str, help="Peptides-Targets file")
# args = parser.parse_args()
# threshold = args.threshold
# peptides_targets_file = args.peptides_targets_file


# After adding these lines, you will now be able to call this python program 
# from the terminal while specifying these arguments:


# python Python_Intro.py -t some_threshold -f file_with_peptides_and_targets

# Note that you can also parse switches with the ArgumentParser, e.g.
# parser.add_argument('-w', action='store_true', default=False, dest='sequence_weighting', help='Use sequence weighting')


# REMEMBER!
# 1) The argument parser needs to be declared at the beginning of the script, right after the imports
# 2) In order for this program to work properly after adding the parser, you must now comment out or delete 
#    the previous declarations of the variables "threshold" and "peptides_targets_file"
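
To try the parser without opening a terminal, one option is to pass an explicit argument list to `parse_args`. The sketch below reuses the two options described above. The wrapper function `build_parser` and the file name `test.dat` passed on the fake command line are only for illustration.

```python
from argparse import ArgumentParser


def build_parser():
    # Hypothetical helper that bundles the parser set-up described above
    parser = ArgumentParser(description="A very first Python program")
    parser.add_argument("-t", action="store", dest="threshold", type=float,
                        default=0.5,
                        help="Target value filtering threshold (default: 0.5)")
    parser.add_argument("-f", action="store", dest="peptides_targets_file",
                        type=str, help="Peptides-Targets file")
    return parser


# Passing a list simulates: python Python_Intro.py -t 0.7 -f test.dat
args = build_parser().parse_args(["-t", "0.7", "-f", "test.dat"])
print(args.threshold, args.peptides_targets_file)

# With no arguments, -t falls back to its default and -f is None
defaults = build_parser().parse_args([])
print(defaults.threshold, defaults.peptides_targets_file)
```

Because `-f` has no default, `peptides_targets_file` is `None` when the option is omitted, and loading the file would then fail. Adding `required=True` to that `add_argument` call is one way to make argparse report the missing option instead.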