Python Recap and Objects
Required course material for the lesson
Powerpoint: Python Recap and Objects
Powerpoint: Random numbers
Subjects covered
- A short, essential recap of the Python learned in course 22101.
- F-string formatting
- Command line arguments with Python
- The Python Object Model, and how it influences Python.
- Identity versus Equality
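Three of the subjects above (f-strings, command line arguments, and identity versus equality) fit in one small sketch. It is only an illustration; the function name `describe` and the variable names are made up for this example and do not come from the course slides.

```python
import sys


def describe(a, b):
    """Return a one-line f-string report comparing two objects."""
    # == asks "do the values match?", is asks "is it the very same object?"
    return f"equal: {a == b}, identical: {a is b}"


first = [1, 2, 3]
second = [1, 2, 3]  # a separate list object with the same contents
alias = first       # a second name for the same object

print(describe(first, second))  # equal: True, identical: False
print(describe(first, alias))   # equal: True, identical: True

# Command line arguments: sys.argv[0] is the script name, the rest follow it.
args = sys.argv[1:]
print(f"Got {len(args)} argument(s): {args}")

# A format specification after the colon controls how the value is shown.
value = 3.14159
print(f"{value:.2f}")  # 3.14
```

Changing the list through `alias` also changes what `first` shows, since both names refer to one object, while `second` stays untouched.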
Exercises to be handed in
In exercises 2-9 the job is to select lines from the input file; the exercises differ in how the lines are selected. The output is just a few of the lines in the input file.
- Make a handy little calculator, calc.py, which takes 2 numbers and an operation from the command line and displays the result. Here is an example of how it works:
./calc.py 3 + 6
The output should just be the result. You need to have at least the 4 basic operators working (+, -, /, *). If you want to, you can extend it to handle more than 2 numbers, like ./calc.py 5 + 6 * 4
An issue with operator precedence might appear.
- The input file scores.txt is a tab-separated file with an accession number in the first column followed by 6 numbers (scores) between 0 and 1. You must find the accession numbers and scores (that means the entire line) of the 10 highest and 10 lowest "combined scores" (the combined score is the metric for selection) and save the output in the file scoresextreme.txt.
The combined score is simply the 6 numbers added together. The order of the output must be from high to low. Take the names of the input file and the output file from the command line, so the program is flexible.
- Change exercise 2 in the following way: There is an input file, negative_list.txt, which is a list of genes that can NOT be part of the output. They are banned from your analysis. As can be seen, the genes are identified by their swissprot id. To translate from swissprot id to accession number, so you can relate it to scores.txt, you must use the input file translation.txt, where the first item on each line is an accession number and the second item is the corresponding swissprot id.
- Change exercise 2 in the following way: Make the program work no matter how many numbers there are on each line. Within one file every line has the same number of numbers, but one file could have 10 numbers on every line and another file 7 numbers per line.
- Change exercise 2 in the following way: Instead of using the combined score as the metric for selecting the accession numbers and scores, use the average score. That allows for a different number of numbers on each line in the input file.
- Change exercise 2 in the following way: When you calculate the combined score, the first number should weigh 50% more than the other numbers and the last should weigh 50% less.
- Change exercise 2 in the following way: When you calculate the combined score, the numbers should be weighted on a linear sliding scale, with the first number counting for 50% more than its real value, sliding linearly down to the last number, which is weighted 50% less. The weight is thus calculated dynamically from how many numbers there are on the line and the position of the number on the line.
If N is the number of numbers and P is the position of the number (starting at 1), then the weight W is calculated as:
W = 1.5 - (P-1)/(N-1)
A more generic expression for the sliding weighting scale, where B is the beginning weight and E is the ending weight:
W = B - (B-E)*(P-1)/(N-1)
- Change exercise 2 in the following way: Just find the 10 lines in the input file with the highest combined scores. This should be easier than the original exercise.
- This is the same exercise as the previous one (ex 8), but imagine that the input file is enormous, so big that you cannot hold it in memory. You still need to solve the problem, which is done by keeping a running list of the best scores.
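Two ideas from the exercises, the sliding weight formula and the running list of best scores, can be sketched as below. This is one possible approach, not the expected solution. The function names `sliding_weight`, `weighted_score` and `top_lines` are invented here, the use of the standard `heapq` module for the running list is a choice made for this sketch, and the handling of a line with a single number (N = 1, where the formula would divide by zero) is an assumption.

```python
import heapq


def sliding_weight(position, count, begin=1.5, end=0.5):
    """Weight W = B - (B-E)*(P-1)/(N-1) for 1-based position P among N numbers."""
    if count == 1:
        # Assumption: the formula is undefined for N = 1, so use the beginning weight.
        return begin
    return begin - (begin - end) * (position - 1) / (count - 1)


def weighted_score(numbers, begin=1.5, end=0.5):
    """Combined score where each number is multiplied by its sliding weight."""
    count = len(numbers)
    return sum(
        value * sliding_weight(position, count, begin, end)
        for position, value in enumerate(numbers, start=1)
    )


def top_lines(lines, n=10):
    """Return the n lines with the highest plain combined score, high to low.

    Lines are read one at a time and at most n entries are kept in memory.
    """
    best = []  # min-heap: best[0] is always the weakest entry kept so far
    for index, line in enumerate(lines):
        fields = line.rstrip("\n").split("\t")
        score = sum(float(field) for field in fields[1:])
        # -index breaks ties in favor of earlier lines; lines are never compared.
        entry = (score, -index, line)
        if len(best) < n:
            heapq.heappush(best, entry)
        elif entry > best[0]:
            heapq.heapreplace(best, entry)
    return [line for _, _, line in sorted(best, reverse=True)]


# Small in-memory demonstration with made-up accession numbers.
demo = ["A\t0.1\t0.2\n", "B\t0.9\t0.9\n", "C\t0.5\t0.5\n"]
print([sliding_weight(p, 3) for p in (1, 2, 3)])  # [1.5, 1.0, 0.5]
print(top_lines(demo, n=2))
```

Because a file object yields its lines one by one, passing an open file to `top_lines` instead of a list should keep memory use independent of the file size.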