Comprehension, Generators, Functions and Methods
Required course material for the lesson
Powerpoint: Comprehension, Generators, Functions and Methods
Video: Comprehension Monday
Video: Generators Monday
Video: Iteration in detail, use of lambda function, libraries Monday
Video: How to parse bio files with many entries
Resource: Example code - Comprehension
Resource: Example code - Misc
Video: Live Coding 2
Subjects covered
Comprehension, which is a way of manipulating/selecting data with a hidden loop.
Lambda, the small anonymous function.
Generators, which are like functions that remember their state between calls.
More theoretical iteration.
New functions and methods.
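The first three subjects can be illustrated with a minimal sketch. The data and the names in it (numbers, genes, running_total) are made up for illustration and are not taken from the course files.

```python
# Comprehension: build a new list with the loop hidden inside the expression.
numbers = [3, -1, 4, -1, 5, -9]
squares_of_positives = [n * n for n in numbers if n > 0]
print(squares_of_positives)   # [9, 16, 25]

# Lambda: a small anonymous function, here used as a sort key.
# Each tuple is (name, first value, second value); the values are invented.
genes = [("geneA", 2.0, 5.5), ("geneB", 7.0, 1.0), ("geneC", 3.0, 3.2)]
genes.sort(key=lambda g: abs(g[1] - g[2]), reverse=True)
print([g[0] for g in genes])  # ['geneB', 'geneA', 'geneC']

# Generator: the local variable 'total' survives between successive yields.
def running_total(values):
    total = 0
    for v in values:
        total += v
        yield total

print(list(running_total([1, 2, 3, 4])))  # [1, 3, 6, 10]
```

Note that the generator does no work until something iterates over it; list() is used here only to show all the values at once.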
Exercises to be handed in
- Make a program that calculates the product of two matrices and prints it on STDOUT (the screen). The matrices are in the files mat1.dat and mat2.dat. Numbers in the files are tab separated. A matrix should be stored as a list of lists.
Advice: The program should have a function that reads a matrix from a given file (to be used twice), a function that calculates the product, and a function that prints a matrix. This structure makes your program easy to adapt to other forms of matrix calculations. Here are two links to the definition of matrix multiplication.
Math is Fun
Math world
- The purpose of this exercise is to find the 10 genes that have the biggest difference in expression between cancer and control patients in the dna-array.dat file after a linear transformation of the numbers in the columns. To avoid starting from scratch, use the file dna-array-norm.dat created in exercise 4 in Advanced Data Structures and New Data Types as input. The other tab-separated input file, lineartransform.dat, has an A (slope) and a B (intercept): one AB pair for each number column in the dna-array-norm.dat file. For each line in dna-array-norm.dat, first linearly transform the numbers according to the A and B in lineartransform.dat: the first number uses the first AB pair, the second number uses the second AB pair, and so forth. If your number is X, then the transformed number is A*X+B. When the entire line is transformed, calculate the average of the cancer patients and the average of the controls. From that, find the 10 genes with the biggest difference in expression. There are a number of ways, but a simple one is to create a list of tuples, each consisting of (gene name, cancer average, control average), and then sort the list according to the difference between the cancer and control averages. Using a lambda function when sorting springs to mind. Display the top 10 in the sorted list.
- Make a moving average generator: moving_avg(List_of_numbers, Window_size). The generator calculates the average of the numbers in a window moving across the list. Try it on the numbers in ex1.dat: load the numbers column by column into a single list, that is, first all the numbers in column 1, then the numbers in column 2, and so forth.
- Make a trend discoverer generator: trend(List_of_numbers). It looks at a list of numbers in a moving-window fashion and emits 1 if the next number is higher than the previous one, and 0 otherwise. Any longer sequence of 0's or 1's in the generator output is a trend in the data. Check with ex1.dat (loaded the same way as in the previous exercise) or another file of your choosing.
- Changing the previous exercise: Make a find_trend(List_of_numbers, Minimum_trend_size) generator, which yields a tuple (Position_Start, Size, Direction) for each trend in List_of_numbers, telling where the trend is and how big it is. Direction is 0 or 1, as you want to know which direction the trend is going. Position_Start is the position in the (zero-based) list where the trend starts. Size is the length of the trend of ascending/descending numbers. This is surprisingly difficult. Test with a simple file of your own making to check your results.
- Make a generator combinations() that takes a list of strings as input, e.g. combinations(["GAVIL", "ST", "NQ", "FWY", "D", "HKR"]), and generates all possible combinations. A combination is formed by choosing 1 letter from the first string, 1 letter from the second string, and so forth, in that order, until a letter from every string is chosen. The input list can have any number of strings and the strings can have any length (greater than 0). There must be NO REPEATS - random is not an acceptable library to use. As is obvious, the example has 5*2*2*3*1*3 = 180 different combinations, the first being GSNFDH. Print them all on the screen. If your input is ['0123456789', '0123456789', '0123456789'], then you will print the numbers from 000 to 999. Hint: A list of counters, 1 per string, could be useful in iterating through the combinations.
When can such a generator be useful? For example, when you want to generate a list of antigens that need certain amino acids in certain positions.
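The hint about a list of counters can be sketched as an odometer: the rightmost counter advances fastest, and a counter that runs past the end of its string wraps to 0 and carries to the left. This is one possible reading of the hint under that assumption, not the only acceptable solution, and it assumes every string is non-empty, as the exercise states.

```python
def combinations(strings):
    """Yield every combination, taking one letter per string, in order."""
    counters = [0] * len(strings)      # one position counter per string
    while True:
        yield "".join(s[i] for s, i in zip(strings, counters))
        # Advance like an odometer, starting from the rightmost counter.
        pos = len(counters) - 1
        while pos >= 0:
            counters[pos] += 1
            if counters[pos] < len(strings[pos]):
                break                  # no carry needed
            counters[pos] = 0          # wrap around and carry to the left
            pos -= 1
        if pos < 0:
            return                     # every counter wrapped: all done


result = list(combinations(["GAVIL", "ST", "NQ", "FWY", "D", "HKR"]))
print(len(result), result[0], result[1])   # 180 GSNFDH GSNFDK
```

Because the counters visit each index tuple exactly once, no combination is repeated. The standard library function itertools.product produces the same tuples in the same order and can serve as a cross-check of your own output.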