Python object model
Revision as of 16:08, 3 October 2025
Previous: Regular expressions | Next: Programme
Required course material for the lesson
Powerpoint: Object model and complex data
Resource: Example code - Complex data
Subjects covered
- Python objects
- Identity
- Mutable vs immutable
- Complex data
- Exam format
Exercises to be handed in
Important - read this before starting
This exercise set will be similar to the format of the exam. The content will obviously be different.
You have to download this Python file. It contains some framework code, but mostly unfinished functions.
Each exercise is about finishing one of the functions in the file. You can write the function directly in the Python file, or write it in VS Code or another editor, but then it has to be copied over to the Python file. You must hand in the finished Python file, not a .ipynb file.
Inability to understand or do this will make you fail the exam, so it is worth spending some time on the process.
- The file test1.dat contains results from an experiment, where every line has the form:
AccessionNumber Number Number Number ....
The files test2.dat and test3.dat contain results from similar experiments, but with a slightly different gene set. You want to average the numbers from all experiments for each accession number. The output is therefore of the form:
AccessionNumber SingleAverageNumberOfAll3Experiments
Of course, a certain gene might appear in only one or two of the experiments; in that case you calculate the average over those. You must use one of the complex data structures to store this data. Hint: a dict of lists.
- Create a program that reads a tab-separated file with numbers, matrix.dat (to be understood as a matrix), and stores the numbers in a matrix (list of lists). Having read the matrix from the file, it should transpose it (rows to columns and columns to rows) and finally print the transposed matrix to the screen. The output should look like the input, not like a Python data structure.
You must construct a function like transpose(matrix), which transposes the matrix without using any global variables. This can be done in two ways.
a) matrix = transpose(matrix)
This is the easiest method, but momentarily the most memory-consuming: you simply return the transposed matrix, i.e. a new data structure.
b) transpose(matrix)
Here the matrix is transposed in place and nothing is returned, i.e. the original data structure is changed.
You have to implement at least one of the two ways. Hint: Make a function that prints a given matrix. That will be useful along the way.
How do you easily check that it works? Transposing twice yields the original matrix. Check out Wikipedia's entry on transposing a matrix.
- Study the file dna-array.dat a bit. This is real DNA array data taken from a number of persons, some controls and some suffering from colon cancer. If you look at the second line, there are a lot of 0s and 1s. A '0' means that the values in that column are from a cancer patient and a '1' means the data are from a control (healthy person). The data are all log(intensity), i.e. the logarithm of the measured intensity of the relevant spot on the DNA chip. The data in this file will be used in coming exercises. The data/columns are tab-separated. The second item on each line is the accession number for that particular gene.
Now make a program that extracts data from dna-array.dat. It shall ask for an accession number, unless one was given on the command line. Make sure your program handles both situations. Then it shall search the file for the data concerning that accession number. If it does not find it (you gave a wrong accession number), complain and stop. Otherwise it shall display the data in two tab-separated columns. The first column shall be the data from the cancer patients, the second column the data from the controls. The numbers of sick and healthy people are not the same; be able to handle that.
- The numbers in the input file dna-array.dat should be normalized between 0 and 1 for each line with an accession number, i.e. normalization only within the individual line, not across the data set. Write the result to the file dna-array-norm.dat, but NOT the control lines, i.e. lines where the annotation says 'control'. The resulting file will be similar to the original, but the control lines are removed and the numbers are different. The problem can (and should) be solved one line at a time.
- Read the file dna-array-norm.dat and transform all numbers less than 0.5 to 0, and numbers of 0.5 or more to 1. Then, for each line/accession, calculate the average of the control group numbers and the average of the cancer group numbers. If the two averages differ by more than 0.4, this is considered significant, and the accession should be printed along with the message up or down, depending on whether it is an up-regulation or a down-regulation of the cancer group compared to the control.
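The "dict of lists" hinted at in the first exercise can be sketched as follows. This is only an illustration on made-up in-memory lines, not the intended solution: the function names and accession numbers are hypothetical, and it assumes that "average" means the mean of all numbers pooled across experiments for an accession.

```python
from collections import defaultdict


def collect_values(lines, values_by_accession):
    """Append the numbers on each line to the list stored under its accession."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        accession = fields[0]
        values_by_accession[accession].extend(float(x) for x in fields[1:])


def averages(values_by_accession):
    """Return {accession: mean of all numbers collected for it}."""
    return {acc: sum(vals) / len(vals)
            for acc, vals in values_by_accession.items() if vals}


# Made-up sample lines standing in for the contents of the data files.
experiment1 = ["AB000123 1.0 2.0 3.0", "AB000456 4.0 6.0"]
experiment2 = ["AB000123 5.0", "AB000789 10.0 20.0"]

values = defaultdict(list)  # the dict of lists
for experiment in (experiment1, experiment2):
    collect_values(experiment, values)

for accession, average in sorted(averages(values).items()):
    print(f"{accession}\t{average}")
```

A gene that appears in only some experiments simply ends up with a shorter list, so no special case is needed for it.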
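The difference between ways a) and b) in the transpose exercise comes down to object identity: a) creates a new list object, b) changes the object the caller already holds. A minimal sketch, assuming a rectangular matrix; the names transposed, transpose_in_place and print_matrix are hypothetical, and this in-place variant still builds a temporary copy internally, so it shows the mutation semantics rather than a memory saving.

```python
def transposed(matrix):
    """Way a): return a new, transposed matrix; the argument is untouched."""
    return [list(column) for column in zip(*matrix)]


def transpose_in_place(matrix):
    """Way b): change the list object passed in; nothing is returned."""
    # Slice assignment replaces the contents of the existing outer list,
    # so every name bound to that list object sees the change.
    matrix[:] = [list(column) for column in zip(*matrix)]


def print_matrix(matrix):
    """Print the matrix as tab-separated rows, like the input file."""
    for row in matrix:
        print("\t".join(str(value) for value in row))


m = [[1, 2, 3], [4, 5, 6]]
alias = m                      # a second name for the same list object

t = transposed(m)              # way a): a new object is created
assert t is not m and m == [[1, 2, 3], [4, 5, 6]]

transpose_in_place(m)          # way b): the same object is modified
assert alias is m and alias == [[1, 4], [2, 5], [3, 6]]

print_matrix(m)
```

Had transpose_in_place used `matrix = ...` instead of `matrix[:] = ...`, only the local name would have been rebound and the caller's matrix would have stayed unchanged.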
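For the accession lookup exercise, the two awkward parts are accepting the accession either from the command line or from a prompt, and printing two columns of unequal length. One possible sketch is shown below; the helper names are hypothetical, and padding the shorter column with empty fields is an assumption about the desired output.

```python
from itertools import zip_longest


def get_accession(argv, ask=input):
    """Use the first command-line argument if present, otherwise prompt."""
    if len(argv) > 1:
        return argv[1]
    return ask("Accession number: ").strip()


def two_columns(cancer, control):
    """Return tab-separated lines; the shorter column is padded with blanks."""
    return [f"{a}\t{b}" for a, b in zip_longest(cancer, control, fillvalue="")]


# In a real program the call would be get_accession(sys.argv) after "import sys".
print(get_accession(["prog.py", "AB000123"]))

# Made-up values: three cancer measurements, one control measurement.
for line in two_columns([1.5, 2.5, 3.5], [0.5]):
    print(line)
```

Passing the prompt function as a parameter keeps get_accession easy to test without typing anything.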
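The per-line arithmetic in the last two exercises can be sketched on a single made-up line. The text does not say which normalization to use; min-max scaling is assumed here, as is mapping a constant line to all zeros. Labels follow the file convention ('0' = cancer, '1' = control), the function names are hypothetical, and both groups are assumed to be non-empty.

```python
def normalize(values):
    """Min-max scale a list of numbers to the range 0..1 (assumed method)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # assumption: a constant line maps to 0
    return [(v - low) / (high - low) for v in values]


def binarize(values, threshold=0.5):
    """Map numbers below the threshold to 0 and all others to 1."""
    return [1 if v >= threshold else 0 for v in values]


def regulation(labels, values, cutoff=0.4):
    """Return 'up', 'down' or None for the cancer group versus the control."""
    cancer = [v for lab, v in zip(labels, values) if lab == "0"]
    control = [v for lab, v in zip(labels, values) if lab == "1"]
    diff = sum(cancer) / len(cancer) - sum(control) / len(control)
    if diff > cutoff:
        return "up"
    if diff < -cutoff:
        return "down"
    return None  # difference not considered significant


labels = ["0", "0", "1", "1", "0"]       # made-up class line
raw = [2.0, 4.0, 1.0, 0.0, 3.0]          # made-up log(intensity) values

scaled = normalize(raw)                  # [0.5, 1.0, 0.25, 0.0, 0.75]
bits = binarize(scaled)                  # [1, 1, 0, 0, 1]
print(regulation(labels, bits))          # up
```

Because each function works on one line's values only, the files can be processed one line at a time, as the exercise asks.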