<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?action=history&amp;feed=atom&amp;title=Dict_techniques</id>
	<title>Dict techniques - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?action=history&amp;feed=atom&amp;title=Dict_techniques"/>
	<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;action=history"/>
	<updated>2026-05-09T07:08:47Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=267&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=267&amp;oldid=prev"/>
		<updated>2025-10-03T13:58:35Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 15:58, 3 October 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l19&quot;&gt;Line 19:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 19:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Create a dictionary where the keys are codons and the value are the one-letter-code for the amino acids. The dictionary will function as a look-up table. You can find a [[codon list]] here. You are meant to make the dict &quot;by hand&quot; as there is a structure lesson in that. Add a bit of smart code that tests if you made the dict right.&amp;lt;br&amp;gt; Extra: If you feel like it you can in addition make a program that constructs the dict from a file, which you are responsible for making.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Create a dictionary where the keys are codons and the value are the one-letter-code for the amino acids. The dictionary will function as a look-up table. You can find a [[codon list]] here. You are meant to make the dict &quot;by hand&quot; as there is a structure lesson in that. Add a bit of smart code that tests if you made the dict right.&amp;lt;br&amp;gt; Extra: If you feel like it you can in addition make a program that constructs the dict from a file, which you are responsible for making.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Use the dictionary from the previous exercise and your previous functions &#039;&#039;&#039;fastaread()&#039;&#039;&#039; and &#039;&#039;&#039;fastawrite()&#039;&#039;&#039; in a program, that translates all the nucleotide fasta entries in &#039;&#039;dna7.fsa&#039;&#039; to amino acid sequence. Save the results in a file &#039;&#039;aa7.fsa&#039;&#039; in fasta format. Since the sequences are now consisting of amino acids add &#039;Amino Acid Sequence&#039; to each header. The STOP codon is NOT a part of the amino acid sequence. Think about what STOP means.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Use the dictionary from the previous exercise and your previous functions &#039;&#039;&#039;fastaread()&#039;&#039;&#039; and &#039;&#039;&#039;fastawrite()&#039;&#039;&#039; in a program, that translates all the nucleotide fasta entries in &#039;&#039;dna7.fsa&#039;&#039; to amino acid sequence. Save the results in a file &#039;&#039;aa7.fsa&#039;&#039; in fasta format. Since the sequences are now consisting of amino acids add &#039;Amino Acid Sequence&#039; to each header. The STOP codon is NOT a part of the amino acid sequence. Think about what STOP means.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# In the file &#039;&#039;ex5.acc&#039;&#039; are a lot of accession numbers, where some are duplicates. Earlier we just removed the duplicates, but now we should count them. Make a program that reads the file once, and writes a file &#039;&#039;order5.acc&#039;&#039; with the unique accession numbers and the number of occurrences in the file. A line should look like this: &quot;AC24677 2&quot;, if this accession occurs twice in &#039;&#039;ex5.acc&#039;&#039;. The accession numbers must be written in order, which means the accession with most duplicates is on top (the beginning) and the least on bottom. If two accessions have the same amount of duplicates, they need to be ordered according to the accession name, i.e. AC543322 is before BG001110.&amp;lt;br&amp;gt;Note: This is quite a tricky exercise. If you are absolutely stuck, then at least order the accessions by the number of duplicates and hand in.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# In the file &#039;&#039;ex5.acc&#039;&#039; are a lot of accession numbers, where some are duplicates. Earlier we just removed the duplicates, but now we should count them. Make a program that reads the file once, and writes a file &#039;&#039;order5.acc&#039;&#039; with the unique accession numbers and the number of occurrences in the file. A line should look like this: &quot;AC24677 2&quot;, if this accession occurs twice in &#039;&#039;ex5.acc&#039;&#039;. 
The accession numbers must be written in order, which means the accession with most duplicates is on top (the beginning) and the least on bottom. If two accessions have the same amount of duplicates, they need to be ordered according to the accession name, i.e. AC543322 is before BG001110.&amp;lt;br&amp;gt;Note: This is quite a tricky exercise. If you are absolutely stuck, then at least order the accessions by the number of duplicates and hand in.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&quot;#AA00FF&quot;&amp;gt;In the tab-separated files &#039;&#039;slinger.txt&#039;&#039; and &#039;&#039;hoist.txt&#039;&#039; are two columns with an accession number and a numeric result; a probability between 0 and 1. The numbers are from running 2 different programs (slinger and hoist, if you are in doubt). You must combine these probabilities - basically taking the average of the two numbers - for each accession number and write the result in a file &#039;&#039;combined.txt&#039;&#039;. The file should look like the sources, i.e. tab-separated with accession in column 1 and number in column 2. Unfortunately, the two programs have not been run from the same set of accession numbers, so some of the results are only available in one of the input files. In such case you ignore/discard the data for that accession. Only save results in the output file when the accession is in both of the input files.&amp;lt;/font&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&quot;#AA00FF&quot;&amp;gt;In the tab-separated files &#039;&#039;slinger.txt&#039;&#039; and &#039;&#039;hoist.txt&#039;&#039; are two columns with an accession number and a numeric result; a probability between 0 and 1. The numbers are from running 2 different programs (slinger and hoist, if you are in doubt). 
You must combine these probabilities - basically taking the average of the two numbers - for each accession number and write the result in a file &#039;&#039;combined.txt&#039;&#039;. The file should look like the sources, i.e. tab-separated with accession in column 1 and number in column 2. Unfortunately, the two programs have not been run from the same set of accession numbers, so some of the results are only available in one of the input files. In such case you ignore/discard the data for that accession. Only save results in the output file when the accession is in both of the input files.&amp;lt;/font&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;gt;&amp;lt;br&amp;gt;&amp;lt;br&lt;/ins&gt;&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Using above method gives you too little data. You try this time to combine your two input sets differently. If an accession is in both input files you use the average, if it is in only one, you just use the number straight in the output file. This is effectively making a union of the input instead of an intersection.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Using above method gives you too little data. You try this time to combine your two input sets differently. If an accession is in both input files you use the average, if it is in only one, you just use the number straight in the output file. This is effectively making a union of the input instead of an intersection.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# In the files &#039;&#039;geneA.txt&#039;&#039;, &#039;&#039;geneB.txt&#039;&#039;, all the way down to &#039;&#039;geneE.txt&#039;&#039; you have normalized mRNA expression data taken at the time of discovery of colon cancer for a number of patients and their survival. This is basically 2 columns in each file; The mRNA expression (x) and the number of months (y) the patient survived. For each gene you have to make a [https://en.wikipedia.org/wiki/Simple_linear_regression simple linear regression] analysis and find 3 numbers; the &#039;&#039;&#039;&amp;amp;alpha;&#039;&#039;&#039; (the intercept - where the line cuts the Y-axis) and &#039;&#039;&#039;&amp;amp;beta;&#039;&#039;&#039; (the slope) coefficient that describes the line running through the data points best, and the correlation coefficient (&#039;&#039;&#039;r&#039;&#039;&#039;) which describes the fitness of the line. You must identify the gene that best indicates how long the patient survives. For every gene you start calculating these values.&amp;lt;br&amp;gt;[[File:calc.png|220px]] &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp;&#039;&#039;n&#039;&#039; = number of observations.&amp;lt;br&amp;gt;From the values you can compute the required parameters.&amp;lt;br&amp;gt;[[File:alfabeta.png|130px]] &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp;[[File:correlation.png|200px]]&amp;lt;br&amp;gt;Remember to say which gene best describes survival - and why. 
A survival prediction can be made by calculating &amp;amp;beta; * x + &amp;amp;alpha;, given x which is the mRNA expression.&amp;lt;br&amp;gt;Note: The genes will in reality interact with each other in ways that totally destroys our basic assumption for making a linear regression: That the data (gene expressions) are independent.&amp;lt;br&amp;gt;Make your code in a general way - there can for example be more data files. Make it easy to add them.&amp;lt;br&amp;gt;The gene with the best correlation coefficient is geneD, with a CC of 83.75%.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# In the files &#039;&#039;geneA.txt&#039;&#039;, &#039;&#039;geneB.txt&#039;&#039;, all the way down to &#039;&#039;geneE.txt&#039;&#039; you have normalized mRNA expression data taken at the time of discovery of colon cancer for a number of patients and their survival. This is basically 2 columns in each file; The mRNA expression (x) and the number of months (y) the patient survived. For each gene you have to make a [https://en.wikipedia.org/wiki/Simple_linear_regression simple linear regression] analysis and find 3 numbers; the &#039;&#039;&#039;&amp;amp;alpha;&#039;&#039;&#039; (the intercept - where the line cuts the Y-axis) and &#039;&#039;&#039;&amp;amp;beta;&#039;&#039;&#039; (the slope) coefficient that describes the line running through the data points best, and the correlation coefficient (&#039;&#039;&#039;r&#039;&#039;&#039;) which describes the fitness of the line. You must identify the gene that best indicates how long the patient survives. 
For every gene you start calculating these values.&amp;lt;br&amp;gt;[[File:calc.png|220px]] &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp;&#039;&#039;n&#039;&#039; = number of observations.&amp;lt;br&amp;gt;From the values you can compute the required parameters.&amp;lt;br&amp;gt;[[File:alfabeta.png|130px]] &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp;[[File:correlation.png|200px]]&amp;lt;br&amp;gt;Remember to say which gene best describes survival - and why. A survival prediction can be made by calculating &amp;amp;beta; * x + &amp;amp;alpha;, given x which is the mRNA expression.&amp;lt;br&amp;gt;Note: The genes will in reality interact with each other in ways that totally destroys our basic assumption for making a linear regression: That the data (gene expressions) are independent.&amp;lt;br&amp;gt;Make your code in a general way - there can for example be more data files. Make it easy to add them.&amp;lt;br&amp;gt;The gene with the best correlation coefficient is geneD, with a CC of 83.75%.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Repeat the previous exercise again with a new type of data file &amp;#039;&amp;#039;gene_combined.txt&amp;#039;&amp;#039;, which is more typical in real life. All genes are in one tab-separated file. There are 3 columns; gene name, normalized mRNA expression and survival in months. There is no particular order in which the data appears and data lines for several genes might be mixed within each other.&amp;lt;br&amp;gt;Again, make general code. There can be more or fewer genes, and you do not need to know their names beforehand.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Repeat the previous exercise again with a new type of data file &amp;#039;&amp;#039;gene_combined.txt&amp;#039;&amp;#039;, which is more typical in real life. All genes are in one tab-separated file. There are 3 columns; gene name, normalized mRNA expression and survival in months. There is no particular order in which the data appears and data lines for several genes might be mixed within each other.&amp;lt;br&amp;gt;Again, make general code. There can be more or fewer genes, and you do not need to know their names beforehand.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=225&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises for extra practice */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=225&amp;oldid=prev"/>
		<updated>2025-09-06T15:28:55Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises for extra practice&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 17:28, 6 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l29&quot;&gt;Line 29:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 29:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* The &amp;#039;&amp;#039;geneA-E.txt&amp;#039;&amp;#039; files all have the same structure on each line; first number is a float between 0 and 1, second number is an integer (representing months of survival after discovery of the cancer). For all files (the combined data set) find the average of the float, given the integer and display in ascending order of the integer. You need to add all the floats for a given integer together and divide by the number of floats for the integer, then you have the average for the integer. To succeed at this, you must use two dicts where the integer is the key in both. The corresponding values are the sum of the floats (for that key) and the number of times the key has been encountered in the files. Hint: Unfortunately, this does not make much biological sense, but is more in the nature of a programming exercise.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* The &amp;#039;&amp;#039;geneA-E.txt&amp;#039;&amp;#039; files all have the same structure on each line; first number is a float between 0 and 1, second number is an integer (representing months of survival after discovery of the cancer). For all files (the combined data set) find the average of the float, given the integer and display in ascending order of the integer. 
You need to add all the floats for a given integer together and divide by the number of floats for the integer, then you have the average for the integer. To succeed at this, you must use two dicts where the integer is the key in both. The corresponding values are the sum of the floats (for that key) and the number of times the key has been encountered in the files. Hint: Unfortunately, this does not make much biological sense, but is more in the nature of a programming exercise.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise requires exercise 6 from [[Simple pattern matching]]. Modify the code a bit so you only compute what you have to. In the &#039;&#039;data1-4.gb&#039;&#039; files count &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;who &lt;/del&gt;many times the different codons in the coding sequence occurs. Display.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise requires exercise 6 from [[Simple pattern matching]]. Modify the code a bit so you only compute what you have to. In the &#039;&#039;data1-4.gb&#039;&#039; files count &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;how &lt;/ins&gt;many times the different codons in the coding sequence occurs. Display.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise builds on exercise 2 for this lesson. You must read the &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; file and translate the DNA sequences to protein sequence. Report the frequencies of the various amino acids for the entire file - all sequences (not individual sequences). That is - count how many there are of each amino acid (a total) in the translated sequences, compute the frequency of each (Number_of_this_amino_acid/Total_number_of_amino_acids) and print the results as &amp;quot;S   0.0123&amp;quot;, i.e. 4 digits after the dot. You do not need to save the fasta file.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise builds on exercise 2 for this lesson. You must read the &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; file and translate the DNA sequences to protein sequence. Report the frequencies of the various amino acids for the entire file - all sequences (not individual sequences). That is - count how many there are of each amino acid (a total) in the translated sequences, compute the frequency of each (Number_of_this_amino_acid/Total_number_of_amino_acids) and print the results as &amp;quot;S   0.0123&amp;quot;, i.e. 4 digits after the dot. You do not need to save the fasta file.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=224&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=224&amp;oldid=prev"/>
		<updated>2025-09-06T11:36:07Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 13:36, 6 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l24&quot;&gt;Line 24:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 24:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&amp;quot;#AA00FF&amp;quot;&amp;gt;In the tab-separated files &amp;#039;&amp;#039;slinger.txt&amp;#039;&amp;#039; and &amp;#039;&amp;#039;hoist.txt&amp;#039;&amp;#039; are two columns with an accession number and a numeric result; a probability between 0 and 1. The numbers are from running 2 different programs (slinger and hoist, if you are in doubt). You must combine these probabilities - basically taking the average of the two numbers - for each accession number and write the result in a file &amp;#039;&amp;#039;combined.txt&amp;#039;&amp;#039;. The file should look like the sources, i.e. tab-separated with accession in column 1 and number in column 2. Unfortunately, the two programs have not been run from the same set of accession numbers, so some of the results are only available in one of the input files. In such case you ignore/discard the data for that accession. Only save results in the output file when the accession is in both of the input files.&amp;lt;/font&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&amp;quot;#AA00FF&amp;quot;&amp;gt;In the tab-separated files &amp;#039;&amp;#039;slinger.txt&amp;#039;&amp;#039; and &amp;#039;&amp;#039;hoist.txt&amp;#039;&amp;#039; are two columns with an accession number and a numeric result; a probability between 0 and 1. 
The numbers are from running 2 different programs (slinger and hoist, if you are in doubt). You must combine these probabilities - basically taking the average of the two numbers - for each accession number and write the result in a file &amp;#039;&amp;#039;combined.txt&amp;#039;&amp;#039;. The file should look like the sources, i.e. tab-separated with accession in column 1 and number in column 2. Unfortunately, the two programs have not been run from the same set of accession numbers, so some of the results are only available in one of the input files. In such case you ignore/discard the data for that accession. Only save results in the output file when the accession is in both of the input files.&amp;lt;/font&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Using the above method gives you too little data. This time, combine your two input sets differently. If an accession is in both input files you use the average; if it is in only one, you use that number directly in the output file. This is effectively making a union of the input instead of an intersection.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Using the above method gives you too little data. This time, combine your two input sets differently. If an accession is in both input files you use the average; if it is in only one, you use that number directly in the output file. This is effectively making a union of the input instead of an intersection.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# In the files &#039;&#039;geneA.txt&#039;&#039;, &#039;&#039;geneB.txt&#039;&#039;, all the way down to &#039;&#039;geneE.txt&#039;&#039; you have normalized mRNA expression data taken at the time of discovery of colon cancer for a number of patients and their survival. This is basically 2 columns in each file; The mRNA expression (x) and the number of months (y) the patient survived. For each gene you have to make a [https://en.wikipedia.org/wiki/Simple_linear_regression simple linear regression] analysis and find 3 numbers; the &#039;&#039;&#039;&amp;amp;alpha;&#039;&#039;&#039; (the intercept - where the line cuts the Y-axis) and &#039;&#039;&#039;&amp;amp;beta;&#039;&#039;&#039; (the slope) coefficient that describes the line running through the data points best, and the correlation coefficient (&#039;&#039;&#039;r&#039;&#039;&#039;) which describes the fitness of the line. You must identify the gene that best indicates how long the patient survives. For every gene you start calculating these values.&amp;lt;br&amp;gt;[[File:calc.png|220px]] &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp;&#039;&#039;n&#039;&#039; = number of observations.&amp;lt;br&amp;gt;From the values you can compute the required parameters.&amp;lt;br&amp;gt;[[File:alfabeta.png|130px]] &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp;[[File:correlation.png|200px]]&amp;lt;br&amp;gt;Remember to say which gene best describes survival - and why. 
A survival prediction can be made by calculating &amp;amp;beta; * x + &amp;amp;alpha;, where x is the mRNA expression.&amp;lt;br&amp;gt;Note: The genes will in reality interact with each other in ways that totally destroy our basic assumption for making a linear regression: that the data (gene expressions) are independent.&amp;lt;br&amp;gt;Make your code in a general way - there can for example be more data files. Make it easy to add them.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# In the files &#039;&#039;geneA.txt&#039;&#039;, &#039;&#039;geneB.txt&#039;&#039;, all the way down to &#039;&#039;geneE.txt&#039;&#039; you have normalized mRNA expression data taken at the time of discovery of colon cancer for a number of patients and their survival. This is basically 2 columns in each file; The mRNA expression (x) and the number of months (y) the patient survived. For each gene you have to make a [https://en.wikipedia.org/wiki/Simple_linear_regression simple linear regression] analysis and find 3 numbers; the &#039;&#039;&#039;&amp;amp;alpha;&#039;&#039;&#039; (the intercept - where the line cuts the Y-axis) and &#039;&#039;&#039;&amp;amp;beta;&#039;&#039;&#039; (the slope) coefficient that describes the line running through the data points best, and the correlation coefficient (&#039;&#039;&#039;r&#039;&#039;&#039;) which describes the fitness of the line. You must identify the gene that best indicates how long the patient survives. 
For every gene you start calculating these values.&amp;lt;br&amp;gt;[[File:calc.png|220px]] &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp;&#039;&#039;n&#039;&#039; = number of observations.&amp;lt;br&amp;gt;From the values you can compute the required parameters.&amp;lt;br&amp;gt;[[File:alfabeta.png|130px]] &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp;[[File:correlation.png|200px]]&amp;lt;br&amp;gt;Remember to say which gene best describes survival - and why. A survival prediction can be made by calculating &amp;amp;beta; * x + &amp;amp;alpha;, where x is the mRNA expression.&amp;lt;br&amp;gt;Note: The genes will in reality interact with each other in ways that totally destroy our basic assumption for making a linear regression: that the data (gene expressions) are independent.&amp;lt;br&amp;gt;Make your code in a general way - there can for example be more data files. Make it easy to add them&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;.&amp;lt;br&amp;gt;The gene with the best correlation coefficient is geneD, with a CC of 83.75%&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Repeat the previous exercise with a new type of data file &amp;#039;&amp;#039;gene_combined.txt&amp;#039;&amp;#039;, which is more typical in real life. All genes are in one tab-separated file. There are 3 columns: gene name, normalized mRNA expression and survival in months. There is no particular order in which the data appears and data lines for several genes might be interleaved.&amp;lt;br&amp;gt;Again, make general code. There can be more or fewer genes, and you do not need to know their names beforehand.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Repeat the previous exercise with a new type of data file &amp;#039;&amp;#039;gene_combined.txt&amp;#039;&amp;#039;, which is more typical in real life. All genes are in one tab-separated file. There are 3 columns: gene name, normalized mRNA expression and survival in months. There is no particular order in which the data appears and data lines for several genes might be interleaved.&amp;lt;br&amp;gt;Again, make general code. There can be more or fewer genes, and you do not need to know their names beforehand.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=223&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=223&amp;oldid=prev"/>
		<updated>2025-09-06T09:55:44Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 11:55, 6 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l21&quot;&gt;Line 21:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 21:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Create a dictionary where the keys are codons and the values are the one-letter codes for the amino acids. The dictionary will function as a look-up table. You can find a [[codon list]] here. You are meant to make the dict &amp;quot;by hand&amp;quot; as there is a structure lesson in that. Add a bit of smart code that tests if you made the dict right.&amp;lt;br&amp;gt; Extra: If you feel like it you can in addition make a program that constructs the dict from a file, which you are responsible for making.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Create a dictionary where the keys are codons and the values are the one-letter codes for the amino acids. The dictionary will function as a look-up table. You can find a [[codon list]] here. You are meant to make the dict &amp;quot;by hand&amp;quot; as there is a structure lesson in that. Add a bit of smart code that tests if you made the dict right.&amp;lt;br&amp;gt; Extra: If you feel like it you can in addition make a program that constructs the dict from a file, which you are responsible for making.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Use the dictionary from the previous exercise and your previous functions &amp;#039;&amp;#039;&amp;#039;fastaread()&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; in a program that translates all the nucleotide fasta entries in &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; to amino acid sequences. Save the results in a file &amp;#039;&amp;#039;aa7.fsa&amp;#039;&amp;#039; in fasta format. Since the sequences now consist of amino acids, add &amp;#039;Amino Acid Sequence&amp;#039; to each header. The STOP codon is NOT a part of the amino acid sequence. Think about what STOP means.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Use the dictionary from the previous exercise and your previous functions &amp;#039;&amp;#039;&amp;#039;fastaread()&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; in a program that translates all the nucleotide fasta entries in &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; to amino acid sequences. Save the results in a file &amp;#039;&amp;#039;aa7.fsa&amp;#039;&amp;#039; in fasta format. Since the sequences now consist of amino acids, add &amp;#039;Amino Acid Sequence&amp;#039; to each header. The STOP codon is NOT a part of the amino acid sequence. Think about what STOP means.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# In the file &#039;&#039;ex5.acc&#039;&#039; are a lot of accession numbers, where some are duplicates. Earlier we just removed the duplicates, but now we should count them. Make a program that reads the file once, and writes a file &#039;&#039;order5.acc&#039;&#039; with the unique accession numbers and the number of occurrences in the file. A line should look like this: &quot;AC24677 2&quot;, if this accession occurs twice in &#039;&#039;ex5.acc&#039;&#039;. The accession numbers must be written in order, which means the accession with the most duplicates is at the top (the beginning) and the one with the fewest at the bottom. If two accessions have the same number of duplicates, they need to be ordered according to the accession name, e.g. AC543322 comes before BG001110.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# In the file &#039;&#039;ex5.acc&#039;&#039; are a lot of accession numbers, where some are duplicates. Earlier we just removed the duplicates, but now we should count them. Make a program that reads the file once, and writes a file &#039;&#039;order5.acc&#039;&#039; with the unique accession numbers and the number of occurrences in the file. A line should look like this: &quot;AC24677 2&quot;, if this accession occurs twice in &#039;&#039;ex5.acc&#039;&#039;. The accession numbers must be written in order, which means the accession with the most duplicates is at the top (the beginning) and the one with the fewest at the bottom. 
If two accessions have the same number of duplicates, they need to be ordered according to the accession name, e.g. AC543322 comes before BG001110&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;.&amp;lt;br&amp;gt;Note: This is quite a tricky exercise. If you are absolutely stuck, then at least order the accessions by the number of duplicates and hand that in&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&amp;quot;#AA00FF&amp;quot;&amp;gt;In the tab-separated files &amp;#039;&amp;#039;slinger.txt&amp;#039;&amp;#039; and &amp;#039;&amp;#039;hoist.txt&amp;#039;&amp;#039; are two columns with an accession number and a numeric result; a probability between 0 and 1. The numbers are from running 2 different programs (slinger and hoist, if you are in doubt). You must combine these probabilities - basically taking the average of the two numbers - for each accession number and write the result in a file &amp;#039;&amp;#039;combined.txt&amp;#039;&amp;#039;. The file should look like the sources, i.e. tab-separated with accession in column 1 and number in column 2. Unfortunately, the two programs have not been run from the same set of accession numbers, so some of the results are only available in one of the input files. In such case you ignore/discard the data for that accession. Only save results in the output file when the accession is in both of the input files.&amp;lt;/font&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&amp;quot;#AA00FF&amp;quot;&amp;gt;In the tab-separated files &amp;#039;&amp;#039;slinger.txt&amp;#039;&amp;#039; and &amp;#039;&amp;#039;hoist.txt&amp;#039;&amp;#039; are two columns with an accession number and a numeric result; a probability between 0 and 1. 
The numbers are from running 2 different programs (slinger and hoist, if you are in doubt). You must combine these probabilities - basically taking the average of the two numbers - for each accession number and write the result in a file &amp;#039;&amp;#039;combined.txt&amp;#039;&amp;#039;. The file should look like the sources, i.e. tab-separated with accession in column 1 and number in column 2. Unfortunately, the two programs have not been run from the same set of accession numbers, so some of the results are only available in one of the input files. In such case you ignore/discard the data for that accession. Only save results in the output file when the accession is in both of the input files.&amp;lt;/font&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Using the above method gives you too little data. This time, combine your two input sets differently. If an accession is in both input files you use the average; if it is in only one, you use that number directly in the output file. This is effectively making a union of the input instead of an intersection.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Using the above method gives you too little data. This time, combine your two input sets differently. If an accession is in both input files you use the average; if it is in only one, you use that number directly in the output file. This is effectively making a union of the input instead of an intersection.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=205&amp;oldid=prev</id>
		<title>WikiSysop at 09:01, 5 September 2025</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=205&amp;oldid=prev"/>
		<updated>2025-09-05T09:01:42Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 11:01, 5 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l24&quot;&gt;Line 24:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 24:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&amp;quot;#AA00FF&amp;quot;&amp;gt;In the tab-separated files &amp;#039;&amp;#039;slinger.txt&amp;#039;&amp;#039; and &amp;#039;&amp;#039;hoist.txt&amp;#039;&amp;#039; are two columns with an accession number and a numeric result; a probability between 0 and 1. The numbers are from running 2 different programs (slinger and hoist, if you are in doubt). You must combine these probabilities - basically taking the average of the two numbers - for each accession number and write the result in a file &amp;#039;&amp;#039;combined.txt&amp;#039;&amp;#039;. The file should look like the sources, i.e. tab-separated with accession in column 1 and number in column 2. Unfortunately, the two programs have not been run from the same set of accession numbers, so some of the results are only available in one of the input files. In such case you ignore/discard the data for that accession. Only save results in the output file when the accession is in both of the input files.&amp;lt;/font&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&amp;quot;#AA00FF&amp;quot;&amp;gt;In the tab-separated files &amp;#039;&amp;#039;slinger.txt&amp;#039;&amp;#039; and &amp;#039;&amp;#039;hoist.txt&amp;#039;&amp;#039; are two columns with an accession number and a numeric result; a probability between 0 and 1. 
The numbers are from running 2 different programs (slinger and hoist, if you are in doubt). You must combine these probabilities - basically taking the average of the two numbers - for each accession number and write the result in a file &amp;#039;&amp;#039;combined.txt&amp;#039;&amp;#039;. The file should look like the sources, i.e. tab-separated with accession in column 1 and number in column 2. Unfortunately, the two programs have not been run from the same set of accession numbers, so some of the results are only available in one of the input files. In such case you ignore/discard the data for that accession. Only save results in the output file when the accession is in both of the input files.&amp;lt;/font&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Using the above method gives you too little data. This time, combine your two input sets differently. If an accession is in both input files you use the average; if it is in only one, you use that number directly in the output file. This is effectively making a union of the input instead of an intersection.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Using the above method gives you too little data. This time, combine your two input sets differently. If an accession is in both input files you use the average; if it is in only one, you use that number directly in the output file. This is effectively making a union of the input instead of an intersection.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;The &lt;/del&gt;&#039;&#039;geneA&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;-E&lt;/del&gt;.txt&#039;&#039; &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;files &lt;/del&gt;all have the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;same structure on &lt;/del&gt;each &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;line&lt;/del&gt;; &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;first &lt;/del&gt;number &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;is &lt;/del&gt;a &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;float between 0 &lt;/del&gt;and &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;1, second number is an integer &lt;/del&gt;(&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;representing months of survival after discovery of &lt;/del&gt;the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;cancer&lt;/del&gt;)&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;. 
For all files &lt;/del&gt;(the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;combined data set&lt;/del&gt;) &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;find &lt;/del&gt;the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;average of &lt;/del&gt;the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;float&lt;/del&gt;, &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;given &lt;/del&gt;the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;integer and display in ascending order &lt;/del&gt;of the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;integer&lt;/del&gt;. You &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;need to add all &lt;/del&gt;the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;floats for a given integer together and divide by &lt;/del&gt;the number of &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;floats for &lt;/del&gt;the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;integer, then &lt;/del&gt;you &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;have &lt;/del&gt;the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;average for the integer&lt;/del&gt;. &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;To succeed at this&lt;/del&gt;, &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;you must use two dicts where the integer &lt;/del&gt;is the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;key in both&lt;/del&gt;. 
The &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;corresponding values are &lt;/del&gt;the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;sum of the floats &lt;/del&gt;(for &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;that key) and &lt;/del&gt;the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;number &lt;/del&gt;of &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;times the key has been encountered &lt;/del&gt;in the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;files&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;In the files &lt;/ins&gt;&#039;&#039;geneA.txt&#039;&#039;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;, &#039;&#039;geneB.txt&#039;&#039;, &lt;/ins&gt;all &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;the way down to &#039;&#039;geneE.txt&#039;&#039; you &lt;/ins&gt;have &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;normalized mRNA expression data taken at &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;time of discovery of colon cancer for a number of patients and their survival. This is basically 2 columns in &lt;/ins&gt;each &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;file&lt;/ins&gt;; &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;The mRNA expression (x) and the &lt;/ins&gt;number &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;of months (y) the patient survived. 
For each gene you have to make &lt;/ins&gt;a &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[https://en.wikipedia.org/wiki/Simple_linear_regression simple linear regression] analysis &lt;/ins&gt;and &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;find 3 numbers; the &#039;&#039;&#039;&amp;amp;alpha;&#039;&#039;&#039; &lt;/ins&gt;(the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;intercept - where the line cuts the Y-axis&lt;/ins&gt;) &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;and &#039;&#039;&#039;&amp;amp;beta;&#039;&#039;&#039; &lt;/ins&gt;(the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;slope&lt;/ins&gt;) &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;coefficient that describes &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;line running through &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;data points best&lt;/ins&gt;, &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;and the correlation coefficient (&#039;&#039;&#039;r&#039;&#039;&#039;) which describes &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;fitness &lt;/ins&gt;of the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;line&lt;/ins&gt;. You &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;must identify &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;gene that best indicates how long &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;patient survives. 
For every gene, start by calculating these values.&amp;lt;br&amp;gt;[[File:calc.png|220px]] &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp;&#039;&#039;n&#039;&#039; = &lt;/ins&gt;number of &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;observations.&amp;lt;br&amp;gt;From &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;values &lt;/ins&gt;you &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;can compute &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;required parameters.&amp;lt;br&amp;gt;[[File:alfabeta.png|130px]] &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp; &amp;amp;nbsp;[[File:correlation.png|200px]]&amp;lt;br&amp;gt;Remember to say which gene best describes survival - and why&lt;/ins&gt;. &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;A survival prediction can be made by calculating &amp;amp;beta; * x + &amp;amp;alpha;&lt;/ins&gt;, &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;given x which &lt;/ins&gt;is the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;mRNA expression&lt;/ins&gt;.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;Note: &lt;/ins&gt;The &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;genes will in reality interact with each other in ways that totally destroy our basic assumption for making a linear regression: that &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;data &lt;/ins&gt;(&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;gene expressions) are independent.&amp;lt;br&amp;gt;Make your code in a general way - there can &lt;/ins&gt;for &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;example be more data files. 
Make it easy to add them.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;# Repeat &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;previous exercise with a new type &lt;/ins&gt;of &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;data file &#039;&#039;gene_combined.txt&#039;&#039;, which is more typical in real life. All genes are in one tab-separated file. There are 3 columns: gene name, normalized mRNA expression and survival in months. There is no particular order &lt;/ins&gt;in &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;which &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;data appears and data lines for several genes might be interleaved.&amp;lt;br&amp;gt;Again, make general code. There can be more or fewer genes, and you do not need to know their names beforehand&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* The &#039;&#039;geneA-E.txt&#039;&#039; files all have the same structure on each line; first number is a float between 0 and 1, second number is an integer (representing months of survival after discovery of the cancer). For all files (the combined data set) find the average of the float, given the integer and display in ascending order of the integer. You need to add all the floats for a given integer together and divide by the number of floats for the integer, then you have the average for the integer. To succeed at this, you must use two dicts where the integer is the key in both. The corresponding values are the sum of the floats (for that key) and the number of times the key has been encountered in the files. Note: This does not make much biological sense; it is more in the nature of a programming exercise.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise requires exercise 6 from [[Simple pattern matching]]. Modify the code a bit so you only compute what you have to. In the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files count how many times each codon in the coding sequence occurs. Display the counts.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise requires exercise 6 from [[Simple pattern matching]]. Modify the code a bit so you only compute what you have to. In the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files count how many times each codon in the coding sequence occurs. Display the counts.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise builds on exercise 2 for this lesson. You must read the &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; file and translate the DNA sequences to protein sequences. Report the frequencies of the various amino acids for the entire file - all sequences (not individual sequences). That is - count how many there are of each amino acid (a total) in the translated sequences, compute the frequency of each (Number_of_this_amino_acid/Total_number_of_amino_acids) and print the results as &amp;quot;S   0.0123&amp;quot;, i.e. 4 digits after the dot. You do not need to save the fasta file.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise builds on exercise 2 for this lesson. You must read the &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; file and translate the DNA sequences to protein sequences. Report the frequencies of the various amino acids for the entire file - all sequences (not individual sequences). That is - count how many there are of each amino acid (a total) in the translated sequences, compute the frequency of each (Number_of_this_amino_acid/Total_number_of_amino_acids) and print the results as &amp;quot;S   0.0123&amp;quot;, i.e. 4 digits after the dot. You do not need to save the fasta file.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=190&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises for extra practice */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=190&amp;oldid=prev"/>
		<updated>2025-09-03T11:57:13Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises for extra practice&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 13:57, 3 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l27&quot;&gt;Line 27:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 27:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* Given a tab-separated file with 3 columns; StudentID, CourseNumber, Grade. Can you find a way to load the grades for a student in a retrievable manner into (some of) the Python data structures learned so far? Retrievable means here that you can find the grades for a student if you know the StudentID.&amp;lt;br&amp;gt;Explain your approach. Hint: It is not necessarily efficient.&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise requires &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;exercise 6 from &lt;/ins&gt;[[Simple &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;pattern matching&lt;/ins&gt;]]&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;. Modify the code a bit so you only compute what you have to&lt;/ins&gt;. In the &#039;&#039;data1-4.gb&#039;&#039; files count how many times each codon in the coding sequence occurs. Display the counts.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise requires &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;that you did the last two practice exercises in &lt;/del&gt;[[Simple &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Pattern Matching&lt;/del&gt;]]. In the &#039;&#039;data1-4.gb&#039;&#039; files count how many times each codon in the coding sequence occurs. Display the counts.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise builds on exercise 2 &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;for this lesson&lt;/ins&gt;. You must read the &#039;&#039;dna7.fsa&#039;&#039; file and translate the DNA sequences to protein sequences. Report the frequencies of the various amino acids for the entire file - all sequences (not individual sequences). That is - count how many there are of each amino acid (a total) in the translated sequences, compute the frequency of each (Number_of_this_amino_acid/Total_number_of_amino_acids) and print the results as &quot;S   0.0123&quot;, i.e. 4 digits after the dot&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;. You do not need to save the fasta file&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise builds on &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;mandatory &lt;/del&gt;exercise 2. You must read the &#039;&#039;dna7.fsa&#039;&#039; file and translate the DNA sequences to protein sequences. Report the frequencies of the various amino acids for the entire file - all sequences (not individual sequences). That is - count how many there are of each amino acid (a total) in the translated sequences, compute the frequency of each (Number_of_this_amino_acid/Total_number_of_amino_acids) and print the results as &quot;S   0.0123&quot;, i.e. 4 digits after the dot.&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=189&amp;oldid=prev</id>
		<title>WikiSysop at 11:51, 3 September 2025</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=189&amp;oldid=prev"/>
		<updated>2025-09-03T11:51:39Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 13:51, 3 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l24&quot;&gt;Line 24:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 24:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&amp;quot;#AA00FF&amp;quot;&amp;gt;The tab-separated files &amp;#039;&amp;#039;slinger.txt&amp;#039;&amp;#039; and &amp;#039;&amp;#039;hoist.txt&amp;#039;&amp;#039; each have two columns: an accession number and a numeric result, a probability between 0 and 1. The numbers are from running 2 different programs (slinger and hoist, if you are in doubt). You must combine these probabilities - basically taking the average of the two numbers - for each accession number and write the result in a file &amp;#039;&amp;#039;combined.txt&amp;#039;&amp;#039;. The file should look like the sources, i.e. tab-separated with accession in column 1 and number in column 2. Unfortunately, the two programs have not been run on the same set of accession numbers, so some of the results are only available in one of the input files. In such a case, discard the data for that accession. Only save results in the output file when the accession is in both of the input files.&amp;lt;/font&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&amp;quot;#AA00FF&amp;quot;&amp;gt;The tab-separated files &amp;#039;&amp;#039;slinger.txt&amp;#039;&amp;#039; and &amp;#039;&amp;#039;hoist.txt&amp;#039;&amp;#039; each have two columns: an accession number and a numeric result, a probability between 0 and 1. 
The numbers are from running 2 different programs (slinger and hoist, if you are in doubt). You must combine these probabilities - basically taking the average of the two numbers - for each accession number and write the result in a file &amp;#039;&amp;#039;combined.txt&amp;#039;&amp;#039;. The file should look like the sources, i.e. tab-separated with accession in column 1 and number in column 2. Unfortunately, the two programs have not been run on the same set of accession numbers, so some of the results are only available in one of the input files. In such a case, discard the data for that accession. Only save results in the output file when the accession is in both of the input files.&amp;lt;/font&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Using the above method gives you too little data. This time, try to combine your two input sets differently. If an accession is in both input files, use the average; if it is in only one, use that number directly in the output file. This is effectively making a union of the inputs instead of an intersection.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Using the above method gives you too little data. This time, try to combine your two input sets differently. If an accession is in both input files, use the average; if it is in only one, use that number directly in the output file. This is effectively making a union of the inputs instead of an intersection.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;# The &#039;&#039;geneA-E.txt&#039;&#039; files all have the same structure on each line; first number is a float between 0 and 1, second number is an integer (representing months of survival after discovery of the cancer). For all files (the combined data set) find the average of the float, given the integer and display in ascending order of the integer. You need to add all the floats for a given integer together and divide by the number of floats for the integer, then you have the average for the integer. To succeed at this, you must use two dicts where the integer is the key in both. The corresponding values are the sum of the floats (for that key) and the number of times the key has been encountered in the files.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Given a tab-separated file with 3 columns; StudentID, CourseNumber, Grade. Can you find a way to load the grades for a student in a retrievable manner into (some of) the Python data structures learned so far? Retrievable means here that you can find the grades for a student if you know the StudentID.&amp;lt;br&amp;gt;Explain your approach. Hint: It is not necessarily efficient.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Given a tab-separated file with 3 columns; StudentID, CourseNumber, Grade. Can you find a way to load the grades for a student in a retrievable manner into (some of) the Python data structures learned so far? Retrievable means here that you can find the grades for a student if you know the StudentID.&amp;lt;br&amp;gt;Explain your approach. Hint: It is not necessarily efficient.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* The &#039;&#039;geneA-E.txt&#039;&#039; files all have the same structure on each line; first number is a float between 0 and 1, second number is an integer. For all files (the combined data set) find the average of the float, given the integer and display in ascending order of the integer. You need to add all the floats for a given integer together and divide by the number of floats for the integer, then you have the average for the integer. To succeed at this, you must use two dicts where the integer is the key in both. The corresponding values are the sum of the floats (for that key) and the number of times the key has been encountered in the files.&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise requires that you did the last two practice exercises in [[Simple Pattern Matching]]. In the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files count how many times each codon in the coding sequence occurs. Display the counts.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise requires that you did the last two practice exercises in [[Simple Pattern Matching]]. In the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files count how many times each codon in the coding sequence occurs. Display the counts.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise builds on mandatory exercise 2. You must read the &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; file and translate the DNA sequences to protein sequences. Report the frequencies of the various amino acids for the entire file - all sequences (not individual sequences). That is - count how many there are of each amino acid (a total) in the translated sequences, compute the frequency of each (Number_of_this_amino_acid/Total_number_of_amino_acids) and print the results as &amp;quot;S   0.0123&amp;quot;, i.e. 4 digits after the dot.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* This exercise builds on mandatory exercise 2. You must read the &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; file and translate the DNA sequences to protein sequences. Report the frequencies of the various amino acids for the entire file - all sequences (not individual sequences). That is - count how many there are of each amino acid (a total) in the translated sequences, compute the frequency of each (Number_of_this_amino_acid/Total_number_of_amino_acids) and print the results as &amp;quot;S   0.0123&amp;quot;, i.e. 4 digits after the dot.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=188&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=188&amp;oldid=prev"/>
		<updated>2025-09-03T11:46:56Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 13:46, 3 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l19&quot;&gt;Line 19:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 19:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Create a dictionary where the keys are codons and the values are the one-letter codes for the amino acids. The dictionary will function as a look-up table. You can find a [[codon list]] here. You are meant to make the dict &quot;by hand&quot; as there is a structure lesson in that. If you feel like it you can in addition make a program that constructs the dict from a file.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Create a dictionary where the keys are codons and the values are the one-letter codes for the amino acids. The dictionary will function as a look-up table. You can find a [[codon list]] here. You are meant to make the dict &quot;by hand&quot; as there is a structure lesson in that. &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Add a bit of smart code that tests if you made the dict right.&amp;lt;br&amp;gt; Extra: &lt;/ins&gt;If you feel like it you can in addition make a program that constructs the dict from a file&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;, which you are responsible for making&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Use the dictionary from the previous exercise in a program, that translates all the nucleotide fasta entries in &#039;&#039;dna7.fsa&#039;&#039; to amino acid sequence. Save the results in a file &#039;&#039;aa7.fsa&#039;&#039; in fasta format. Since the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;sequence is &lt;/del&gt;now consisting of amino acids add &#039;Amino Acid Sequence&#039; to each header. The STOP codon is NOT a part of the amino acid sequence. Think about what STOP means.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Use the dictionary from the previous exercise &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;and your previous functions &#039;&#039;&#039;fastaread()&#039;&#039;&#039; and &#039;&#039;&#039;fastawrite()&#039;&#039;&#039; &lt;/ins&gt;in a program, that translates all the nucleotide fasta entries in &#039;&#039;dna7.fsa&#039;&#039; to amino acid sequence. Save the results in a file &#039;&#039;aa7.fsa&#039;&#039; in fasta format. Since the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;sequences are &lt;/ins&gt;now consisting of amino acids add &#039;Amino Acid Sequence&#039; to each header. The STOP codon is NOT a part of the amino acid sequence. Think about what STOP means.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# In the file &#039;&#039;ex5.acc&#039;&#039; are a lot of accession numbers, where some are duplicates. Earlier we just removed the duplicates, now we should count them. Make a program that reads the file once, and writes a file &#039;&#039;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;noorder5&lt;/del&gt;.acc&#039;&#039; with the unique accession numbers and the number of occurrences in the file. A line should look like this: &quot;AC24677 2&quot;, if this accession occurs twice in &#039;&#039;ex5.acc&#039;&#039;.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# In the file &#039;&#039;ex5.acc&#039;&#039; are a lot of accession numbers, where some are duplicates. Earlier we just removed the duplicates, &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;but &lt;/ins&gt;now we should count them. Make a program that reads the file once, and writes a file &#039;&#039;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;order5&lt;/ins&gt;.acc&#039;&#039; with the unique accession numbers and the number of occurrences in the file. A line should look like this: &quot;AC24677 2&quot;, if this accession occurs twice in &#039;&#039;ex5.acc&#039;&#039;. 
&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;The accession numbers must be written in order, which means the accession with the most duplicates is at the top (&lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;beginning) and &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;fewest at the bottom. If two &lt;/ins&gt;accessions &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;have the same number &lt;/ins&gt;of &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;duplicates, they need to be ordered according to &lt;/ins&gt;the &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;accession name, i.e&lt;/ins&gt;. &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;AC543322 is before BG001110&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;# Improve &lt;/del&gt;the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;previous exercise by saving &lt;/del&gt;the accessions &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;in order &lt;/del&gt;of &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;occurrences with &lt;/del&gt;the &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;top counts first in the file &#039;&#039;order5&lt;/del&gt;.&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;acc&#039;&#039;&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&amp;quot;#AA00FF&amp;quot;&amp;gt;In the tab-separated files &amp;#039;&amp;#039;slinger.txt&amp;#039;&amp;#039; and &amp;#039;&amp;#039;hoist.txt&amp;#039;&amp;#039; are two columns with an accession number and a numeric result; a probability between 0 and 1. The numbers are from running 2 different programs (slinger and hoist, if you are in doubt). You must combine these probabilities - basically taking the average of the two numbers - for each accession number and write the result in a file &amp;#039;&amp;#039;combined.txt&amp;#039;&amp;#039;. The file should look like the sources, i.e. tab-separated with accession in column 1 and number in column 2. Unfortunately, the two programs have not been run from the same set of accession numbers, so some of the results are only available in one of the input files. In such case you ignore/discard the data for that accession. Only save results in the output file when the accession is in both of the input files.&amp;lt;/font&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&amp;quot;#AA00FF&amp;quot;&amp;gt;In the tab-separated files &amp;#039;&amp;#039;slinger.txt&amp;#039;&amp;#039; and &amp;#039;&amp;#039;hoist.txt&amp;#039;&amp;#039; are two columns with an accession number and a numeric result; a probability between 0 and 1. 
The numbers are from running 2 different programs (slinger and hoist, if you are in doubt). You must combine these probabilities - basically taking the average of the two numbers - for each accession number and write the result in a file &amp;#039;&amp;#039;combined.txt&amp;#039;&amp;#039;. The file should look like the sources, i.e. tab-separated with accession in column 1 and number in column 2. Unfortunately, the two programs have not been run from the same set of accession numbers, so some of the results are only available in one of the input files. In such case you ignore/discard the data for that accession. Only save results in the output file when the accession is in both of the input files.&amp;lt;/font&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Using above method gives you too little data. You try this time to combine your two input sets differently. If an accession is in both input files you use the average, if it is in only one, you just use the number straight in the output file. This is effectively making a union of the input instead of an intersection.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Using above method gives you too little data. You try this time to combine your two input sets differently. If an accession is in both input files you use the average, if it is in only one, you just use the number straight in the output file. This is effectively making a union of the input instead of an intersection.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=185&amp;oldid=prev</id>
		<title>WikiSysop: Created page with &quot;__NOTOC__ {| width=500  style=&quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&quot; |Previous: Set techniques |Next: Regular expressions |} == Required course material for the lesson == Powerpoint: [https://teaching.healthtech.dtu.dk/material/22116/22116_11-Dicts.ppt Dictionaries]&lt;br&gt; &lt;!-- Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=61d288cc-5f62-4027-a83e-af27012c27d2 Dictionaries]&lt;br&gt; Video: [https://panopto.dtu.dk/Panopto/Pages/...&quot;</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Dict_techniques&amp;diff=185&amp;oldid=prev"/>
		<updated>2025-09-03T11:28:51Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;__NOTOC__ {| width=500  style=&amp;quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&amp;quot; |Previous: &lt;a href=&quot;/22116/index.php/Set_techniques&quot; title=&quot;Set techniques&quot;&gt;Set techniques&lt;/a&gt; |Next: &lt;a href=&quot;/22116/index.php/Regular_expressions&quot; title=&quot;Regular expressions&quot;&gt;Regular expressions&lt;/a&gt; |} == Required course material for the lesson == Powerpoint: [https://teaching.healthtech.dtu.dk/material/22116/22116_11-Dicts.ppt Dictionaries]&amp;lt;br&amp;gt; &amp;lt;!-- Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=61d288cc-5f62-4027-a83e-af27012c27d2 Dictionaries]&amp;lt;br&amp;gt; Video: [https://panopto.dtu.dk/Panopto/Pages/...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;__NOTOC__&lt;br /&gt;
{| width=500  style=&amp;quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&amp;quot;&lt;br /&gt;
|Previous: [[Set techniques]]&lt;br /&gt;
|Next: [[Regular expressions]]&lt;br /&gt;
|}&lt;br /&gt;
== Required course material for the lesson ==&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22116/22116_11-Dicts.ppt Dictionaries]&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;!-- Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=61d288cc-5f62-4027-a83e-af27012c27d2 Dictionaries]&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=7b897492-bedb-4042-a4fc-af27012c0a0b Tips and Tricks]&amp;lt;br&amp;gt; --&amp;gt;&lt;br /&gt;
Resource: [https://teaching.healthtech.dtu.dk/material/22116/clean_code.html Clean Code] Every time you read it, you will take something from it.&amp;lt;br&amp;gt;&lt;br /&gt;
Resource: [[Example code - Dicts]]&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=276ecfa4-012e-47b4-b521-af27012b06d9 Live Coding]&lt;br /&gt;
&lt;br /&gt;
== Subjects covered ==&lt;br /&gt;
* Dictionaries - dicts - which are tables of key/value data looked up by key (insertion-ordered since Python 3.7, but not sorted).&lt;br /&gt;
* Dict methods and functions&lt;br /&gt;
* Dict tips and tricks&lt;br /&gt;
* Dict algorithms&lt;br /&gt;
&lt;br /&gt;
== Exercises to be handed in ==&lt;br /&gt;
# Create a dictionary where the keys are codons and the values are the one-letter codes for the amino acids. The dictionary will function as a look-up table. You can find a [[codon list]] here. You are meant to make the dict &amp;quot;by hand&amp;quot; as there is a structure lesson in that. If you feel like it, you can in addition make a program that constructs the dict from a file.&lt;br /&gt;
# Use the dictionary from the previous exercise in a program that translates all the nucleotide fasta entries in &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; to amino acid sequences. Save the results in a file &amp;#039;&amp;#039;aa7.fsa&amp;#039;&amp;#039; in fasta format. Since the sequences now consist of amino acids, add &amp;#039;Amino Acid Sequence&amp;#039; to each header. The STOP codon is NOT a part of the amino acid sequence. Think about what STOP means.&lt;br /&gt;
# In the file &amp;#039;&amp;#039;ex5.acc&amp;#039;&amp;#039; are a lot of accession numbers, some of which are duplicates. Earlier we just removed the duplicates; now we should count them. Make a program that reads the file once and writes a file &amp;#039;&amp;#039;noorder5.acc&amp;#039;&amp;#039; with the unique accession numbers and the number of occurrences in the file. A line should look like this: &amp;quot;AC24677 2&amp;quot;, if this accession occurs twice in &amp;#039;&amp;#039;ex5.acc&amp;#039;&amp;#039;.&lt;br /&gt;
# Improve the previous exercise by saving the accessions in order of occurrences with the top counts first in the file &amp;#039;&amp;#039;order5.acc&amp;#039;&amp;#039;.&lt;br /&gt;
# &amp;lt;font color=&amp;quot;#AA00FF&amp;quot;&amp;gt;The tab-separated files &amp;#039;&amp;#039;slinger.txt&amp;#039;&amp;#039; and &amp;#039;&amp;#039;hoist.txt&amp;#039;&amp;#039; each have two columns: an accession number and a numeric result, a probability between 0 and 1. The numbers are from running 2 different programs (slinger and hoist, if you are in doubt). You must combine these probabilities - basically taking the average of the two numbers - for each accession number and write the result in a file &amp;#039;&amp;#039;combined.txt&amp;#039;&amp;#039;. The file should look like the sources, i.e. tab-separated with accession in column 1 and number in column 2. Unfortunately, the two programs have not been run on the same set of accession numbers, so some of the results are only available in one of the input files. In such a case, ignore/discard the data for that accession. Only save results in the output file when the accession is in both of the input files.&amp;lt;/font&amp;gt;&lt;br /&gt;
# Using the above method gives you too little data. This time, try combining your two input sets differently. If an accession is in both input files, use the average; if it is in only one, use that number directly in the output file. This is effectively making a union of the input instead of an intersection.&lt;br /&gt;
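The counting and ordering asked for in the accession exercises can be sketched as follows. This is only an illustration under stated assumptions, not the expected hand-in: the function name count_and_order is hypothetical, and a small in-memory list stands in for the lines of &amp;#039;&amp;#039;ex5.acc&amp;#039;&amp;#039;.&lt;br /&gt;

```python
def count_and_order(accessions):
    """Return (accession, count) pairs, highest count first, ties by name."""
    counts = {}
    for accession in accessions:
        # dict.get supplies 0 the first time an accession is seen
        counts[accession] = counts.get(accession, 0) + 1
    # Negating the count sorts by count descending; the name breaks ties ascending
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical stand-in for the accession numbers read from the file
example = ["BG001110", "AC24677", "AC543322", "AC24677",
           "BG001110", "AC543322", "AC24677", "ZZ000001"]
ordered = count_and_order(example)
for accession, count in ordered:
    print(accession, count)
```

The input is traversed once to build the dict; the sort then works on the unique accessions only.&lt;br /&gt;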
&lt;br /&gt;
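The intersection and union strategies from the last two exercises differ only in what happens to accessions found in a single input. A minimal sketch, assuming the two files have already been read into dicts mapping accession to probability; the function name combine, the keep_all parameter and the sample values are hypothetical:&lt;br /&gt;

```python
def combine(slinger, hoist, keep_all=False):
    """Average the values per accession.

    keep_all=False keeps only accessions present in both dicts (intersection);
    keep_all=True also keeps accessions present in just one (union).
    """
    combined = {}
    for accession, value in slinger.items():
        if accession in hoist:
            combined[accession] = (value + hoist[accession]) / 2
        elif keep_all:
            combined[accession] = value
    if keep_all:
        for accession, value in hoist.items():
            # setdefault only adds accessions that are not in combined yet
            combined.setdefault(accession, value)
    return combined


# Hypothetical sample data standing in for the parsed input files
slinger = {"AC24677": 0.5, "AC543322": 0.25}
hoist = {"AC24677": 1.0, "BG001110": 0.75}
both = combine(slinger, hoist)
either = combine(slinger, hoist, keep_all=True)
for accession, value in either.items():
    print(f"{accession}\t{value}")
```

Membership tests with in on a dict are fast, which is why looking up each accession in the other dict is a reasonable approach here.&lt;br /&gt;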
== Exercises for extra practice ==&lt;br /&gt;
* You are given a tab-separated file with 3 columns: StudentID, CourseNumber, Grade. Can you find a way to load the grades for a student in a retrievable manner into (some of) the python data structures learned so far? Retrievable means here that you can find the grades for a student if you know the studentID.&amp;lt;br&amp;gt;Explain your approach. Hint: It is not necessarily efficient.&lt;br /&gt;
* The &amp;#039;&amp;#039;geneA-E.txt&amp;#039;&amp;#039; files all have the same structure on each line; the first number is a float between 0 and 1, the second number is an integer. For all files (the combined data set), find the average of the floats for each integer and display the averages in ascending order of the integer. You need to add all the floats for a given integer together and divide by the number of floats for that integer; then you have the average for the integer. To succeed at this, you must use two dicts where the integer is the key in both. The corresponding values are the sum of the floats (for that key) and the number of times the key has been encountered in the files.&lt;br /&gt;
* This exercise requires that you did the last two practice exercises in [[Simple Pattern Matching]]. In the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files, count how many times the different codons in the coding sequence occur. Display the counts.&lt;br /&gt;
* This exercise builds on mandatory exercise 2. You must read the &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; file and translate the DNA sequences to protein sequences. Report the frequencies of the various amino acids for the entire file - all sequences (not individual sequences). That is - count how many there are of each amino acid (a total) in the translated sequences, compute the frequency of each (Number_of_this_amino_acid/Total_number_of_amino_acids) and print the results as &amp;quot;S   0.0123&amp;quot;, i.e. 4 digits after the dot.&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
</feed>