<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?action=history&amp;feed=atom&amp;title=Set_techniques</id>
	<title>Set techniques - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?action=history&amp;feed=atom&amp;title=Set_techniques"/>
	<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;action=history"/>
	<updated>2026-05-14T04:21:20Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=266&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=266&amp;oldid=prev"/>
		<updated>2025-10-03T13:57:50Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 15:57, 3 October 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l15&quot;&gt;Line 15:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 15:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Many modern bioinformatic algorithms utilize &amp;#039;&amp;#039;&amp;#039;k-mers&amp;#039;&amp;#039;&amp;#039;. A k-mer is a piece of sequence k units long. The value of k, i.e. the length of the piece, depends on the algorithm. A k-mer can come from either a protein or a DNA sequence. As an example, take this DNA sequence: GACTAC. It contains the following 4-mers: GACT, ACTA, CTAC. The first 4 exercises work with k-mers and sets and lead up to an algorithm for finding the sequence in a group that looks most like a target sequence. Would they be homologs? Good bet.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Many modern bioinformatic algorithms utilize &amp;#039;&amp;#039;&amp;#039;k-mers&amp;#039;&amp;#039;&amp;#039;. A k-mer is a piece of sequence k units long. The value of k, i.e. the length of the piece, depends on the algorithm. A k-mer can come from either a protein or a DNA sequence. As an example, take this DNA sequence: GACTAC. It contains the following 4-mers: GACT, ACTA, CTAC. The first 4 exercises work with k-mers and sets and lead up to an algorithm for finding the sequence in a group that looks most like a target sequence. Would they be homologs? Good bet.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a genbank file like &#039;&#039;data1-4.gb&#039;&#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many did you insert in the set, how many are in the set, how many different 5-mers could there possibly be. Display. It seems like the two first numbers are the same, but this is not guaranteed to be true. Explain why using your python knowledge.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &#039;&#039;data1-4.gb&#039;&#039; files. Just ask for two filenames and calculate for these.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a genbank file like &#039;&#039;data1-4.gb&#039;&#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many did you insert in the set, how many are in the set, how many different 5-mers could there possibly be. Display. It seems like the two first numbers are the same, but this is not guaranteed to be true. Explain why using your python knowledge.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that asks for a number (integer) then reads the &#039;&#039;dna7.fsa&#039;&#039; file (which contains insulin-like genes) and saves the entry selected by the number in &#039;&#039;selected.fsa&#039;&#039; and the rest of the entries into the file &#039;&#039;rest.fsa&#039;&#039; This should be a fairly easy task since you have your &#039;&#039;&#039;fastaread&#039;&#039;&#039; and &#039;&#039;&#039;fastawrite&#039;&#039;&#039; functions from lesson 8, exercise 5 and 6.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &#039;&#039;data1-4.gb&#039;&#039; files. Just ask for two filenames and calculate for these.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Now that you can select a fasta entry, then read the &#039;&#039;selected.fsa&#039;&#039; and create 5-mers from the sequence. Now read the entries from &#039;&#039;rest.fsa&#039;&#039; and for every entry create the 5-mers from the sequence. Report which sequence in &#039;&#039;rest.fsa&#039;&#039; had the greatest overlap (and how much overlap) with the selected sequence. This must be the sequence that looks most like the selected one.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that asks for a number (integer) then reads the &#039;&#039;dna7.fsa&#039;&#039; file (which contains insulin-like genes) and saves the entry selected by the number in &#039;&#039;selected.fsa&#039;&#039; and the rest of the entries into the file &#039;&#039;rest.fsa&#039;&#039; This should be a fairly easy task since you have your &#039;&#039;&#039;fastaread&#039;&#039;&#039; and &#039;&#039;&#039;fastawrite&#039;&#039;&#039; functions from lesson 8, exercise 5 and 6.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Read the sequences in the file &#039;&#039;dna7.fsa&#039;&#039;. Find out which and how many of the 64 codons are &#039;&#039;&#039;not&#039;&#039;&#039; used somewhere in the sequences. Print the unused codons.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Now that you can select a fasta entry, then read the &#039;&#039;selected.fsa&#039;&#039; and create 5-mers from the sequence. Now read the entries from &#039;&#039;rest.fsa&#039;&#039; and for every entry create the 5-mers from the sequence. Report which sequence in &#039;&#039;rest.fsa&#039;&#039; had the greatest overlap (and how much overlap) with the selected sequence. This must be the sequence that looks most like the selected one.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&quot;#AA00FF&quot;&amp;gt;You have made a program (let&#039;s call it the X-program), which as input takes a file of accession numbers, &#039;&#039;start10.dat&#039;&#039; and produces some output, which is in &#039;&#039;res10.dat&#039;&#039;. Now you count the lines in your input file and your output file and you discover that the line numbers do not match. Horror - your program does not produce output for some input. Now the assignment is to discover which accession numbers did not produce output. This can be done in various ways, but now you have to use a set. Print the results.&amp;lt;/font&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Read the sequences in the file &#039;&#039;dna7.fsa&#039;&#039;. Find out which and how many of the 64 codons are &#039;&#039;&#039;not&#039;&#039;&#039; used somewhere in the sequences. Print the unused codons.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&quot;#AA00FF&quot;&amp;gt;In the file &#039;&#039;ex5.acc&#039;&#039; are a lot of accession numbers, where some are duplicates. We have earlier cleaned this file of duplicates. Let&#039;s do that again using a set. Make a program that reads the file once, finds the unique accession numbers and write them to the file &#039;&#039;uniq5.acc&#039;&#039;&amp;lt;/font&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&quot;#AA00FF&quot;&amp;gt;You have made a program (let&#039;s call it the X-program), which as input takes a file of accession numbers, &#039;&#039;start10.dat&#039;&#039; and produces some output, which is in &#039;&#039;res10.dat&#039;&#039;. Now you count the lines in your input file and your output file and you discover that the line numbers do not match. Horror - your program does not produce output for some input. Now the assignment is to discover which accession numbers did not produce output. This can be done in various ways, but now you have to use a set. Print the results.&amp;lt;/font&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;gt;&amp;lt;br&amp;gt;&amp;lt;br&lt;/ins&gt;&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# &amp;lt;font color=&quot;#AA00FF&quot;&amp;gt;In the file &#039;&#039;ex5.acc&#039;&#039; are a lot of accession numbers, where some are duplicates. We have earlier cleaned this file of duplicates. Let&#039;s do that again using a set. Make a program that reads the file once, finds the unique accession numbers and write them to the file &#039;&#039;uniq5.acc&#039;&#039;&amp;lt;/font&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;gt;&amp;lt;br&amp;gt;&amp;lt;br&lt;/ins&gt;&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# In the &amp;#039;&amp;#039;data1.gb&amp;#039;&amp;#039; file there are 6 references (to articles). Make a program that extracts all authors from the references, eliminates those that are duplicates and prints the list of authors. You will notice that some authors seem to be the same person using different initials. You should only consider a person a duplicate if the name matches exactly. This should also work for the other Genbank entries: &amp;#039;&amp;#039;data2.gb&amp;#039;&amp;#039;, &amp;#039;&amp;#039;data3.gb&amp;#039;&amp;#039; &amp;amp; &amp;#039;&amp;#039;data4.gb&amp;#039;&amp;#039;.&amp;lt;br&amp;gt;Beware: there are traps in this exercise, so check your output properly.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# In the &amp;#039;&amp;#039;data1.gb&amp;#039;&amp;#039; file there are 6 references (to articles). Make a program that extracts all authors from the references, eliminates those that are duplicates and prints the list of authors. You will notice that some authors seem to be the same person using different initials. You should only consider a person a duplicate if the name matches exactly. 
This should also work for the other Genbank entries: &amp;#039;&amp;#039;data2.gb&amp;#039;&amp;#039;, &amp;#039;&amp;#039;data3.gb&amp;#039;&amp;#039; &amp;amp; &amp;#039;&amp;#039;data4.gb&amp;#039;&amp;#039;.&amp;lt;br&amp;gt;Beware: there are traps in this exercise, so check your output properly.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=240&amp;oldid=prev</id>
		<title>WikiSysop: /* Required course material for the lesson */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=240&amp;oldid=prev"/>
		<updated>2025-09-22T10:03:39Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Required course material for the lesson&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 12:03, 22 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l5&quot;&gt;Line 5:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 5:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|}&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;|}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Required course material for the lesson ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Required course material for the lesson ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Powerpoint: [https://teaching.healthtech.dtu.dk/material/22116/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22116_12&lt;/del&gt;-Sets.ppt Sets]&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Powerpoint: [https://teaching.healthtech.dtu.dk/material/22116/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22116_10&lt;/ins&gt;-Sets.ppt Sets]&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=e69a7b62-4f83-4b92-b254-af27012c464e Sets]&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=e69a7b62-4f83-4b92-b254-af27012c464e Sets]&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=198&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=198&amp;oldid=prev"/>
		<updated>2025-09-03T14:47:14Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 16:47, 3 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l15&quot;&gt;Line 15:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 15:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Many modern bioinformatic algorithms utilize &amp;#039;&amp;#039;&amp;#039;k-mers&amp;#039;&amp;#039;&amp;#039;. A k-mer is a piece of sequence k units long. The value of k, i.e. the length of the piece, depends on the algorithm. A k-mer can come from either a protein or a DNA sequence. As an example, take this DNA sequence: GACTAC. It contains the following 4-mers: GACT, ACTA, CTAC. The first 4 exercises work with k-mers and sets and lead up to an algorithm for finding the sequence in a group that looks most like a target sequence. Would they be homologs? Good bet.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Many modern bioinformatic algorithms utilize &amp;#039;&amp;#039;&amp;#039;k-mers&amp;#039;&amp;#039;&amp;#039;. A k-mer is a piece of sequence k units long. The value of k, i.e. the length of the piece, depends on the algorithm. A k-mer can come from either a protein or a DNA sequence. As an example, take this DNA sequence: GACTAC. It contains the following 4-mers: GACT, ACTA, CTAC. The first 4 exercises work with k-mers and sets and lead up to an algorithm for finding the sequence in a group that looks most like a target sequence. Would they be homologs? Good bet.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a genbank file like &#039;&#039;data1-4.gb&#039;&#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many did you insert in the set, how many are in the set, how many different 5-mers could there possibly be. Display. It seems like the two first numbers are the same, but this is not guaranteed to be true. Explain why using your &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;biological background&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a genbank file like &#039;&#039;data1-4.gb&#039;&#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many did you insert in the set, how many are in the set, how many different 5-mers could there possibly be. Display. It seems like the two first numbers are the same, but this is not guaranteed to be true. Explain why using your &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;python knowledge&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files. Just ask for two filenames and calculate for these.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files. Just ask for two filenames and calculate for these.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that asks for a number (integer) then reads the &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; file (which contains insulin-like genes) and saves the entry selected by the number in &amp;#039;&amp;#039;selected.fsa&amp;#039;&amp;#039; and the rest of the entries into the file &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039; This should be a fairly easy task since you have your &amp;#039;&amp;#039;&amp;#039;fastaread&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;fastawrite&amp;#039;&amp;#039;&amp;#039; functions from lesson 8, exercise 5 and 6.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that asks for a number (integer) then reads the &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; file (which contains insulin-like genes) and saves the entry selected by the number in &amp;#039;&amp;#039;selected.fsa&amp;#039;&amp;#039; and the rest of the entries into the file &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039; This should be a fairly easy task since you have your &amp;#039;&amp;#039;&amp;#039;fastaread&amp;#039;&amp;#039;&amp;#039; and &amp;#039;&amp;#039;&amp;#039;fastawrite&amp;#039;&amp;#039;&amp;#039; functions from lesson 8, exercise 5 and 6.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=184&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=184&amp;oldid=prev"/>
		<updated>2025-09-03T10:49:21Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 12:49, 3 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l14&quot;&gt;Line 14:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 14:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Many modern bioinformatics algorithms use &#039;&#039;&#039;k-mers&#039;&#039;&#039;. A k-mer is a piece of sequence k units long. The value of k, i.e. the length of the piece, usually depends on the algorithm. A k-mer can come from either a protein or a DNA sequence. As an example, take the DNA sequence GACTAC. It contains the following 4-mers: GACT, ACTA, CTAC. The first 4 exercises work with k-mers and sets and lead up to an algorithm for finding the sequence in a group that looks most like a target sequence. &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Homologs, anyone&lt;/del&gt;?&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Many modern bioinformatics algorithms use &#039;&#039;&#039;k-mers&#039;&#039;&#039;. A k-mer is a piece of sequence k units long. The value of k, i.e. the length of the piece, usually depends on the algorithm. A k-mer can come from either a protein or a DNA sequence. As an example, take the DNA sequence GACTAC. It contains the following 4-mers: GACT, ACTA, CTAC. The first 4 exercises work with k-mers and sets and lead up to an algorithm for finding the sequence in a group that looks most like a target sequence. &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Would they be homologs&lt;/ins&gt;? &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Good bet.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a GenBank file like &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many 5-mers you inserted in the set, how many are in the set, and how many different 5-mers there could possibly be. Display them. It may seem like the first two numbers are the same, but this is not guaranteed to be true. Explain why using your biological background.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a GenBank file like &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many 5-mers you inserted in the set, how many are in the set, and how many different 5-mers there could possibly be. Display them. It may seem like the first two numbers are the same, but this is not guaranteed to be true. Explain why using your biological background.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files. Just ask for two filenames and calculate for these.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files. Just ask for two filenames and calculate for these.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=180&amp;oldid=prev</id>
		<title>WikiSysop: /* Subjects covered */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=180&amp;oldid=prev"/>
		<updated>2025-09-02T14:20:16Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Subjects covered&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 16:20, 2 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l10&quot;&gt;Line 10:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 10:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Subjects covered ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Subjects covered ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Sets, which are unordered data collections with no duplicates.&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Sets, which are unordered data collections with no duplicates.&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Methods relevant to sets:&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Set methods&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* &#039;&#039;&#039;clear&#039;&#039;&#039;, clears a set&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Set uses&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;** &#039;&#039;&#039;add&#039;&#039;&#039;, adds an element to the set,&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;** &#039;&#039;&#039;update&#039;&#039;&#039;, adds several elements to the set, performance,&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;** &#039;&#039;&#039;remove&#039;&#039;&#039;, removes an element from the set, error if not present,&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;** &#039;&#039;&#039;discard&#039;&#039;&#039;, removes an element from the set, no error,&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;** &#039;&#039;&#039;in&#039;&#039;&#039;, determines if an element exists,&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;** mathematical set operations, intersection (&amp;amp;), union (|)&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=171&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=171&amp;oldid=prev"/>
		<updated>2025-08-31T19:58:56Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 21:58, 31 August 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l23&quot;&gt;Line 23:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 23:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a GenBank file like &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many 5-mers you inserted in the set, how many are in the set, and how many different 5-mers there could possibly be. Display them. It may seem like the first two numbers are the same, but this is not guaranteed to be true. Explain why using your biological background.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a GenBank file like &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many 5-mers you inserted in the set, how many are in the set, and how many different 5-mers there could possibly be. Display them. It may seem like the first two numbers are the same, but this is not guaranteed to be true. Explain why using your biological background.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files. Just ask for two filenames and calculate for these.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files. Just ask for two filenames and calculate for these.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that asks for a number (integer), then reads the &#039;&#039;dna7.fsa&#039;&#039; file (which contains insulin-like genes) and saves the entry selected by the number in &#039;&#039;selected.fsa&#039;&#039; and the rest of the entries in the file &#039;&#039;rest.fsa&#039;&#039;. This should be a fairly easy task since you have your fastaread and fastawrite functions from lesson 8, exercises 5 and 6.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that asks for a number (integer), then reads the &#039;&#039;dna7.fsa&#039;&#039; file (which contains insulin-like genes) and saves the entry selected by the number in &#039;&#039;selected.fsa&#039;&#039; and the rest of the entries in the file &#039;&#039;rest.fsa&#039;&#039;. This should be a fairly easy task since you have your &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&#039;&#039;&#039;&lt;/ins&gt;fastaread&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&#039;&#039;&#039; &lt;/ins&gt;and &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&#039;&#039;&#039;&lt;/ins&gt;fastawrite&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&#039;&#039;&#039; &lt;/ins&gt;functions from lesson 8, exercises 5 and 6.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Now that you can select a FASTA entry, read &amp;#039;&amp;#039;selected.fsa&amp;#039;&amp;#039; and create 5-mers from the sequence. Then read the entries from &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039; and create the 5-mers for every entry. Report which sequence in &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039; has the greatest overlap (and how much overlap) with the selected sequence. This must be the sequence that looks most like the selected one.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Now that you can select a FASTA entry, read &amp;#039;&amp;#039;selected.fsa&amp;#039;&amp;#039; and create 5-mers from the sequence. Then read the entries from &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039; and create the 5-mers for every entry. Report which sequence in &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039; has the greatest overlap (and how much overlap) with the selected sequence. This must be the sequence that looks most like the selected one.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Read the sequences in the file &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039;. Find out which and how many of the 64 codons are &amp;#039;&amp;#039;&amp;#039;not&amp;#039;&amp;#039;&amp;#039; used somewhere in the sequences. Print the unused codons.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Read the sequences in the file &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039;. Find out which and how many of the 64 codons are &amp;#039;&amp;#039;&amp;#039;not&amp;#039;&amp;#039;&amp;#039; used somewhere in the sequences. Print the unused codons.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=170&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=170&amp;oldid=prev"/>
		<updated>2025-08-31T19:58:21Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 21:58, 31 August 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l20&quot;&gt;Line 20:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 20:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Many modern bioinformatics algorithms use &#039;&#039;&#039;k-mers&#039;&#039;&#039;. A k-mer is a piece of sequence k units long. The value of k, i.e. the length of the piece, usually depends on the algorithm. A k-mer can come from either a protein or a DNA sequence. As an example, take the DNA sequence GACTAC. It contains the following 4-mers: GACT, ACTA, CTAC. The first 4 exercises work with k-mers and sets and lead up to &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;aalgorithm &lt;/del&gt;for finding the sequence in a group that looks most like a target sequence. Homologs, anyone?&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Many modern bioinformatics algorithms use &#039;&#039;&#039;k-mers&#039;&#039;&#039;. A k-mer is a piece of sequence k units long. The value of k, i.e. the length of the piece, usually depends on the algorithm. A k-mer can come from either a protein or a DNA sequence. As an example, take the DNA sequence GACTAC. It contains the following 4-mers: GACT, ACTA, CTAC. The first 4 exercises work with k-mers and sets and lead up to &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;an algorithm &lt;/ins&gt;for finding the sequence in a group that looks most like a target sequence. Homologs, anyone?&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a GenBank file like &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many 5-mers you inserted in the set, how many are in the set, and how many different 5-mers there could possibly be. Display them. It may seem like the first two numbers are the same, but this is not guaranteed to be true. Explain why using your biological background.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a GenBank file like &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many 5-mers you inserted in the set, how many are in the set, and how many different 5-mers there could possibly be. Display them. It may seem like the first two numbers are the same, but this is not guaranteed to be true. Explain why using your biological background.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files. Just ask for two filenames and calculate for these.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files. Just ask for two filenames and calculate for these.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=169&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=169&amp;oldid=prev"/>
		<updated>2025-08-31T19:56:34Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 21:56, 31 August 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l20&quot;&gt;Line 20:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 20:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Many modern bioinformatic algorithms utilize &#039;&#039;&#039;k-mers&#039;&#039;&#039;. A k-mer is a piece of sequence k units long. The value of k, i.e. the length of the piece, depends on the algorithm. The sequence can be either protein or DNA. As an example I have this DNA sequence: GACTAC. It contains the following 4-mers: GACT, ACTA, CTAC.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Many modern bioinformatic algorithms utilize &#039;&#039;&#039;k-mers&#039;&#039;&#039;. A k-mer is a piece of sequence k units long. The value of k, i.e. the length of the piece, depends on the algorithm. The sequence can be either protein or DNA. As an example I have this DNA sequence: GACTAC. It contains the following 4-mers: GACT, ACTA, CTAC. &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;The first 4 exercises work with k-mers and sets and lead up to aalgorithm for finding the sequence in a group that looks most like a target sequence. Homologs, anyone?&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a genbank file like &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many did you insert in the set, how many are in the set, how many different 5-mers could there possibly be. Display. It seems like the two first numbers are the same, but this is not guaranteed to be true. Explain why using your biological background.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a genbank file like &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many did you insert in the set, how many are in the set, how many different 5-mers could there possibly be. Display. It seems like the two first numbers are the same, but this is not guaranteed to be true. Explain why using your biological background.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files. Just ask for two filenames and calculate for these.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files. Just ask for two filenames and calculate for these.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=168&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=168&amp;oldid=prev"/>
		<updated>2025-08-31T19:53:07Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 21:53, 31 August 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l22&quot;&gt;Line 22:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 22:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Many modern bioinformatic algorithms utilize &amp;#039;&amp;#039;&amp;#039;k-mers&amp;#039;&amp;#039;&amp;#039;. A k-mer is a piece of sequence k units long. The value of k, i.e. the length of the piece, depends on the algorithm. The sequence can be either protein or DNA. As an example I have this DNA sequence: GACTAC. It contains the following 4-mers: GACT, ACTA, CTAC.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Many modern bioinformatic algorithms utilize &amp;#039;&amp;#039;&amp;#039;k-mers&amp;#039;&amp;#039;&amp;#039;. A k-mer is a piece of sequence k units long. The value of k, i.e. the length of the piece, depends on the algorithm. The sequence can be either protein or DNA. As an example I have this DNA sequence: GACTAC. It contains the following 4-mers: GACT, ACTA, CTAC.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a genbank file like &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many did you insert in the set, how many are in the set, how many different 5-mers could there possibly be. Display. It seems like the two first numbers are the same, but this is not guaranteed to be true. Explain why using your biological background.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a genbank file like &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many did you insert in the set, how many are in the set, how many different 5-mers could there possibly be. Display. It seems like the two first numbers are the same, but this is not guaranteed to be true. Explain why using your biological background.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &#039;&#039;data1-4.gb&#039;&#039; files. Just ask for two filenames and calculate for these.# Make a program that asks for a number (integer) then reads the &#039;&#039;dna7.fsa&#039;&#039; file (which contains insulin-like genes) and saves the entry selected by the number in &#039;&#039;selected.fsa&#039;&#039; and the rest of the entries into the file &#039;&#039;rest.fsa&#039;&#039;.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &#039;&#039;data1-4.gb&#039;&#039; files. Just ask for two filenames and calculate for these.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that asks for a number (integer) then reads the &#039;&#039;dna7.fsa&#039;&#039; file (which contains insulin-like genes) and saves the entry selected by the number in &#039;&#039;selected.fsa&#039;&#039; and the rest of the entries into the file &#039;&#039;rest.fsa&#039;&#039;. &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;This should be a fairly easy task since you have your fastaread and fastawrite functions from lesson 8, exercises 5 and 6&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Now that you can select a fasta entry, then read the &amp;#039;&amp;#039;selected.fsa&amp;#039;&amp;#039; and create 5-mers from the sequence. Now read the entries from &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039; and for every entry create the 5-mers from the sequence. Report which sequence in &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039; had the greatest overlap (and how much overlap) with the selected sequence. This must be the sequence that looks most like the selected one.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Now that you can select a fasta entry, then read the &amp;#039;&amp;#039;selected.fsa&amp;#039;&amp;#039; and create 5-mers from the sequence. Now read the entries from &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039; and for every entry create the 5-mers from the sequence. Report which sequence in &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039; had the greatest overlap (and how much overlap) with the selected sequence. This must be the sequence that looks most like the selected one.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Read the sequences in the file &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039;. Find out which and how many of the 64 codons are &amp;#039;&amp;#039;&amp;#039;not&amp;#039;&amp;#039;&amp;#039; used somewhere in the sequences. Print the unused codons.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Read the sequences in the file &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039;. Find out which and how many of the 64 codons are &amp;#039;&amp;#039;&amp;#039;not&amp;#039;&amp;#039;&amp;#039; used somewhere in the sequences. Print the unused codons.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=167&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Set_techniques&amp;diff=167&amp;oldid=prev"/>
		<updated>2025-08-31T19:49:28Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 21:49, 31 August 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l21&quot;&gt;Line 21:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 21:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Many modern bioinformatic algorithms utilize &amp;#039;&amp;#039;&amp;#039;k-mers&amp;#039;&amp;#039;&amp;#039;. A k-mer is a piece of sequence k units long. The value of k, i.e. the length of the piece, depends on the algorithm. The sequence can be either protein or DNA. As an example I have this DNA sequence: GACTAC. It contains the following 4-mers: GACT, ACTA, CTAC.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Many modern bioinformatic algorithms utilize &amp;#039;&amp;#039;&amp;#039;k-mers&amp;#039;&amp;#039;&amp;#039;. A k-mer is a piece of sequence k units long. The value of k, i.e. the length of the piece, depends on the algorithm. The sequence can be either protein or DNA. As an example I have this DNA sequence: GACTAC. It contains the following 4-mers: GACT, ACTA, CTAC.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a genbank file like &#039;&#039;data1-4.gb&#039;&#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many did you insert in the set, how many are in the set, how many different 5-mers could there possibly be. Display. It seems like the two first numbers are the same, but this is not guaranteed to be true&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;, &lt;/del&gt;why&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;?&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# From a genbank file like &#039;&#039;data1-4.gb&#039;&#039; extract the DNA sequence (been there - done that). Now insert all 5-mers in the DNA sequence into a set. Compute the following numbers: how many did you insert in the set, how many are in the set, how many different 5-mers could there possibly be. Display. It seems like the two first numbers are the same, but this is not guaranteed to be true&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;. Explain &lt;/ins&gt;why &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;using your biological background.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files. Just ask for two filenames and calculate for these.# Make a program that asks for a number (integer) then reads the &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; file (which contains insulin-like genes) and saves the entry selected by the number in &amp;#039;&amp;#039;selected.fsa&amp;#039;&amp;#039; and the rest of the entries into the file &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039;.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Calculate the overlap of 5-mers between any two of the &amp;#039;&amp;#039;data1-4.gb&amp;#039;&amp;#039; files. Just ask for two filenames and calculate for these.# Make a program that asks for a number (integer) then reads the &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; file (which contains insulin-like genes) and saves the entry selected by the number in &amp;#039;&amp;#039;selected.fsa&amp;#039;&amp;#039; and the rest of the entries into the file &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Now that you can select a fasta entry, then read the &amp;#039;&amp;#039;selected.fsa&amp;#039;&amp;#039; and create 5-mers from the sequence. Now read the entries from &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039; and for every entry create the 5-mers from the sequence. Report which sequence in &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039; had the greatest overlap (and how much overlap) with the selected sequence. This must be the sequence that looks most like the selected one.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Now that you can select a fasta entry, then read the &amp;#039;&amp;#039;selected.fsa&amp;#039;&amp;#039; and create 5-mers from the sequence. Now read the entries from &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039; and for every entry create the 5-mers from the sequence. Report which sequence in &amp;#039;&amp;#039;rest.fsa&amp;#039;&amp;#039; had the greatest overlap (and how much overlap) with the selected sequence. This must be the sequence that looks most like the selected one.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
</feed>