<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?action=history&amp;feed=atom&amp;title=Regular_expressions</id>
	<title>Regular expressions - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?action=history&amp;feed=atom&amp;title=Regular_expressions"/>
	<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;action=history"/>
	<updated>2026-05-01T19:03:24Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=268&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=268&amp;oldid=prev"/>
		<updated>2025-10-03T13:59:33Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 15:59, 3 October 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l19&quot;&gt;Line 19:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 19:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Exercise 6 to 8 has strong taste of something I would do at an exam. It is also an interesting beginning of the making a HIV vaccine. The data is real and the methods are real.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Exercise 6 to 8 has strong taste of something I would do at an exam. It is also an interesting beginning of the making a HIV vaccine. The data is real and the methods are real.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that accepts a string as input from the keyboard. Use regular expressions (RE) to determine if the input is a number. The goal is to do this with a SINGLE regex.&amp;lt;br&amp;gt; These should all be considered as numbers: &quot;4&quot;   &quot;-7&quot;   &quot;0.656&quot;   &quot;-67.35555&quot;&amp;lt;br&amp;gt; These are not numbers: &quot;5.&quot;  &quot;56F&quot;  &quot;.32&quot;  &quot;-.04&quot;  &quot;1+1&quot;&amp;lt;br&amp;gt; Note: The program is very simple, but it is likely the most difficult regular expression, you will have to make in this set of exercises. Perhaps you should do the following exercises before attempting this one - just to get some experience first.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Regex is often used for validation. This time make a small program that ask for an email address, and checks if it is in a proper format. Quite similar to the previous exercise, just a different thing to validate.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that accepts a string as input from the keyboard. Use regular expressions (RE) to determine if the input is a number. The goal is to do this with a SINGLE regex.&amp;lt;br&amp;gt; These should all be considered as numbers: &quot;4&quot;   &quot;-7&quot;   &quot;0.656&quot;   &quot;-67.35555&quot;&amp;lt;br&amp;gt; These are not numbers: &quot;5.&quot;  &quot;56F&quot;  &quot;.32&quot;  &quot;-.04&quot;  &quot;1+1&quot;&amp;lt;br&amp;gt; Note: The program is very simple, but it is likely the most difficult regular expression, you will have to make in this set of exercises. Perhaps you should do the following exercises before attempting this one - just to get some experience first.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that can read and verify a fasta file. Use your previous function &#039;&#039;&#039;fastaread()&#039;&#039;&#039; Test the program with &#039;&#039;dna7.fsa&#039;&#039; and &#039;&#039;dnanoise.fsa&#039;&#039;. Verification here means that the program prints &quot;DNA fasta&quot; or &quot;Protein fasta&quot; if the file is successfully verified for either dna or protein sequence, and &quot;Not fasta&quot; if unsuccessfully verified. You can find a description of fasta format in [[Biological knowledge needed in the course]]. You are expected to know which symbols are used for DNA and protein sequence - or that you are able to look it up.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Regex is often used for validation. This time make a small program that ask for an email address, and checks if it is in a proper format. Quite similar to the previous exercise, just a different thing to validate.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Building on your experience with the previous exercise, make a program that reads a fasta file, discard entries that can not conform to DNA or protein sequence, and rewrite (using your &#039;&#039;&#039;fastawrite()&#039;&#039;&#039; function) the acceptable entries in the output file &#039;&#039;fastaout.fsa&#039;&#039;, in such a way that the normal 60 chars per line is followed with no spaces in between. The program must inform the user how many entries was kept and how many discarded. Hint: Test on &#039;&#039;dnanoise.fsa&#039;&#039;, which contains 3 entries that should be discarded (V00179, J00265, J02989).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that can read and verify a fasta file. Use your previous function &#039;&#039;&#039;fastaread()&#039;&#039;&#039; Test the program with &#039;&#039;dna7.fsa&#039;&#039; and &#039;&#039;dnanoise.fsa&#039;&#039;. Verification here means that the program prints &quot;DNA fasta&quot; or &quot;Protein fasta&quot; if the file is successfully verified for either dna or protein sequence, and &quot;Not fasta&quot; if unsuccessfully verified. You can find a description of fasta format in [[Biological knowledge needed in the course]]. You are expected to know which symbols are used for DNA and protein sequence - or that you are able to look it up.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# [[File:consensus.png|50px|right]] In the file &#039;&#039;alignment.fsa&#039;&#039; is a protein alignment of part of the insulin gene from different organisms. Read the fasta file and determine the consensus sequence of the alignment, which you print. The consensus sequence is simply the most frequent occurring amino acid on each position.&amp;lt;br&amp;gt;Consensus: MALWMRLLPLLALLALWEPDPAGAFVNGHLCGSHLVEALYLVCGERGFFYTPKSRREVEDPGVGGLELGGGP&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Building on your experience with the previous exercise, make a program that reads a fasta file, discard entries that can not conform to DNA or protein sequence, and rewrite (using your &#039;&#039;&#039;fastawrite()&#039;&#039;&#039; function) the acceptable entries in the output file &#039;&#039;fastaout.fsa&#039;&#039;, in such a way that the normal 60 chars per line is followed with no spaces in between. The program must inform the user how many entries was kept and how many discarded. Hint: Test on &#039;&#039;dnanoise.fsa&#039;&#039;, which contains 3 entries that should be discarded (V00179, J00265, J02989).&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# All HIV envelope proteins from various HIV strains in SwissProt have been identified and collected in the file &#039;&#039;HIVenvelope.txt&#039;&#039;. Using regular repressions you must extract the ID and the protein sequence from each entry and save them in a fasta file named &#039;&#039;HIVenv.fsa&#039;&#039;. This job is not new to you and you can use your &#039;&#039;&#039;fastawrite()&#039;&#039;&#039; function if you want to.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# [[File:consensus.png|50px|right]] In the file &#039;&#039;alignment.fsa&#039;&#039; is a protein alignment of part of the insulin gene from different organisms. Read the fasta file and determine the consensus sequence of the alignment, which you print. The consensus sequence is simply the most frequent occurring amino acid on each position.&amp;lt;br&amp;gt;Consensus: MALWMRLLPLLALLALWEPDPAGAFVNGHLCGSHLVEALYLVCGERGFFYTPKSRREVEDPGVGGLELGGGP&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Continuing the investigation in HIV. Read the &#039;&#039;HIVenv.fsa&#039;&#039; fasta file and create a single set consisting of all possible epitopes for all sequences. Save the epitopes in the file &#039;&#039;HIVepitopes.txt&#039;&#039; - one epitope per line. An epitope is simply a k-mer 9 residues long, which can possibly elicit a immune system response. So save all unique 9-mers in the sequences in the file. To generate a file that you can verify against my file, you must sort the epitopes alphabetically before saving them.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# All HIV envelope proteins from various HIV strains in SwissProt have been identified and collected in the file &#039;&#039;HIVenvelope.txt&#039;&#039;. Using regular repressions you must extract the ID and the protein sequence from each entry and save them in a fasta file named &#039;&#039;HIVenv.fsa&#039;&#039;. This job is not new to you and you can use your &#039;&#039;&#039;fastawrite()&#039;&#039;&#039; function if you want to.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Continuing the investigation in HIV. Read the &#039;&#039;HIVenv.fsa&#039;&#039; fasta file and create a single set consisting of all possible epitopes for all sequences. Save the epitopes in the file &#039;&#039;HIVepitopes.txt&#039;&#039; - one epitope per line. An epitope is simply a k-mer 9 residues long, which can possibly elicit a immune system response. So save all unique 9-mers in the sequences in the file. To generate a file that you can verify against my file, you must sort the epitopes alphabetically before saving them.&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# You must prepare the epitopes in &amp;#039;&amp;#039;HIVepitopes.txt&amp;#039;&amp;#039; for machine learning. A common ML problem is when you have many data points (here epitopes) that look like each other. This introduces an unwanted bias in the ML predictions. Your job is to eliminate epitopes that are too similar using the Hobohm-1 algorithm. Hobohm-1 works like this: Look a the first epitope in the list. Compare it sequentially with the rest of the epitopes. If an epitope is too similar with the first, throw it away. Now no epitopes looks like the first. Proceed to the second epitope and repeat the comparing and possibly throwing away of the subsequent epitopes. Proceed to the third epitope in the list. Repeat this pattern until you have reached the end of the list. Now all epitope left are dissimilar to each other.&amp;lt;br&amp;gt;How to determine if two epitopes are too similar? Easy - you earlier learned about the Hamming distance. Just compute the Hamming distance between two epitopes and if the distance is 3 or less, they are too similar. Save the &amp;quot;surviving&amp;quot; epitopes in the file &amp;#039;&amp;#039;HIVepitopesML.txt&amp;#039;&amp;#039;.&amp;lt;br&amp;gt;Hint: Think about the Hobohm-1 algorithm before you implement it. You can easily run into problems.&amp;lt;br&amp;gt;You should see a reduction from 15909 epitopes to 3842 in 30-60 seconds.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# You must prepare the epitopes in &amp;#039;&amp;#039;HIVepitopes.txt&amp;#039;&amp;#039; for machine learning. A common ML problem is when you have many data points (here epitopes) that look like each other. This introduces an unwanted bias in the ML predictions. Your job is to eliminate epitopes that are too similar using the Hobohm-1 algorithm. Hobohm-1 works like this: Look a the first epitope in the list. Compare it sequentially with the rest of the epitopes. If an epitope is too similar with the first, throw it away. Now no epitopes looks like the first. Proceed to the second epitope and repeat the comparing and possibly throwing away of the subsequent epitopes. Proceed to the third epitope in the list. Repeat this pattern until you have reached the end of the list. Now all epitope left are dissimilar to each other.&amp;lt;br&amp;gt;How to determine if two epitopes are too similar? Easy - you earlier learned about the Hamming distance. Just compute the Hamming distance between two epitopes and if the distance is 3 or less, they are too similar. Save the &amp;quot;surviving&amp;quot; epitopes in the file &amp;#039;&amp;#039;HIVepitopesML.txt&amp;#039;&amp;#039;.&amp;lt;br&amp;gt;Hint: Think about the Hobohm-1 algorithm before you implement it. You can easily run into problems.&amp;lt;br&amp;gt;You should see a reduction from 15909 epitopes to 3842 in 30-60 seconds.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=231&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=231&amp;oldid=prev"/>
		<updated>2025-09-06T19:08:57Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 21:08, 6 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l26&quot;&gt;Line 26:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 26:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# All HIV envelope proteins from various HIV strains in SwissProt have been identified and collected in the file &amp;#039;&amp;#039;HIVenvelope.txt&amp;#039;&amp;#039;. Using regular repressions you must extract the ID and the protein sequence from each entry and save them in a fasta file named &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039;. This job is not new to you and you can use your &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; function if you want to.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# All HIV envelope proteins from various HIV strains in SwissProt have been identified and collected in the file &amp;#039;&amp;#039;HIVenvelope.txt&amp;#039;&amp;#039;. Using regular repressions you must extract the ID and the protein sequence from each entry and save them in a fasta file named &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039;. This job is not new to you and you can use your &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; function if you want to.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Continuing the investigation in HIV. Read the &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039; fasta file and create a single set consisting of all possible epitopes for all sequences. Save the epitopes in the file &amp;#039;&amp;#039;HIVepitopes.txt&amp;#039;&amp;#039; - one epitope per line. An epitope is simply a k-mer 9 residues long, which can possibly elicit a immune system response. So save all unique 9-mers in the sequences in the file. To generate a file that you can verify against my file, you must sort the epitopes alphabetically before saving them.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Continuing the investigation in HIV. Read the &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039; fasta file and create a single set consisting of all possible epitopes for all sequences. Save the epitopes in the file &amp;#039;&amp;#039;HIVepitopes.txt&amp;#039;&amp;#039; - one epitope per line. An epitope is simply a k-mer 9 residues long, which can possibly elicit a immune system response. So save all unique 9-mers in the sequences in the file. To generate a file that you can verify against my file, you must sort the epitopes alphabetically before saving them.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# You must prepare the epitopes in &#039;&#039;HIVepitopes.txt&#039;&#039; for machine learning. A common ML problem is when you have many data points (here epitopes) that look like each other. This introduces an unwanted bias in the ML predictions. Your job is to eliminate epitopes that are too similar using the Hobohm-1 algorithm. Hobohm-1 works like this: Look a the first epitope in the list. Compare it sequentially with the rest of the epitopes. If an epitope is too similar with the first, throw it away. Now no epitopes looks like the first. Proceed to the second epitope and repeat the comparing and possibly throwing away of the subsequent epitopes. Proceed to the third epitope in the list. Repeat this pattern until you have reached the end of the list. Now all epitope left are dissimilar to each other.&amp;lt;br&amp;gt;How to determine if two epitopes are too similar? Easy - you earlier learned about the Hamming distance. Just compute the Hamming distance between two epitopes and if the distance is 3 or less, they are too similar. Save the &quot;surviving&quot; epitopes in the file &#039;&#039;HIVepitopesML.txt&#039;&#039;.&amp;lt;br&amp;gt;Hint: Think about the Hobohm-1 algorithm before you implement it. You can easily run into problems.&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# You must prepare the epitopes in &#039;&#039;HIVepitopes.txt&#039;&#039; for machine learning. A common ML problem is when you have many data points (here epitopes) that look like each other. This introduces an unwanted bias in the ML predictions. Your job is to eliminate epitopes that are too similar using the Hobohm-1 algorithm. Hobohm-1 works like this: Look a the first epitope in the list. Compare it sequentially with the rest of the epitopes. If an epitope is too similar with the first, throw it away. Now no epitopes looks like the first. Proceed to the second epitope and repeat the comparing and possibly throwing away of the subsequent epitopes. Proceed to the third epitope in the list. Repeat this pattern until you have reached the end of the list. Now all epitope left are dissimilar to each other.&amp;lt;br&amp;gt;How to determine if two epitopes are too similar? Easy - you earlier learned about the Hamming distance. Just compute the Hamming distance between two epitopes and if the distance is 3 or less, they are too similar. Save the &quot;surviving&quot; epitopes in the file &#039;&#039;HIVepitopesML.txt&#039;&#039;.&amp;lt;br&amp;gt;Hint: Think about the Hobohm-1 algorithm before you implement it. You can easily run into problems.&amp;lt;br&amp;gt;You should see a reduction from 15909 epitopes to 3842 in 30-60 seconds.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;You should see a reduction from 15909 epitopes to 3842 in 30-60 seconds.&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=230&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=230&amp;oldid=prev"/>
		<updated>2025-09-06T19:08:41Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 21:08, 6 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l26&quot;&gt;Line 26:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 26:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# All HIV envelope proteins from various HIV strains in SwissProt have been identified and collected in the file &amp;#039;&amp;#039;HIVenvelope.txt&amp;#039;&amp;#039;. Using regular expressions you must extract the ID and the protein sequence from each entry and save them in a fasta file named &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039;. This job is not new to you and you can use your &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; function if you want to.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# All HIV envelope proteins from various HIV strains in SwissProt have been identified and collected in the file &amp;#039;&amp;#039;HIVenvelope.txt&amp;#039;&amp;#039;. Using regular expressions you must extract the ID and the protein sequence from each entry and save them in a fasta file named &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039;. This job is not new to you and you can use your &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; function if you want to.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Continuing the investigation in HIV. Read the &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039; fasta file and create a single set consisting of all possible epitopes for all sequences. Save the epitopes in the file &amp;#039;&amp;#039;HIVepitopes.txt&amp;#039;&amp;#039; - one epitope per line. An epitope is simply a k-mer 9 residues long, which can possibly elicit an immune system response. So save all unique 9-mers in the sequences in the file. To generate a file that you can verify against my file, you must sort the epitopes alphabetically before saving them.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Continuing the investigation in HIV. Read the &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039; fasta file and create a single set consisting of all possible epitopes for all sequences. Save the epitopes in the file &amp;#039;&amp;#039;HIVepitopes.txt&amp;#039;&amp;#039; - one epitope per line. An epitope is simply a k-mer 9 residues long, which can possibly elicit an immune system response. So save all unique 9-mers in the sequences in the file. To generate a file that you can verify against my file, you must sort the epitopes alphabetically before saving them.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# You must prepare the epitopes in &#039;&#039;HIVepitopes.txt&#039;&#039; for machine learning. A common ML problem is when you have many data points (here epitopes) that look like each other. This introduces an unwanted bias in the ML predictions. Your job is to eliminate epitopes that are too similar using the Hobohm-1 algorithm. Hobohm-1 works like this: Look at the first epitope in the list. Compare it sequentially with the rest of the epitopes. If an epitope is too similar to the first, throw it away. Now no epitope looks like the first. Proceed to the second epitope and repeat the comparing and possibly throwing away of the subsequent epitopes. Proceed to the third epitope in the list. Repeat this pattern until you have reached the end of the list. Now all epitopes left are dissimilar to each other.&amp;lt;br&amp;gt;How to determine if two epitopes are too similar? Easy - you earlier learned about the Hamming distance. Just compute the Hamming distance between two epitopes and if the distance is 3 or less, they are too similar. Save the &quot;surviving&quot; epitopes in the file &#039;&#039;HIVepitopesML.txt&#039;&#039;.&amp;lt;br&amp;gt;Hint: Think about the Hobohm-1 algorithm before you implement it. You can easily run into problems.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# You must prepare the epitopes in &#039;&#039;HIVepitopes.txt&#039;&#039; for machine learning. 
A common ML problem is when you have many data points (here epitopes) that look like each other. This introduces an unwanted bias in the ML predictions. Your job is to eliminate epitopes that are too similar using the Hobohm-1 algorithm. Hobohm-1 works like this: Look at the first epitope in the list. Compare it sequentially with the rest of the epitopes. If an epitope is too similar to the first, throw it away. Now no epitope looks like the first. Proceed to the second epitope and repeat the comparing and possibly throwing away of the subsequent epitopes. Proceed to the third epitope in the list. Repeat this pattern until you have reached the end of the list. Now all epitopes left are dissimilar to each other.&amp;lt;br&amp;gt;How to determine if two epitopes are too similar? Easy - you earlier learned about the Hamming distance. Just compute the Hamming distance between two epitopes and if the distance is 3 or less, they are too similar. Save the &quot;surviving&quot; epitopes in the file &#039;&#039;HIVepitopesML.txt&#039;&#039;.&amp;lt;br&amp;gt;Hint: Think about the Hobohm-1 algorithm before you implement it. You can easily run into problems&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;.&amp;lt;br&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;You should see a reduction from 15909 epitopes to 3842 in 30-60 seconds&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=229&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=229&amp;oldid=prev"/>
		<updated>2025-09-06T18:06:19Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 20:06, 6 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l23&quot;&gt;Line 23:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 23:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that can read and verify a fasta file. Use your previous function &amp;#039;&amp;#039;&amp;#039;fastaread()&amp;#039;&amp;#039;&amp;#039;. Test the program with &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; and &amp;#039;&amp;#039;dnanoise.fsa&amp;#039;&amp;#039;. Verification here means that the program prints &amp;quot;DNA fasta&amp;quot; or &amp;quot;Protein fasta&amp;quot; if the file is successfully verified for either DNA or protein sequence, and &amp;quot;Not fasta&amp;quot; if unsuccessfully verified. You can find a description of fasta format in [[Biological knowledge needed in the course]]. You are expected to know which symbols are used for DNA and protein sequence - or to be able to look them up.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that can read and verify a fasta file. Use your previous function &amp;#039;&amp;#039;&amp;#039;fastaread()&amp;#039;&amp;#039;&amp;#039;. Test the program with &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; and &amp;#039;&amp;#039;dnanoise.fsa&amp;#039;&amp;#039;. Verification here means that the program prints &amp;quot;DNA fasta&amp;quot; or &amp;quot;Protein fasta&amp;quot; if the file is successfully verified for either DNA or protein sequence, and &amp;quot;Not fasta&amp;quot; if unsuccessfully verified. You can find a description of fasta format in [[Biological knowledge needed in the course]]. 
You are expected to know which symbols are used for DNA and protein sequence - or to be able to look them up.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Building on your experience with the previous exercise, make a program that reads a fasta file, discards entries that do not conform to DNA or protein sequence, and rewrites (using your &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; function) the acceptable entries in the output file &amp;#039;&amp;#039;fastaout.fsa&amp;#039;&amp;#039;, in such a way that the normal 60 chars per line is followed with no spaces in between. The program must inform the user how many entries were kept and how many were discarded. Hint: Test on &amp;#039;&amp;#039;dnanoise.fsa&amp;#039;&amp;#039;, which contains 3 entries that should be discarded (V00179, J00265, J02989).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Building on your experience with the previous exercise, make a program that reads a fasta file, discards entries that do not conform to DNA or protein sequence, and rewrites (using your &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; function) the acceptable entries in the output file &amp;#039;&amp;#039;fastaout.fsa&amp;#039;&amp;#039;, in such a way that the normal 60 chars per line is followed with no spaces in between. The program must inform the user how many entries were kept and how many were discarded. 
Hint: Test on &amp;#039;&amp;#039;dnanoise.fsa&amp;#039;&amp;#039;, which contains 3 entries that should be discarded (V00179, J00265, J02989).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# [[File:consensus.png|50px|right]] The file &#039;&#039;alignment.fsa&#039;&#039; contains a protein alignment of part of the insulin gene from different organisms. Read the fasta file and determine the consensus sequence of the alignment, which you print. The consensus sequence is simply the most frequently occurring amino acid at each position.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# [[File:consensus.png|50px|right]] The file &#039;&#039;alignment.fsa&#039;&#039; contains a protein alignment of part of the insulin gene from different organisms. Read the fasta file and determine the consensus sequence of the alignment, which you print. The consensus sequence is simply the most frequently occurring amino acid at each position.&amp;lt;br&amp;gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Consensus: MALWMRLLPLLALLALWEPDPAGAFVNGHLCGSHLVEALYLVCGERGFFYTPKSRREVEDPGVGGLELGGGP&lt;/ins&gt;&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# All HIV envelope proteins from various HIV strains in SwissProt have been identified and collected in the file &amp;#039;&amp;#039;HIVenvelope.txt&amp;#039;&amp;#039;. Using regular expressions you must extract the ID and the protein sequence from each entry and save them in a fasta file named &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039;. This job is not new to you and you can use your &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; function if you want to.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# All HIV envelope proteins from various HIV strains in SwissProt have been identified and collected in the file &amp;#039;&amp;#039;HIVenvelope.txt&amp;#039;&amp;#039;. Using regular expressions you must extract the ID and the protein sequence from each entry and save them in a fasta file named &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039;. This job is not new to you and you can use your &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; function if you want to.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Continuing the investigation in HIV. Read the &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039; fasta file and create a single set consisting of all possible epitopes for all sequences. Save the epitopes in the file &amp;#039;&amp;#039;HIVepitopes.txt&amp;#039;&amp;#039; - one epitope per line. An epitope is simply a k-mer 9 residues long, which can possibly elicit an immune system response. So save all unique 9-mers in the sequences in the file. To generate a file that you can verify against my file, you must sort the epitopes alphabetically before saving them.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Continuing the investigation in HIV. Read the &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039; fasta file and create a single set consisting of all possible epitopes for all sequences. Save the epitopes in the file &amp;#039;&amp;#039;HIVepitopes.txt&amp;#039;&amp;#039; - one epitope per line. An epitope is simply a k-mer 9 residues long, which can possibly elicit an immune system response. So save all unique 9-mers in the sequences in the file. To generate a file that you can verify against my file, you must sort the epitopes alphabetically before saving them.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=228&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=228&amp;oldid=prev"/>
		<updated>2025-09-06T17:46:36Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 19:46, 6 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l22&quot;&gt;Line 22:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 22:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Regex is often used for validation. This time make a small program that asks for an email address, and checks if it is in a proper format. Quite similar to the previous exercise, just a different thing to validate.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Regex is often used for validation. This time make a small program that asks for an email address, and checks if it is in a proper format. Quite similar to the previous exercise, just a different thing to validate.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that can read and verify a fasta file. Use your previous function &amp;#039;&amp;#039;&amp;#039;fastaread()&amp;#039;&amp;#039;&amp;#039;. Test the program with &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; and &amp;#039;&amp;#039;dnanoise.fsa&amp;#039;&amp;#039;. Verification here means that the program prints &amp;quot;DNA fasta&amp;quot; or &amp;quot;Protein fasta&amp;quot; if the file is successfully verified for either DNA or protein sequence, and &amp;quot;Not fasta&amp;quot; if unsuccessfully verified. You can find a description of fasta format in [[Biological knowledge needed in the course]]. You are expected to know which symbols are used for DNA and protein sequence - or to be able to look them up.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that can read and verify a fasta file. Use your previous function &amp;#039;&amp;#039;&amp;#039;fastaread()&amp;#039;&amp;#039;&amp;#039;. Test the program with &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; and &amp;#039;&amp;#039;dnanoise.fsa&amp;#039;&amp;#039;. Verification here means that the program prints &amp;quot;DNA fasta&amp;quot; or &amp;quot;Protein fasta&amp;quot; if the file is successfully verified for either DNA or protein sequence, and &amp;quot;Not fasta&amp;quot; if unsuccessfully verified. You can find a description of fasta format in [[Biological knowledge needed in the course]]. 
You are expected to know which symbols are used for DNA and protein sequence - or to be able to look them up.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Building on your experience with the previous exercise, make a program that reads a fasta file, discards entries that do not conform to DNA or protein sequence, and rewrites (using your &#039;&#039;&#039;fastawrite()&#039;&#039;&#039; function) the acceptable entries in the output file &#039;&#039;fastaout.fsa&#039;&#039;, in such a way that the normal 60 chars per line is followed with no spaces in between. The program must inform the user how many entries were kept and how many were discarded. Hint: Test on &#039;&#039;dnanoise.fsa&#039;&#039;, which contains 3 entries that should be discarded.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Building on your experience with the previous exercise, make a program that reads a fasta file, discards entries that do not conform to DNA or protein sequence, and rewrites (using your &#039;&#039;&#039;fastawrite()&#039;&#039;&#039; function) the acceptable entries in the output file &#039;&#039;fastaout.fsa&#039;&#039;, in such a way that the normal 60 chars per line is followed with no spaces in between. The program must inform the user how many entries were kept and how many were discarded. Hint: Test on &#039;&#039;dnanoise.fsa&#039;&#039;, which contains 3 entries that should be discarded &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;(V00179, J00265, J02989)&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# [[File:consensus.png|50px|right]] The file &amp;#039;&amp;#039;alignment.fsa&amp;#039;&amp;#039; contains a protein alignment of part of the insulin gene from different organisms. Read the fasta file and determine the consensus sequence of the alignment, which you print. The consensus sequence is simply the most frequently occurring amino acid at each position.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# [[File:consensus.png|50px|right]] The file &amp;#039;&amp;#039;alignment.fsa&amp;#039;&amp;#039; contains a protein alignment of part of the insulin gene from different organisms. Read the fasta file and determine the consensus sequence of the alignment, which you print. The consensus sequence is simply the most frequently occurring amino acid at each position.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# All HIV envelope proteins from various HIV strains in SwissProt have been identified and collected in the file &amp;#039;&amp;#039;HIVenvelope.txt&amp;#039;&amp;#039;. Using regular expressions you must extract the ID and the protein sequence from each entry and save them in a fasta file named &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039;. This job is not new to you and you can use your &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; function if you want to.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# All HIV envelope proteins from various HIV strains in SwissProt have been identified and collected in the file &amp;#039;&amp;#039;HIVenvelope.txt&amp;#039;&amp;#039;. Using regular expressions you must extract the ID and the protein sequence from each entry and save them in a fasta file named &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039;. This job is not new to you and you can use your &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; function if you want to.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=227&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=227&amp;oldid=prev"/>
		<updated>2025-09-06T17:06:32Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 19:06, 6 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l22&quot;&gt;Line 22:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 22:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Regex is often used for validation. This time make a small program that asks for an email address, and checks if it is in a proper format. Quite similar to the previous exercise, just a different thing to validate.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Regex is often used for validation. This time make a small program that asks for an email address, and checks if it is in a proper format. Quite similar to the previous exercise, just a different thing to validate.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that can read and verify a fasta file. Use your previous function &amp;#039;&amp;#039;&amp;#039;fastaread()&amp;#039;&amp;#039;&amp;#039; Test the program with &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; and &amp;#039;&amp;#039;dnanoise.fsa&amp;#039;&amp;#039;. Verification here means that the program prints &amp;quot;DNA fasta&amp;quot; or &amp;quot;Protein fasta&amp;quot; if the file is successfully verified for either dna or protein sequence, and &amp;quot;Not fasta&amp;quot; if unsuccessfully verified. You can find a description of fasta format in [[Biological knowledge needed in the course]]. You are expected to know which symbols are used for DNA and protein sequence - or that you are able to look it up.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that can read and verify a fasta file. Use your previous function &amp;#039;&amp;#039;&amp;#039;fastaread()&amp;#039;&amp;#039;&amp;#039; Test the program with &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; and &amp;#039;&amp;#039;dnanoise.fsa&amp;#039;&amp;#039;. Verification here means that the program prints &amp;quot;DNA fasta&amp;quot; or &amp;quot;Protein fasta&amp;quot; if the file is successfully verified for either dna or protein sequence, and &amp;quot;Not fasta&amp;quot; if unsuccessfully verified. You can find a description of fasta format in [[Biological knowledge needed in the course]]. You are expected to know which symbols are used for DNA and protein sequence - or that you are able to look it up.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Building on your experience with the previous exercise, make a program that reads a fasta file, discard entries that can not conform to DNA or protein sequence, and rewrite (using your &#039;&#039;&#039;fastawrite()&#039;&#039;&#039; function) the acceptable entries in the output file &#039;&#039;fastaout.fsa&#039;&#039;, in such a way that the normal 60 chars per line is followed with no spaces in between. The program must inform the user how many entries was kept and how many discarded. Hint: Test on &#039;&#039;dnanoise.fsa&#039;&#039;, which &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;contain &lt;/del&gt;3 entries that should be discarded.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Building on your experience with the previous exercise, make a program that reads a fasta file, discard entries that can not conform to DNA or protein sequence, and rewrite (using your &#039;&#039;&#039;fastawrite()&#039;&#039;&#039; function) the acceptable entries in the output file &#039;&#039;fastaout.fsa&#039;&#039;, in such a way that the normal 60 chars per line is followed with no spaces in between. The program must inform the user how many entries was kept and how many discarded. Hint: Test on &#039;&#039;dnanoise.fsa&#039;&#039;, which &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;contains &lt;/ins&gt;3 entries that should be discarded.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# [[File:consensus.png|50px|right]] In the file &amp;#039;&amp;#039;alignment.fsa&amp;#039;&amp;#039; is a protein alignment of part of the insulin gene from different organisms. Read the fasta file and determine the consensus sequence of the alignment, which you print. The consensus sequence is simply the most frequently occurring amino acid at each position.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# [[File:consensus.png|50px|right]] In the file &amp;#039;&amp;#039;alignment.fsa&amp;#039;&amp;#039; is a protein alignment of part of the insulin gene from different organisms. Read the fasta file and determine the consensus sequence of the alignment, which you print. The consensus sequence is simply the most frequently occurring amino acid at each position.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# All HIV envelope proteins from various HIV strains in SwissProt have been identified and collected in the file &amp;#039;&amp;#039;HIVenvelope.txt&amp;#039;&amp;#039;. Using regular expressions you must extract the ID and the protein sequence from each entry and save them in a fasta file named &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039;. This job is not new to you and you can use your &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; function if you want to.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# All HIV envelope proteins from various HIV strains in SwissProt have been identified and collected in the file &amp;#039;&amp;#039;HIVenvelope.txt&amp;#039;&amp;#039;. Using regular expressions you must extract the ID and the protein sequence from each entry and save them in a fasta file named &amp;#039;&amp;#039;HIVenv.fsa&amp;#039;&amp;#039;. This job is not new to you and you can use your &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; function if you want to.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=226&amp;oldid=prev</id>
		<title>WikiSysop: /* Subjects covered */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=226&amp;oldid=prev"/>
		<updated>2025-09-06T15:58:32Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Subjects covered&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 17:58, 6 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l15&quot;&gt;Line 15:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 15:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Subjects covered ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Subjects covered ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Regular expressions, duh.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Regular expressions, duh.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Patterns, how to design and use them.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Patterns, how to design and &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;use them - and not &lt;/ins&gt;use them.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=217&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=217&amp;oldid=prev"/>
		<updated>2025-09-05T11:26:03Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 13:26, 5 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l18&quot;&gt;Line 18:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 18:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises to be handed in ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Exercises &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;5 &lt;/del&gt;to &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;7 &lt;/del&gt;have a strong taste of something I would do at an exam. They are also an interesting beginning of making an HIV vaccine. The data is real and the methods are real.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Exercises &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;6 &lt;/ins&gt;to &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;8 &lt;/ins&gt;have a strong taste of something I would do at an exam. They are also an interesting beginning of making an HIV vaccine. The data is real and the methods are real.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that accepts a string as input from the keyboard. Use regular expressions (RE) to determine if the input is a number. The goal is to do this with a SINGLE regex.&amp;lt;br&amp;gt; These should all be considered as numbers: &amp;quot;4&amp;quot;   &amp;quot;-7&amp;quot;   &amp;quot;0.656&amp;quot;   &amp;quot;-67.35555&amp;quot;&amp;lt;br&amp;gt; These are not numbers: &amp;quot;5.&amp;quot;  &amp;quot;56F&amp;quot;  &amp;quot;.32&amp;quot;  &amp;quot;-.04&amp;quot;  &amp;quot;1+1&amp;quot;&amp;lt;br&amp;gt; Note: The program is very simple, but it is likely the most difficult regular expression, you will have to make in this set of exercises. Perhaps you should do the following exercises before attempting this one - just to get some experience first.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that accepts a string as input from the keyboard. Use regular expressions (RE) to determine if the input is a number. The goal is to do this with a SINGLE regex.&amp;lt;br&amp;gt; These should all be considered as numbers: &amp;quot;4&amp;quot;   &amp;quot;-7&amp;quot;   &amp;quot;0.656&amp;quot;   &amp;quot;-67.35555&amp;quot;&amp;lt;br&amp;gt; These are not numbers: &amp;quot;5.&amp;quot;  &amp;quot;56F&amp;quot;  &amp;quot;.32&amp;quot;  &amp;quot;-.04&amp;quot;  &amp;quot;1+1&amp;quot;&amp;lt;br&amp;gt; Note: The program is very simple, but it is likely the most difficult regular expression, you will have to make in this set of exercises. Perhaps you should do the following exercises before attempting this one - just to get some experience first.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Regex is often used for validation. This time make a small program that asks for an email address, and checks if it is in a proper format. Quite similar to the previous exercise, just a different thing to validate.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Regex is often used for validation. This time make a small program that asks for an email address, and checks if it is in a proper format. Quite similar to the previous exercise, just a different thing to validate.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=216&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises to be handed in */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=216&amp;oldid=prev"/>
		<updated>2025-09-05T11:07:50Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises to be handed in&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 13:07, 5 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l20&quot;&gt;Line 20:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 20:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Exercises 5 to 7 have a strong taste of something I would do at an exam. They are also an interesting beginning of making an HIV vaccine. The data is real and the methods are real.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Exercises 5 to 7 have a strong taste of something I would do at an exam. They are also an interesting beginning of making an HIV vaccine. The data is real and the methods are real.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that accepts a string as input from the keyboard. Use regular expressions (RE) to determine if the input is a number. The goal is to do this with a SINGLE regex.&amp;lt;br&amp;gt; These should all be considered as numbers: &amp;quot;4&amp;quot;   &amp;quot;-7&amp;quot;   &amp;quot;0.656&amp;quot;   &amp;quot;-67.35555&amp;quot;&amp;lt;br&amp;gt; These are not numbers: &amp;quot;5.&amp;quot;  &amp;quot;56F&amp;quot;  &amp;quot;.32&amp;quot;  &amp;quot;-.04&amp;quot;  &amp;quot;1+1&amp;quot;&amp;lt;br&amp;gt; Note: The program is very simple, but it is likely the most difficult regular expression, you will have to make in this set of exercises. Perhaps you should do the following exercises before attempting this one - just to get some experience first.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that accepts a string as input from the keyboard. Use regular expressions (RE) to determine if the input is a number. The goal is to do this with a SINGLE regex.&amp;lt;br&amp;gt; These should all be considered as numbers: &amp;quot;4&amp;quot;   &amp;quot;-7&amp;quot;   &amp;quot;0.656&amp;quot;   &amp;quot;-67.35555&amp;quot;&amp;lt;br&amp;gt; These are not numbers: &amp;quot;5.&amp;quot;  &amp;quot;56F&amp;quot;  &amp;quot;.32&amp;quot;  &amp;quot;-.04&amp;quot;  &amp;quot;1+1&amp;quot;&amp;lt;br&amp;gt; Note: The program is very simple, but it is likely the most difficult regular expression, you will have to make in this set of exercises. Perhaps you should do the following exercises before attempting this one - just to get some experience first.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;# Regex is often used for validation. This time make a small program that asks for an email address, and checks if it is in a proper format. Quite similar to the previous exercise, just a different thing to validate.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that can read and verify a fasta file. Use your previous function &amp;#039;&amp;#039;&amp;#039;fastaread()&amp;#039;&amp;#039;&amp;#039; Test the program with &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; and &amp;#039;&amp;#039;dnanoise.fsa&amp;#039;&amp;#039;. Verification here means that the program prints &amp;quot;DNA fasta&amp;quot; or &amp;quot;Protein fasta&amp;quot; if the file is successfully verified for either dna or protein sequence, and &amp;quot;Not fasta&amp;quot; if unsuccessfully verified. You can find a description of fasta format in [[Biological knowledge needed in the course]]. You are expected to know which symbols are used for DNA and protein sequence - or that you are able to look it up.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Make a program that can read and verify a fasta file. Use your previous function &amp;#039;&amp;#039;&amp;#039;fastaread()&amp;#039;&amp;#039;&amp;#039; Test the program with &amp;#039;&amp;#039;dna7.fsa&amp;#039;&amp;#039; and &amp;#039;&amp;#039;dnanoise.fsa&amp;#039;&amp;#039;. Verification here means that the program prints &amp;quot;DNA fasta&amp;quot; or &amp;quot;Protein fasta&amp;quot; if the file is successfully verified for either dna or protein sequence, and &amp;quot;Not fasta&amp;quot; if unsuccessfully verified. You can find a description of fasta format in [[Biological knowledge needed in the course]]. You are expected to know which symbols are used for DNA and protein sequence - or that you are able to look it up.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Building on your experience with the previous exercise, make a program that reads a fasta file, discard entries that can not conform to DNA or protein sequence, and rewrite (using your &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; function) the acceptable entries in the output file &amp;#039;&amp;#039;fastaout.fsa&amp;#039;&amp;#039;, in such a way that the normal 60 chars per line is followed with no spaces in between. The program must inform the user how many entries was kept and how many discarded. Hint: Test on &amp;#039;&amp;#039;dnanoise.fsa&amp;#039;&amp;#039;, which contain 3 entries that should be discarded.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;# Building on your experience with the previous exercise, make a program that reads a fasta file, discard entries that can not conform to DNA or protein sequence, and rewrite (using your &amp;#039;&amp;#039;&amp;#039;fastawrite()&amp;#039;&amp;#039;&amp;#039; function) the acceptable entries in the output file &amp;#039;&amp;#039;fastaout.fsa&amp;#039;&amp;#039;, in such a way that the normal 60 chars per line is followed with no spaces in between. The program must inform the user how many entries was kept and how many discarded. Hint: Test on &amp;#039;&amp;#039;dnanoise.fsa&amp;#039;&amp;#039;, which contain 3 entries that should be discarded.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=215&amp;oldid=prev</id>
		<title>WikiSysop: /* Exercises for extra practice */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22116/index.php?title=Regular_expressions&amp;diff=215&amp;oldid=prev"/>
		<updated>2025-09-05T10:43:48Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Exercises for extra practice&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 12:43, 5 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l28&quot;&gt;Line 28:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 28:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Exercises for extra practice ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;protein alignment motif&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
</feed>