<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://teaching.healthtech.dtu.dk:443/22112/index.php?action=history&amp;feed=atom&amp;title=MapReduce_and_Binary_representation</id>
	<title>MapReduce and Binary representation - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://teaching.healthtech.dtu.dk:443/22112/index.php?action=history&amp;feed=atom&amp;title=MapReduce_and_Binary_representation"/>
	<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22112/index.php?title=MapReduce_and_Binary_representation&amp;action=history"/>
	<updated>2026-05-02T07:06:05Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22112/index.php?title=MapReduce_and_Binary_representation&amp;diff=25&amp;oldid=prev</id>
		<title>WikiSysop at 10:57, 6 March 2024</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22112/index.php?title=MapReduce_and_Binary_representation&amp;diff=25&amp;oldid=prev"/>
		<updated>2024-03-06T10:57:08Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 12:57, 6 March 2024&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l6&quot;&gt;Line 6:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 6:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=6d4f429b-c110-4439-9bdb-af170077108e MapReduce Framework, short intro]&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=6d4f429b-c110-4439-9bdb-af170077108e MapReduce Framework, short intro]&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=a0160728-b8be-4b9b-820e-af170076f2c2 Binary numbers, representation and operations]&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=a0160728-b8be-4b9b-820e-af170076f2c2 Binary numbers, representation and operations]&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Powerpoint: [https://teaching.healthtech.dtu.dk/material/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;36612&lt;/del&gt;/HPCLife08-MapReduce.ppt MapReduce]&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Powerpoint: [https://teaching.healthtech.dtu.dk/material/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22112&lt;/ins&gt;/HPCLife08-MapReduce.ppt MapReduce]&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Youtube: [https://www.youtube.com/watch?v=2SUvWfNJSsM Binary numbers - watch from 2:00 to 6:00]&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Youtube: [https://www.youtube.com/watch?v=2SUvWfNJSsM Binary numbers - watch from 2:00 to 6:00]&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=2105b332-20b7-4718-b889-af170076be9a Exercises]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=2105b332-20b7-4718-b889-af170076be9a Exercises]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22112/index.php?title=MapReduce_and_Binary_representation&amp;diff=24&amp;oldid=prev</id>
		<title>WikiSysop: Created page with &quot;{| width=500  style=&quot;float:right; margin-left: 10px; margin-top: -56px;&quot; |Previous: More parallelism |Next: Hash usage |} == Material for the lesson == Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=6d4f429b-c110-4439-9bdb-af170077108e MapReduce Framework, short intro]&lt;br&gt; Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=a0160728-b8be-4b9b-820e-af170076f2c2 Binary numbers, representation and operations]&lt;br&gt; Powerpoint: [https://teaching.he...&quot;</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22112/index.php?title=MapReduce_and_Binary_representation&amp;diff=24&amp;oldid=prev"/>
		<updated>2024-03-06T10:56:40Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;{| width=500  style=&amp;quot;float:right; margin-left: 10px; margin-top: -56px;&amp;quot; |Previous: &lt;a href=&quot;/22112/index.php/More_parallelism&quot; title=&quot;More parallelism&quot;&gt;More parallelism&lt;/a&gt; |Next: &lt;a href=&quot;/22112/index.php/Hash_usage&quot; title=&quot;Hash usage&quot;&gt;Hash usage&lt;/a&gt; |} == Material for the lesson == Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=6d4f429b-c110-4439-9bdb-af170077108e MapReduce Framework, short intro]&amp;lt;br&amp;gt; Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=a0160728-b8be-4b9b-820e-af170076f2c2 Binary numbers, representation and operations]&amp;lt;br&amp;gt; Powerpoint: [https://teaching.he...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{| width=500  style=&amp;quot;float:right; margin-left: 10px; margin-top: -56px;&amp;quot;&lt;br /&gt;
|Previous: [[More parallelism]]&lt;br /&gt;
|Next: [[Hash usage]]&lt;br /&gt;
|}&lt;br /&gt;
== Material for the lesson ==&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=6d4f429b-c110-4439-9bdb-af170077108e MapReduce Framework, short intro]&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=a0160728-b8be-4b9b-820e-af170076f2c2 Binary numbers, representation and operations]&amp;lt;br&amp;gt;&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/36612/HPCLife08-MapReduce.ppt MapReduce]&amp;lt;br&amp;gt;&lt;br /&gt;
Youtube: [https://www.youtube.com/watch?v=2SUvWfNJSsM Binary numbers - watch from 2:00 to 6:00]&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=2105b332-20b7-4718-b889-af170076be9a Exercises]&lt;br /&gt;
&lt;br /&gt;
== Exercises ==&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Questions to answer:&amp;#039;&amp;#039;&amp;#039; How many 10-mers in the human genome (file human.fsa) occur only once? How many different 10-mers are in the human genome? How many different 10-mers are possible?&amp;lt;br&amp;gt;&lt;br /&gt;
Any 10-mer containing anything other than A, T, C or G is to be disregarded in all counting in this exercise.&amp;lt;br&amp;gt;&lt;br /&gt;
You do &amp;#039;&amp;#039;&amp;#039;not&amp;#039;&amp;#039;&amp;#039; need to look at the complement strand for this exercise, although in Real Life it would be mandatory.&amp;lt;br&amp;gt;&lt;br /&gt;
Use the file humantest.fsa for your testing. The hard question to answer is the first one; the others follow from it.&lt;br /&gt;
&lt;br /&gt;
You will discover that some of the difficulty in the exercise is that the sequence contains characters other than ATCG. If you want, you can start out &amp;quot;soft&amp;quot; by using the&lt;br /&gt;
&amp;#039;&amp;#039;humanchr1.fsa&amp;#039;&amp;#039; file, which contains a single entry whose sequence consists only of ATCG.&lt;br /&gt;
&lt;br /&gt;
You must solve this problem in two ways, making two programs.&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;1)&amp;#039;&amp;#039;&amp;#039; Use a dict.&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;2)&amp;#039;&amp;#039;&amp;#039; Use a bytevector as described in the powerpoint.&amp;lt;br&amp;gt;&lt;br /&gt;
First you must realize that a kmer containing only ATCG can be converted to a binary number - this is also why we only look at kmers with only ATCG.&lt;br /&gt;
I have made a function below to demonstrate how you can convert from a sequence to a number. Warning: It is for demonstration purposes only. If you use it in your program it will be horribly slow.&amp;lt;br&amp;gt;&lt;br /&gt;
It should also be clear that every kmer only corresponds to one single number - there is a 1-1 connection between kmer and number, and you can go back and forth between these representations.&amp;lt;br&amp;gt;&lt;br /&gt;
Now if you have a list (bytevector) then every position in that list will correspond to a specific kmer, which again means that you can convert a kmer to a position in a list and increment the number at that position for counting purposes.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import sys&lt;br /&gt;
&lt;br /&gt;
# Computes a number from a DNA sequence, two bits per base&lt;br /&gt;
def dna2num(seq):&lt;br /&gt;
    if len(seq) &amp;gt; 31:&lt;br /&gt;
        print(&amp;quot;Sequence is too large to be contained in a number&amp;quot;)&lt;br /&gt;
        sys.exit(1)&lt;br /&gt;
    num = 0&lt;br /&gt;
    for char in seq:&lt;br /&gt;
        # Bitshift two bits to the left&lt;br /&gt;
        num &amp;lt;&amp;lt;= 2&lt;br /&gt;
        if char == &amp;#039;A&amp;#039;:&lt;br /&gt;
            pass&lt;br /&gt;
        elif char == &amp;#039;T&amp;#039;:&lt;br /&gt;
            num |= 0b11&lt;br /&gt;
        elif char == &amp;#039;C&amp;#039;:&lt;br /&gt;
            num |= 0b01&lt;br /&gt;
        elif char == &amp;#039;G&amp;#039;:&lt;br /&gt;
            num |= 0b10&lt;br /&gt;
        else:&lt;br /&gt;
            print(&amp;quot;Illegal base in DNA sequence:&amp;quot;, char)&lt;br /&gt;
            sys.exit(1)&lt;br /&gt;
    return num&lt;br /&gt;
&lt;br /&gt;
print(dna2num(&amp;#039;GTACGTACGTACG&amp;#039;))&lt;br /&gt;
&lt;br /&gt;
# Creating a bytevector of a certain size - filled with 0&lt;br /&gt;
vector = bytearray(1000)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
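The counting idea described above can be sketched as follows. This is only a minimal illustration, not the intended solution: the names CODE and count_kmers are made up for this example, the base codes are the same as in dna2num, and capping each counter at 255 is an assumption made because one byte cannot hold a larger value. The number for each window is updated from the previous one instead of being recomputed from scratch, and multiplication and modulo stand in for the bitshift and bitmask.&lt;br /&gt;

```python
# Sketch: count k-mers in a bytearray using a rolling 2-bit encoding.
# CODE, count_kmers and the cap at 255 are illustrative assumptions.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # same 2-bit codes as dna2num

def count_kmers(seq, mersize):
    size = 4 ** mersize              # one slot per possible k-mer
    counts = bytearray(size)         # every slot starts at 0
    num = 0                          # number for the current window
    valid = 0                        # consecutive legal bases seen so far
    for char in seq:
        code = CODE.get(char)
        if code is None:
            # Illegal base: no window containing it may be counted
            num = 0
            valid = 0
            continue
        # Append two bits and drop the bits that left the window
        num = (num * 4 + code) % size
        valid += 1
        if valid >= mersize and counts[num] != 255:
            counts[num] += 1         # a byte holds at most 255, so saturate
    return counts

counts = count_kmers("ACGTNACGTAC", 4)
unique = sum(1 for c in counts if c == 1)      # k-mers occurring only once
different = sum(1 for c in counts if c != 0)   # different k-mers seen
print(unique, different)
```

A saturated counter is still enough to tell whether a k-mer occurs once, more than once, or not at all, which is what the questions ask for.&lt;br /&gt;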
&lt;br /&gt;
In both cases you must measure how much memory your main data structure uses and how long it takes for the program to finish. Consider this a performance contest. The fastest one of you wins. :-)&amp;lt;br&amp;gt;&lt;br /&gt;
It could be a good idea to measure the time each step takes.&amp;lt;br&amp;gt;&lt;br /&gt;
Also develop your program so it will work with any size of k-mer. Simply have a variable called &amp;#039;&amp;#039;mersize&amp;#039;&amp;#039; that you can set to any positive integer value.&amp;lt;br&amp;gt;&lt;br /&gt;
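One possible way to take these measurements, shown as a rough sketch using only the standard library. The variable names are made up for this example, and the dict figure is only an estimate: sys.getsizeof reports the container alone, so the keys and values are added separately, and shared objects such as small integers get counted more than once.&lt;br /&gt;

```python
import sys
import time

# Time one step with a monotonic high-resolution clock
start = time.perf_counter()
vector = bytearray(4 ** 10)          # main data structure for 10-mers
elapsed = time.perf_counter() - start

# A bytearray stores its data inline, so getsizeof covers all of it
vector_bytes = sys.getsizeof(vector)

# For a dict, getsizeof excludes the key and value objects,
# so they are added here to get a rough total
table = {"ACGTACGTAC": 1}
dict_bytes = sys.getsizeof(table) + sum(
    sys.getsizeof(k) + sys.getsizeof(v) for k, v in table.items()
)

print("bytearray:", vector_bytes, "bytes, built in", elapsed, "seconds")
print("dict estimate:", dict_bytes, "bytes")
```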
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Questions:&amp;#039;&amp;#039;&amp;#039; How does changing that to 16 affect your programs? Can you use any positive value as &amp;#039;&amp;#039;mersize&amp;#039;&amp;#039;?&lt;br /&gt;
&lt;br /&gt;
My fastest time using a dict is 2670 seconds and using a bytevector is 1228 seconds, both with purely sequential implementations. The number of 10-mers only occurring once is below 10.&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
</feed>