<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://teaching.healthtech.dtu.dk/22113/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=WikiSysop</id>
	<title>22113 - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://teaching.healthtech.dtu.dk/22113/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=WikiSysop"/>
	<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php/Special:Contributions/WikiSysop"/>
	<updated>2026-05-01T06:46:22Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=QT_clustering&amp;diff=127</id>
		<title>QT clustering</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=QT_clustering&amp;diff=127"/>
		<updated>2025-05-15T12:34:15Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
The program reads a number of data points (multi-dimensional vectors) from a file and partitions those into clusters. Clustering is important in discovering patterns or modes in multi-dimensional data sets. It is also a method of organizing data examples into similar groups (clusters). In this particular case, QT clustering partitions the data set such that each example (data point) is assigned to exactly one cluster. QT clustering is superior to K-means clustering in that the number of clusters is not given beforehand and it yields the same result in repeated runs. It requires more CPU time, though.&lt;br /&gt;
&lt;br /&gt;
QT (Quality Threshold) has its name from the user-determined threshold (distance) of the maximal diameter of the clusters that the method computes.&lt;br /&gt;
&lt;br /&gt;
While not strictly required, you could make your QT algorithm into a class. That would make it very easy to include and use in the future.&lt;br /&gt;
&lt;br /&gt;
===Input and output===&lt;br /&gt;
The input is a tab-separated file containing one data point on each line. Each line holds the name of the data point followed by its vector of numbers. The program should handle any vector size, but the vector size is constant within a data file. Input file example:&lt;br /&gt;
&lt;br /&gt;
 ex01	8.76	3.29	1.05&lt;br /&gt;
 ex02	12.3	2.33	3.53&lt;br /&gt;
 ex03	-0.54	-3.56	1.45&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
&lt;br /&gt;
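A minimal sketch of reading such a file, assuming the first field on each line is the name of the data point and all remaining fields are numbers. The function name read_points is hypothetical, and the sample text stands in for a real file:&lt;br /&gt;

```python
import io

def read_points(handle):
    """Parse tab-separated lines into (name, vector) pairs."""
    points = []
    width = None
    for line_no, line in enumerate(handle, start=1):
        fields = line.rstrip("\n").split("\t")
        if fields == [""]:
            continue  # skip blank lines
        name, vector = fields[0], tuple(float(x) for x in fields[1:])
        if width is None:
            width = len(vector)  # the first data line fixes the vector size
        elif len(vector) != width:
            raise ValueError(f"line {line_no}: expected {width} values")
        points.append((name, vector))
    return points

sample = io.StringIO("ex01\t8.76\t3.29\t1.05\nex02\t12.3\t2.33\t3.53\n")
print(read_points(sample))
```

&lt;br /&gt;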
The output is all data points, partitioned into the clusters they belong to. Output example, where each cluster starts with the cluster name and is followed by the members of that cluster:&lt;br /&gt;
&lt;br /&gt;
 Cluster-1&lt;br /&gt;
 ex10	1.04	2.98	1.34&lt;br /&gt;
 ex12	1.23	2.34	1.69&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
 Cluster-2&lt;br /&gt;
 ex04	-0.34	3.51	9.02&lt;br /&gt;
 ex07	-8.56	5.12	12.5&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
&lt;br /&gt;
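One possible way to produce this layout, sketched under the assumption that each cluster is a list of (name, vector) pairs. The function name format_clusters is hypothetical:&lt;br /&gt;

```python
def format_clusters(clusters):
    """Return the clusters as text: a header line, then one member per line."""
    lines = []
    for number, members in enumerate(clusters, start=1):
        lines.append(f"Cluster-{number}")
        for name, vector in members:
            # name and coordinates are separated by tabs, as in the input
            lines.append("\t".join([name] + [str(v) for v in vector]))
    return "\n".join(lines)

clusters = [
    [("ex10", (1.04, 2.98, 1.34)), ("ex12", (1.23, 2.34, 1.69))],
    [("ex04", (-0.34, 3.51, 9.02))],
]
print(format_clusters(clusters))
```

&lt;br /&gt;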
===Examples of program execution===&lt;br /&gt;
 cluster.py vectors.txt 500&lt;br /&gt;
The 500 is the maximum diameter for a cluster in the data set. An interesting twist could be to decide the cluster diameter automatically as X % of the distance between the two data points furthest away from each other. It is called like this:&lt;br /&gt;
 cluster.py vectors.txt 30%&lt;br /&gt;
&lt;br /&gt;
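A sketch of how the threshold argument could be interpreted, assuming a trailing percent sign means a fraction of the diameter of the whole data set. The names diameter and resolve_threshold are hypothetical, and the brute-force diameter is only meant for illustration:&lt;br /&gt;

```python
import math
from itertools import combinations

def diameter(vectors):
    """Largest pairwise Euclidean distance; 0.0 for fewer than two points."""
    return max((math.dist(a, b) for a, b in combinations(vectors, 2)), default=0.0)

def resolve_threshold(arg, vectors):
    """Turn an argument such as '500' or '30%' into an absolute distance."""
    if arg.endswith("%"):
        return float(arg[:-1]) / 100.0 * diameter(vectors)
    return float(arg)

vectors = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(resolve_threshold("500", vectors))
print(resolve_threshold("30%", vectors))
```

&lt;br /&gt;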
===Details===&lt;br /&gt;
The method works for any type of data set where it is possible to calculate a &#039;&#039;distance&#039;&#039; between any two points. In this project we only consider Euclidean distances, as they are simple to calculate with [https://en.wikipedia.org/wiki/Pythagorean_theorem Pythagoras&#039;s theorem].&lt;br /&gt;
&lt;br /&gt;
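As a small illustration, the Euclidean distance between two vectors of equal length can be computed as below. The function name euclidean is hypothetical; the standard library function math.dist gives the same value in Python 3.8 and later:&lt;br /&gt;

```python
import math

def euclidean(a, b):
    """Distance between two equally long vectors, by Pythagoras's theorem."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean((0.0, 0.0, 0.0), (1.0, 2.0, 2.0)))  # 3.0
```

&lt;br /&gt;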
The algorithm works like this.&lt;br /&gt;
# For each point in the data set, calculate the &#039;&#039;candidate cluster&#039;&#039; with that point as the starting point. With &#039;&#039;&#039;n&#039;&#039;&#039; points in the data set, there are &#039;&#039;&#039;n&#039;&#039;&#039; candidate clusters, obviously.&lt;br /&gt;
# Choose the candidate cluster that contains the most points as the primary cluster and remove those points from the data set. If two or more candidate clusters are tied for the most points, pick the cluster with the smallest diameter. If they are still equal, pick the first one you found.&lt;br /&gt;
# Repeat steps 1 and 2 until there are no points left in the data set or a set limit has been reached, for example when all remaining candidate clusters have fewer than, say, 10 points and are therefore not true clusters, but noise.&lt;br /&gt;
# Print the resulting clusters.&lt;br /&gt;
&lt;br /&gt;
A &#039;&#039;candidate cluster&#039;&#039; for a point is calculated using &amp;quot;complete linkage&amp;quot; like this:&lt;br /&gt;
# Consider the starting point as the beginning of the candidate cluster for that point. This is trivially seen as a subset of your data set.&lt;br /&gt;
# Add one point from your data set at a time in such a way that you extend the candidate cluster diameter the least. Again, if two or more points are tied for extending the diameter the least, pick the first one you find.&lt;br /&gt;
# Continue adding points - that is, repeat step 2 - until the diameter would exceed the Quality Threshold (hence the name QT clustering). The point that makes the diameter exceed the QT is not part of the candidate cluster.&lt;br /&gt;
Important definition: The diameter of a data set (or candidate cluster) is the distance between the two points furthest from each other.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: All points in the data set can participate in multiple candidate clusters. No point is permanently assigned to a candidate cluster until you actually pick the largest one and remove the winning cluster&#039;s points from the data set.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: Building a candidate cluster according to the above method is &#039;&#039;&#039;NOT&#039;&#039;&#039; the same as just adding the point nearest to the starting point - or nearest to any point in the growing candidate cluster.&lt;br /&gt;
&lt;br /&gt;
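The two numbered procedures above could be sketched as follows. This is an unoptimized illustration rather than a reference solution: the names candidate_cluster, qt_cluster and min_size are hypothetical, points are handled by their index, and ties are resolved by taking the first point or candidate encountered:&lt;br /&gt;

```python
import math

def candidate_cluster(start, dist, alive, qt):
    """Grow a complete-linkage candidate from index `start`.

    Returns (members, diameter). `dist` is a full distance matrix and
    `alive` lists the indices still in the data set.
    """
    members = [start]
    diameter = 0.0
    # far[p] = distance from outside point p to its furthest member
    far = {p: dist[start][p] for p in alive if p != start}
    while far:
        # diameter if p joined; min() keeps the first of equal candidates
        best = min(far, key=lambda p: max(diameter, far[p]))
        grown = max(diameter, far[best])
        if grown > qt:
            break  # this point would exceed the threshold, so it stays out
        members.append(best)
        diameter = grown
        del far[best]
        for p in far:
            far[p] = max(far[p], dist[best][p])
    return members, diameter

def qt_cluster(vectors, qt, min_size=1):
    """Repeatedly pick the largest candidate until no points remain."""
    dist = [[math.dist(a, b) for b in vectors] for a in vectors]
    alive = list(range(len(vectors)))
    clusters = []
    while alive:
        best = None
        for start in alive:
            cand = candidate_cluster(start, dist, alive, qt)
            # larger size wins, then smaller diameter, then first found
            if best is None or (len(cand[0]), -cand[1]) > (len(best[0]), -best[1]):
                best = cand
        if min_size > len(best[0]):
            break  # what is left is treated as noise
        clusters.append(sorted(best[0]))
        taken = set(best[0])
        alive = [p for p in alive if p not in taken]
    return clusters

vectors = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
print(qt_cluster(vectors, qt=2.0))  # [[0, 1, 2], [3, 4]]
```

Keeping, for every outside point, only its distance to the furthest member means that a full diameter never has to be recomputed when a point is added.&lt;br /&gt;
&lt;br /&gt;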
A fairly large part of this project is optimizing the algorithm just described. This is done by gaining insight into the algorithm: do not calculate what does not need to be calculated, and do not calculate the same thing again and again.&lt;br /&gt;
&lt;br /&gt;
Various data sets: [https://teaching.healthtech.dtu.dk/material/22113/point100.lst 100 data points], [https://teaching.healthtech.dtu.dk/material/22113/point1000.lst 1000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point3000.lst 3000 data points],&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/point4169.lst 4169 data points], [https://teaching.healthtech.dtu.dk/material/22113/point5000.lst 5000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point6000.lst 6000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point10000.lst 10000 data points].&lt;br /&gt;
&lt;br /&gt;
Checking the correctness of your program.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/QT-small.lst Result] of clustering the small list (100 points) with QT being 30% of the diameter.&amp;lt;br&amp;gt; &lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/result1000.lst Result] of clustering 1000 points with QT being 30% of the diameter.&lt;br /&gt;
&lt;br /&gt;
The algorithm is deterministic, meaning that an implementation will yield the same result on the same data set every time. However, there are two places in the description where you &amp;quot;pick the first one you find&amp;quot;. This is implementation dependent, and therefore two different implementations of QT can give different results. The data sets given here are constructed in such a way that this will NOT happen for them, i.e. no matter how you implement your method, you should get the same result.&lt;br /&gt;
&lt;br /&gt;
===References===&lt;br /&gt;
# [http://genome.cshlp.org/content/9/11/1106.full.pdf+html Exploring expression data: identification and analysis of coexpressed genes. LJ Heyer, S Kruglyak, S Yooseph - Genome research, 1999 - genome.cshlp.org]&lt;br /&gt;
# [https://www.chem.agilent.com/cag/bsp/products/gsgx/Downloads/pdf/qt_clustering.pdf QT clustering in industry - Agilent]&lt;br /&gt;
Peter&#039;s speed reference with 30% as the threshold, using pure Python:
 Points          Time (seconds)        Improved&lt;br /&gt;
    100             0                         0&lt;br /&gt;
   1000             2                         1&lt;br /&gt;
   3000            32                        16&lt;br /&gt;
   4169            65                        33&lt;br /&gt;
   5000           115                        53&lt;br /&gt;
   6000           149                        74&lt;br /&gt;
  10000           505                       223&lt;br /&gt;
It is not required to achieve these numbers, but it is important to have a reference - and maybe a goal. The improved times have been reached by using efficient data types, not by any change in the method or computer.&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=QT_clustering&amp;diff=126</id>
		<title>QT clustering</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=QT_clustering&amp;diff=126"/>
		<updated>2025-05-15T12:33:37Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
The program reads a number of data points (multi-dimensional vectors) from a file and partitions those into clusters. Clustering is important in discovering patterns or modes in multi-dimensional data sets. It is also a method of organizing data examples into similar groups (clusters). In this particular case, QT clustering partitions the data set such that each example (data point) is assigned to exactly one cluster. QT clustering is superior to K-means clustering in that the number of clusters is not given beforehand and it yields the same result in repeated runs. It requires more CPU time, though.&lt;br /&gt;
&lt;br /&gt;
QT (Quality Threshold) has its name from the user-determined threshold (distance) of the maximal diameter of the clusters that the method computes.&lt;br /&gt;
&lt;br /&gt;
While not strictly required, you could make your QT algorithm into a class. That would make it very easy to include and use in the future.&lt;br /&gt;
&lt;br /&gt;
===Input and output===&lt;br /&gt;
The input is a tab-separated file containing one data point on each line. Each line holds the name of the data point followed by its vector of numbers. The program should handle any vector size, but the vector size is constant within a data file. Input file example:&lt;br /&gt;
&lt;br /&gt;
 ex01	8.76	3.29	1.05&lt;br /&gt;
 ex02	12.3	2.33	3.53&lt;br /&gt;
 ex03	-0.54	-3.56	1.45&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
&lt;br /&gt;
The output is all data points, partitioned into the clusters they belong to. Output example, where each cluster starts with the cluster name and is followed by the members of that cluster:&lt;br /&gt;
&lt;br /&gt;
 Cluster-1&lt;br /&gt;
 ex10	1.04	2.98	1.34&lt;br /&gt;
 ex12	1.23	2.34	1.69&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
 Cluster-2&lt;br /&gt;
 ex04	-0.34	3.51	9.02&lt;br /&gt;
 ex07	-8.56	5.12	12.5&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
&lt;br /&gt;
===Examples of program execution===&lt;br /&gt;
 cluster.py vectors.txt 500&lt;br /&gt;
The 500 is the maximum diameter for a cluster in the data set. An interesting twist could be to decide the cluster diameter automatically as X % of the distance between the two data points furthest away from each other. It is called like this:&lt;br /&gt;
 cluster.py vectors.txt 30%&lt;br /&gt;
&lt;br /&gt;
===Details===&lt;br /&gt;
The method works for any type of data set where it is possible to calculate a &#039;&#039;distance&#039;&#039; between any two points. In this project we only consider Euclidean distances, as they are simple to calculate with [https://en.wikipedia.org/wiki/Pythagorean_theorem Pythagoras&#039;s theorem].&lt;br /&gt;
&lt;br /&gt;
The algorithm works like this.&lt;br /&gt;
# For each point in the data set, calculate the &#039;&#039;candidate cluster&#039;&#039; with that point as the starting point. With &#039;&#039;&#039;n&#039;&#039;&#039; points in the data set, there are &#039;&#039;&#039;n&#039;&#039;&#039; candidate clusters, obviously.&lt;br /&gt;
# Choose the candidate cluster that contains the most points as the primary cluster and remove those points from the data set. If two or more candidate clusters are tied for the most points, pick the cluster with the smallest diameter. If they are still equal, pick the first one you found.&lt;br /&gt;
# Repeat steps 1 and 2 until there are no points left in the data set or a set limit has been reached, for example when all remaining candidate clusters have fewer than, say, 10 points and are therefore not true clusters, but noise.&lt;br /&gt;
# Print the resulting clusters.&lt;br /&gt;
&lt;br /&gt;
A &#039;&#039;candidate cluster&#039;&#039; for a point is calculated using &amp;quot;complete linkage&amp;quot; like this:&lt;br /&gt;
# Consider the starting point as the beginning of the candidate cluster for that point. This is trivially seen as a subset of your data set.&lt;br /&gt;
# Add one point from your data set at a time in such a way that you extend the candidate cluster diameter the least. Again, if two or more points are tied for extending the diameter the least, pick the first one you find.&lt;br /&gt;
# Continue adding points - that is, repeat step 2 - until the diameter would exceed the Quality Threshold (hence the name QT clustering). The point that makes the diameter exceed the QT is not part of the candidate cluster.&lt;br /&gt;
Important definition: The diameter of a data set (or candidate cluster) is the distance between the two points furthest from each other.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: All points in the data set can participate in multiple candidate clusters. No point is permanently assigned to a candidate cluster until you actually pick the largest one and remove the winning cluster&#039;s points from the data set.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: Building a candidate cluster according to the above method is &#039;&#039;&#039;NOT&#039;&#039;&#039; the same as just adding the point nearest to the starting point - or nearest to any point in the growing candidate cluster.&lt;br /&gt;
&lt;br /&gt;
A fairly large part of this project is optimizing the algorithm just described. This is done by gaining insight into the algorithm: do not calculate what does not need to be calculated, and do not calculate the same thing again and again.&lt;br /&gt;
&lt;br /&gt;
Various data sets: [https://teaching.healthtech.dtu.dk/material/22113/point100.lst 100 data points], [https://teaching.healthtech.dtu.dk/material/22113/point1000.lst 1000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point3000.lst 3000 data points],&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/point4169.lst 4169 data points], [https://teaching.healthtech.dtu.dk/material/22113/point5000.lst 5000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point6000.lst 6000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point10000.lst 10000 data points].&lt;br /&gt;
&lt;br /&gt;
Checking the correctness of your program.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/QT-small.lst Result] of clustering the small list (100 points) with QT being 30% of the diameter.&amp;lt;br&amp;gt; &lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/result1000.lst Result] of clustering 1000 points with QT being 30% of the diameter.&lt;br /&gt;
&lt;br /&gt;
The algorithm is deterministic, meaning that an implementation will yield the same result on the same data set every time. However, there are two places in the description where you &amp;quot;pick the first one you find&amp;quot;. This is implementation dependent, and therefore two different implementations of QT can give different results. The data sets given here are constructed in such a way that this will NOT happen for them, i.e. no matter how you implement your method, you should get the same result.&lt;br /&gt;
&lt;br /&gt;
===References===&lt;br /&gt;
# [http://genome.cshlp.org/content/9/11/1106.full.pdf+html Exploring expression data: identification and analysis of coexpressed genes. LJ Heyer, S Kruglyak, S Yooseph - Genome research, 1999 - genome.cshlp.org]&lt;br /&gt;
# [https://www.chem.agilent.com/cag/bsp/products/gsgx/Downloads/pdf/qt_clustering.pdf QT clustering in industry - Agilent]&lt;br /&gt;
Peter&#039;s speed reference with 30% as the threshold, using pure Python:
 Points          Time (seconds)        Improved&lt;br /&gt;
    100             0                         0&lt;br /&gt;
   1000             2                         1&lt;br /&gt;
   3000            32                        16&lt;br /&gt;
   4169            65                        33&lt;br /&gt;
   5000           115                        53&lt;br /&gt;
   6000           149                        74&lt;br /&gt;
  10000           505                       223&lt;br /&gt;
It is not required to achieve these numbers, but it is important to have a reference - and maybe a goal. The improved times have been reached by using efficient data types, not by any change in the method.&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=QT_clustering&amp;diff=125</id>
		<title>QT clustering</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=QT_clustering&amp;diff=125"/>
		<updated>2025-04-22T13:12:58Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
The program reads a number of data points (multi-dimensional vectors) from a file and partitions those into clusters. Clustering is important in discovering patterns or modes in multi-dimensional data sets. It is also a method of organizing data examples into similar groups (clusters). In this particular case, QT clustering partitions the data set such that each example (data point) is assigned to exactly one cluster. QT clustering is superior to K-means clustering in that the number of clusters is not given beforehand and it yields the same result in repeated runs. It requires more CPU time, though.&lt;br /&gt;
&lt;br /&gt;
QT (Quality Threshold) has its name from the user-determined threshold (distance) of the maximal diameter of the clusters that the method computes.&lt;br /&gt;
&lt;br /&gt;
While not strictly required, you could make your QT algorithm into a class. That would make it very easy to include and use in the future.&lt;br /&gt;
&lt;br /&gt;
===Input and output===&lt;br /&gt;
The input is a tab-separated file containing one data point on each line. Each line holds the name of the data point followed by its vector of numbers. The program should handle any vector size, but the vector size is constant within a data file. Input file example:&lt;br /&gt;
&lt;br /&gt;
 ex01	8.76	3.29	1.05&lt;br /&gt;
 ex02	12.3	2.33	3.53&lt;br /&gt;
 ex03	-0.54	-3.56	1.45&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
&lt;br /&gt;
The output is all data points, partitioned into the clusters they belong to. Output example, where each cluster starts with the cluster name and is followed by the members of that cluster:&lt;br /&gt;
&lt;br /&gt;
 Cluster-1&lt;br /&gt;
 ex10	1.04	2.98	1.34&lt;br /&gt;
 ex12	1.23	2.34	1.69&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
 Cluster-2&lt;br /&gt;
 ex04	-0.34	3.51	9.02&lt;br /&gt;
 ex07	-8.56	5.12	12.5&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
&lt;br /&gt;
===Examples of program execution===&lt;br /&gt;
 cluster.py vectors.txt 500&lt;br /&gt;
The 500 is the maximum diameter for a cluster in the data set. An interesting twist could be to decide the cluster diameter automatically as X % of the distance between the two data points furthest away from each other. It is called like this:&lt;br /&gt;
 cluster.py vectors.txt 30%&lt;br /&gt;
&lt;br /&gt;
===Details===&lt;br /&gt;
The method works for any type of data set where it is possible to calculate a &#039;&#039;distance&#039;&#039; between any two points. In this project we only consider Euclidean distances, as they are simple to calculate with [https://en.wikipedia.org/wiki/Pythagorean_theorem Pythagoras&#039;s theorem].&lt;br /&gt;
&lt;br /&gt;
The algorithm works like this.&lt;br /&gt;
# For each point in the data set, calculate the &#039;&#039;candidate cluster&#039;&#039; with that point as the starting point. With &#039;&#039;&#039;n&#039;&#039;&#039; points in the data set, there are &#039;&#039;&#039;n&#039;&#039;&#039; candidate clusters, obviously.&lt;br /&gt;
# Choose the candidate cluster that contains the most points as the primary cluster and remove those points from the data set. If two or more candidate clusters are tied for the most points, pick the cluster with the smallest diameter. If they are still equal, pick the first one you found.&lt;br /&gt;
# Repeat steps 1 and 2 until there are no points left in the data set or a set limit has been reached, for example when all remaining candidate clusters have fewer than, say, 10 points and are therefore not true clusters, but noise.&lt;br /&gt;
# Print the resulting clusters.&lt;br /&gt;
&lt;br /&gt;
A &#039;&#039;candidate cluster&#039;&#039; for a point is calculated using &amp;quot;complete linkage&amp;quot; like this:&lt;br /&gt;
# Consider the starting point as the beginning of the candidate cluster for that point. This is trivially seen as a subset of your data set.&lt;br /&gt;
# Add one point from your data set at a time in such a way that you extend the candidate cluster diameter the least. Again, if two or more points are tied for extending the diameter the least, pick the first one you find.&lt;br /&gt;
# Continue adding points - that is, repeat step 2 - until the diameter would exceed the Quality Threshold (hence the name QT clustering). The point that makes the diameter exceed the QT is not part of the candidate cluster.&lt;br /&gt;
Important definition: The diameter of a data set (or candidate cluster) is the distance between the two points furthest from each other.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: All points in the data set can participate in multiple candidate clusters. No point is permanently assigned to a candidate cluster until you actually pick the largest one and remove the winning cluster&#039;s points from the data set.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: Building a candidate cluster according to the above method is &#039;&#039;&#039;NOT&#039;&#039;&#039; the same as just adding the point nearest to the starting point - or nearest to any point in the growing candidate cluster.&lt;br /&gt;
&lt;br /&gt;
A fairly large part of this project is optimizing the algorithm just described. This is done by gaining insight into the algorithm: do not calculate what does not need to be calculated, and do not calculate the same thing again and again.&lt;br /&gt;
&lt;br /&gt;
Various data sets: [https://teaching.healthtech.dtu.dk/material/22113/point100.lst 100 data points], [https://teaching.healthtech.dtu.dk/material/22113/point1000.lst 1000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point3000.lst 3000 data points],&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/point4169.lst 4169 data points], [https://teaching.healthtech.dtu.dk/material/22113/point5000.lst 5000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point6000.lst 6000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point10000.lst 10000 data points].&lt;br /&gt;
&lt;br /&gt;
Checking the correctness of your program.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/QT-small.lst Result] of clustering the small list (100 points) with QT being 30% of the diameter.&amp;lt;br&amp;gt; &lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/result1000.lst Result] of clustering 1000 points with QT being 30% of the diameter.&lt;br /&gt;
&lt;br /&gt;
The algorithm is deterministic, meaning that an implementation will yield the same result on the same data set every time. However, there are two places in the description where you &amp;quot;pick the first one you find&amp;quot;. This is implementation dependent, and therefore two different implementations of QT can give different results. The data sets given here are constructed in such a way that this will NOT happen for them, i.e. no matter how you implement your method, you should get the same result.&lt;br /&gt;
&lt;br /&gt;
===References===&lt;br /&gt;
# [http://genome.cshlp.org/content/9/11/1106.full.pdf+html Exploring expression data: identification and analysis of coexpressed genes. LJ Heyer, S Kruglyak, S Yooseph - Genome research, 1999 - genome.cshlp.org]&lt;br /&gt;
# [https://www.chem.agilent.com/cag/bsp/products/gsgx/Downloads/pdf/qt_clustering.pdf QT clustering in industry - Agilent]&lt;br /&gt;
Peter&#039;s speed reference using 30% as the threshold:&lt;br /&gt;
 Points          Time (seconds)        Improved&lt;br /&gt;
    100             0                         0&lt;br /&gt;
   1000             2                         1&lt;br /&gt;
   3000            32                        20&lt;br /&gt;
   4169            65                        44&lt;br /&gt;
   5000           115                        70&lt;br /&gt;
   6000           149                       106&lt;br /&gt;
  10000           505                       342&lt;br /&gt;
It is not required to achieve these numbers, but it is important to have a reference - and maybe a goal.&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=QT_clustering&amp;diff=124</id>
		<title>QT clustering</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=QT_clustering&amp;diff=124"/>
		<updated>2025-04-22T12:23:38Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
The program reads a number of data points (multi-dimensional vectors) from a file and partitions those into clusters. Clustering is important in discovering patterns or modes in multi-dimensional data sets. It is also a method of organizing data examples into similar groups (clusters). In this particular case, QT clustering partitions the data set such that each example (data point) is assigned to exactly one cluster. QT clustering is superior to K-means clustering in that the number of clusters is not given beforehand and it yields the same result in repeated runs. It requires more CPU time, though.&lt;br /&gt;
&lt;br /&gt;
QT (Quality Threshold) has its name from the user-determined threshold (distance) of the maximal diameter of the clusters that the method computes.&lt;br /&gt;
&lt;br /&gt;
While not strictly required, you could make your QT algorithm into a class. That would make it very easy to include and use in the future.&lt;br /&gt;
&lt;br /&gt;
===Input and output===&lt;br /&gt;
The input is a tab-separated file containing one data point on each line. Each line holds the name of the data point followed by its vector of numbers. The program should handle any vector size, but the vector size is constant within a data file. Input file example:&lt;br /&gt;
&lt;br /&gt;
 ex01	8.76	3.29	1.05&lt;br /&gt;
 ex02	12.3	2.33	3.53&lt;br /&gt;
 ex03	-0.54	-3.56	1.45&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
&lt;br /&gt;
The output is all data points, partitioned into the clusters they belong to. Output example, where each cluster starts with the cluster name and is followed by the members of that cluster:&lt;br /&gt;
&lt;br /&gt;
 Cluster-1&lt;br /&gt;
 ex10	1.04	2.98	1.34&lt;br /&gt;
 ex12	1.23	2.34	1.69&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
 Cluster-2&lt;br /&gt;
 ex04	-0.34	3.51	9.02&lt;br /&gt;
 ex07	-8.56	5.12	12.5&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
&lt;br /&gt;
===Examples of program execution===&lt;br /&gt;
 cluster.py vectors.txt 500&lt;br /&gt;
The 500 is the maximum diameter for a cluster in the data set. An interesting twist could be to decide the cluster diameter automatically as X % of the distance between the two data points furthest away from each other. It is called like this:&lt;br /&gt;
 cluster.py vectors.txt 30%&lt;br /&gt;
&lt;br /&gt;
===Details===&lt;br /&gt;
The method works for any type of data set where it is possible to calculate a &#039;&#039;distance&#039;&#039; between any two points. In this project we only consider Euclidean distances, as they are simple to calculate with [https://en.wikipedia.org/wiki/Pythagorean_theorem Pythagoras&#039;s theorem].&lt;br /&gt;
&lt;br /&gt;
The algorithm works like this.&lt;br /&gt;
# For each point in the data set, calculate the &#039;&#039;candidate cluster&#039;&#039; with that point as the starting point. With &#039;&#039;&#039;n&#039;&#039;&#039; points in the data set, there are &#039;&#039;&#039;n&#039;&#039;&#039; candidate clusters, obviously.&lt;br /&gt;
# Choose the candidate cluster that contains the most points as the primary cluster and remove those points from the data set. If two or more candidate clusters are tied for the most points, pick the cluster with the smallest diameter. If they are still equal, pick the first one you found.&lt;br /&gt;
# Repeat steps 1 and 2 until there are no points left in the data set or a set limit has been reached, for example when all remaining candidate clusters have fewer than, say, 10 points and are therefore not true clusters, but noise.&lt;br /&gt;
# Print the resulting clusters.&lt;br /&gt;
&lt;br /&gt;
A &#039;&#039;candidate cluster&#039;&#039; for a point is calculated using &amp;quot;complete linkage&amp;quot; like this:&lt;br /&gt;
# Consider the starting point as the beginning of the candidate cluster for that point. This is trivially seen as a subset of your data set.&lt;br /&gt;
# Add one point from your data set at a time in such a way that you extend the candidate cluster diameter the least. Again, if two or more points are tied for extending the diameter the least, pick the first one you find.&lt;br /&gt;
# Continue adding points - that is, repeat step 2 - until the diameter would exceed the Quality Threshold (hence the name QT clustering). The point that makes the diameter exceed the QT is not part of the candidate cluster.&lt;br /&gt;
Important definition: The diameter of a data set (or candidate cluster) is the distance between the two points furthest from each other.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: All points in the data set can participate in multiple candidate clusters. No point is permanently assigned to a candidate cluster until you actually pick the largest one and remove the winning cluster&#039;s points from the data set.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: Building a candidate cluster according to the above method is &#039;&#039;&#039;NOT&#039;&#039;&#039; the same as just adding the point nearest to the starting point - or nearest to any point in the growing candidate cluster.&lt;br /&gt;
&lt;br /&gt;
A fairly large part of this project is optimizing the algorithm just described. This is done by gaining insight into the algorithm: do not calculate what does not need to be calculated, and do not calculate the same thing again and again.&lt;br /&gt;
&lt;br /&gt;
Various data sets: [https://teaching.healthtech.dtu.dk/material/22113/point100.lst 100 data points], [https://teaching.healthtech.dtu.dk/material/22113/point1000.lst 1000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point3000.lst 3000 data points],&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/point4169.lst 4169 data points], [https://teaching.healthtech.dtu.dk/material/22113/point5000.lst 5000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point6000.lst 6000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point10000.lst 10000 data points].&lt;br /&gt;
&lt;br /&gt;
Checking the correctness of your program.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/QT-small.lst Result] of clustering the small list (100 points) with QT being 30% of the diameter.&amp;lt;br&amp;gt; &lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/result1000.lst Result] of clustering 1000 points with QT being 30% of the diameter.&lt;br /&gt;
&lt;br /&gt;
The algorithm is deterministic - meaning that an implementation will yield the same result on the same data set every time. However, the description has two places where you &amp;quot;pick the first one you find&amp;quot;. This is implementation dependent, and therefore two different implementations of QT can give different results. The data sets given here are constructed in such a way that this will NOT happen for them, i.e. no matter how you implement your method, you should get the same result.&lt;br /&gt;
&lt;br /&gt;
===References===&lt;br /&gt;
# [http://genome.cshlp.org/content/9/11/1106.full.pdf+html Exploring expression data: identification and analysis of coexpressed genes. LJ Heyer, S Kruglyak, S Yooseph - Genome research, 1999 - genome.cshlp.org]&lt;br /&gt;
# [https://www.chem.agilent.com/cag/bsp/products/gsgx/Downloads/pdf/qt_clustering.pdf QT clustering in industry - Agilent]&lt;br /&gt;
Peter&#039;s speed reference using 30% as the threshold:&lt;br /&gt;
 Points          Time (seconds)        Improved&lt;br /&gt;
    100             0                         0&lt;br /&gt;
   1000             2                         2&lt;br /&gt;
   3000            32                        20&lt;br /&gt;
   4169            65                        44&lt;br /&gt;
   5000           115                        70&lt;br /&gt;
   6000           149                       106&lt;br /&gt;
  10000           505                       342&lt;br /&gt;
It is not required to achieve these numbers, but it is important to have a reference - and maybe a goal.&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=QT_clustering&amp;diff=123</id>
		<title>QT clustering</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=QT_clustering&amp;diff=123"/>
		<updated>2025-04-22T12:21:31Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Details */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
The program reads a number of data points (multi-dimensional vectors) from a file and partitions those into clusters. Clustering is important in discovering patterns or modes in multi-dimensional data sets. It is also a method of organizing data examples into similar groups (clusters). In this particular case, QT clustering partitions the data set such that each example (data point) is assigned to exactly one cluster. QT clustering is superior to K-means clustering in that the number of clusters is not given beforehand and it yields the same result in repeated runs. It requires more CPU time, though.&lt;br /&gt;
&lt;br /&gt;
QT (Quality Threshold) has its name from the user-determined threshold (distance) of the maximal diameter of the clusters that the method computes.&lt;br /&gt;
&lt;br /&gt;
===Input and output===&lt;br /&gt;
The input is a tab-separated file containing one data point on each line. Each data point is a vector of numbers, preceded by its name as in the example below. The program should handle any vector size, but the vector size is constant within a data file. Input file example:&lt;br /&gt;
&lt;br /&gt;
 ex01	8.76	3.29	1.05&lt;br /&gt;
 ex02	12.3	2.33	3.53&lt;br /&gt;
 ex03	-0.54	-3.56	1.45&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
&lt;br /&gt;
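Reading this format could look roughly like the sketch below. The function name &#039;&#039;parse_points&#039;&#039; and the decision to return names and vectors as two parallel lists are assumptions made for this example only.&lt;br /&gt;

```python
def parse_points(lines):
    """Split tab-separated lines into a list of names and a list of vectors."""
    names, vectors = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields == [""]:
            continue  # skip empty lines
        names.append(fields[0])
        vectors.append(tuple(float(x) for x in fields[1:]))
    # The vector size must be constant within a data file.
    if len({len(v) for v in vectors}) > 1:
        raise ValueError("all data points must have the same vector size")
    return names, vectors
```

Since an open file can be iterated line by line, it can be passed directly as &#039;&#039;lines&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;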
The output is all data points, partitioned into the clusters they belong to. Output example, where each cluster starts with the cluster name and is followed by the members of that cluster:&lt;br /&gt;
&lt;br /&gt;
 Cluster-1&lt;br /&gt;
 ex10	1.04	2.98	1.34&lt;br /&gt;
 ex12	1.23	2.34	1.69&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
 Cluster-2&lt;br /&gt;
 ex04	-0.34	3.51	9.02&lt;br /&gt;
 ex07	-8.56	5.12	12.5&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
&lt;br /&gt;
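Producing this output could be sketched as below. The name &#039;&#039;format_clusters&#039;&#039; and the input layout (a list of clusters, each a list of name/vector pairs) are assumptions made for this example.&lt;br /&gt;

```python
def format_clusters(clusters):
    """Return the clusters as text: a header line, then one member per line."""
    lines = []
    for number, members in enumerate(clusters, start=1):
        lines.append(f"Cluster-{number}")
        for name, vector in members:
            lines.append("\t".join([name] + [str(x) for x in vector]))
    return "\n".join(lines)
```

&lt;br /&gt;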
===Examples of program execution===&lt;br /&gt;
 cluster.py vectors.txt 500&lt;br /&gt;
The 500 is the maximum diameter for a cluster in the data set. An interesting twist could be to decide the maximum cluster diameter automatically as X % of the distance between the two data points furthest away from each other. It is called like this:&lt;br /&gt;
 cluster.py vectors.txt 30%&lt;br /&gt;
&lt;br /&gt;
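Handling both forms of the threshold argument could be sketched like this. The name &#039;&#039;parse_threshold&#039;&#039; and the parameter &#039;&#039;data_diameter&#039;&#039; (the distance between the two points furthest from each other, computed elsewhere) are assumptions made for this example.&lt;br /&gt;

```python
def parse_threshold(arg, data_diameter):
    """Turn '500' into 500.0 and '30%' into 30 % of the data set diameter."""
    if arg.endswith("%"):
        return float(arg[:-1]) / 100.0 * data_diameter
    return float(arg)
```

&lt;br /&gt;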
===Details===&lt;br /&gt;
The method works for any type of data set where it is possible to calculate a &#039;&#039;distance&#039;&#039; between any two points. In this project we consider only Euclidean distances, as they are simple; see&lt;br /&gt;
[https://en.wikipedia.org/wiki/Pythagorean_theorem Pythagoras&#039;s theorem].&lt;br /&gt;
&lt;br /&gt;
The algorithm works like this.&lt;br /&gt;
# For each point in the data set, calculate the &#039;&#039;candidate cluster&#039;&#039; with that point as the starting point. With &#039;&#039;&#039;n&#039;&#039;&#039; points in the data set, there are &#039;&#039;&#039;n&#039;&#039;&#039; candidate clusters, obviously.&lt;br /&gt;
# Choose the candidate cluster that contains the most points as the primary cluster and remove those points from the data set. If two or more candidate clusters are tied for the most points, pick the one with the smallest diameter. If they are still equal, pick the first one you found.&lt;br /&gt;
# Repeat steps 1 and 2 until there are no points left in the data set or a set limit has been reached, for example when all remaining candidate clusters have fewer than, say, 10 points and are therefore not true clusters, but noise.&lt;br /&gt;
# Print the resulting clusters.&lt;br /&gt;
&lt;br /&gt;
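Steps 2 and 3 above could be sketched as follows. The names &#039;&#039;pick_winner&#039;&#039; and &#039;&#039;qt_cluster&#039;&#039; are made up for this example, and &#039;&#039;build_candidate&#039;&#039; is a hypothetical function you supply yourself, returning a pair (members, diameter) for a starting point. Treat it as a plain, unoptimized outline.&lt;br /&gt;

```python
def pick_winner(candidates):
    """Pick from (members, diameter) pairs, listed in the order they were found."""
    # Most points first, then smallest diameter. max() returns the first
    # maximal item, so the candidate found first wins a complete tie.
    return max(candidates, key=lambda c: (len(c[0]), -c[1]))

def qt_cluster(points, build_candidate, min_size=1):
    """Repeat steps 1 and 2 until no points remain or only noise is left."""
    remaining = list(points)
    clusters = []
    while remaining:
        candidates = [build_candidate(p, remaining) for p in remaining]
        members, _ = pick_winner(candidates)
        if min_size > len(members):
            break  # the remaining points are considered noise
        clusters.append(members)
        taken = set(members)
        remaining = [p for p in remaining if p not in taken]
    return clusters
```

Here the points must be hashable (for example names or indices), because the winner&#039;s points are removed through a set.&lt;br /&gt;
&lt;br /&gt;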
A &#039;&#039;candidate cluster&#039;&#039; for a point is calculated using &amp;quot;complete linkage&amp;quot; like this:&lt;br /&gt;
# Consider the starting point as the beginning of the candidate cluster for that point. This is trivially seen as a subset of your data set.&lt;br /&gt;
# Add one point from your data set at a time in such a way that you extend the candidate cluster diameter the least. Again, if two points would extend the diameter the least, pick the first one you find.&lt;br /&gt;
# Continue adding points - that is repeat step 2 - to your candidate cluster until the diameter exceeds the Quality Threshold (hence the name QT clustering). The point that makes the diameter exceed the QT is not part of the candidate cluster.&lt;br /&gt;
Important definition: The diameter of a data set (or candidate cluster) is the distance between the two points furthest from each other.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: Every point in the data set can participate in multiple candidate clusters. No point is permanently assigned to a cluster until you actually pick the largest candidate cluster and remove the &amp;quot;winner&#039;s&amp;quot; points from the data set.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: Building a candidate cluster according to the method above is &#039;&#039;&#039;NOT&#039;&#039;&#039; the same as just adding the point nearest to the starting point - or to any point in the growing candidate cluster.&lt;br /&gt;
&lt;br /&gt;
A fairly large part of this project is optimizing the algorithm just described. This is done by gaining insight into the algorithm: do not calculate what does not need to be calculated, and do not calculate the same thing again and again.&lt;br /&gt;
&lt;br /&gt;
Various data sets: [https://teaching.healthtech.dtu.dk/material/22113/point100.lst 100 data points], [https://teaching.healthtech.dtu.dk/material/22113/point1000.lst 1000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point3000.lst 3000 data points],&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/point4169.lst 4169 data points], [https://teaching.healthtech.dtu.dk/material/22113/point5000.lst 5000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point6000.lst 6000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point10000.lst 10000 data points].&lt;br /&gt;
&lt;br /&gt;
Checking the correctness of your program.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/QT-small.lst Result] of clustering the small list (100 points) with QT being 30% of the diameter.&amp;lt;br&amp;gt; &lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/result1000.lst Result] of clustering 1000 points with QT being 30% of the diameter.&lt;br /&gt;
&lt;br /&gt;
The algorithm is deterministic - meaning that an implementation will yield the same result on the same data set every time. However, the description has two places where you &amp;quot;pick the first one you find&amp;quot;. This is implementation dependent, and therefore two different implementations of QT can give different results. The data sets given here are constructed in such a way that this will NOT happen for them, i.e. no matter how you implement your method, you should get the same result.&lt;br /&gt;
&lt;br /&gt;
===References===&lt;br /&gt;
# [http://genome.cshlp.org/content/9/11/1106.full.pdf+html Exploring expression data: identification and analysis of coexpressed genes. LJ Heyer, S Kruglyak, S Yooseph - Genome research, 1999 - genome.cshlp.org]&lt;br /&gt;
# [https://www.chem.agilent.com/cag/bsp/products/gsgx/Downloads/pdf/qt_clustering.pdf QT clustering in industry - Agilent]&lt;br /&gt;
Peter&#039;s speed reference using 30% as the threshold:&lt;br /&gt;
 Points          Time (seconds)        Improved&lt;br /&gt;
    100             0                         0&lt;br /&gt;
   1000             2                         2&lt;br /&gt;
   3000            32                        20&lt;br /&gt;
   4169            65                        44&lt;br /&gt;
   5000           115                        70&lt;br /&gt;
   6000           149                       106&lt;br /&gt;
  10000           505                       342&lt;br /&gt;
It is not required to achieve these numbers, but it is important to have a reference - and maybe a goal.&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=QT_clustering&amp;diff=122</id>
		<title>QT clustering</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=QT_clustering&amp;diff=122"/>
		<updated>2025-04-22T12:17:15Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
The program reads a number of data points (multi-dimensional vectors) from a file and partitions those into clusters. Clustering is important in discovering patterns or modes in multi-dimensional data sets. It is also a method of organizing data examples into similar groups (clusters). In this particular case, QT clustering partitions the data set such that each example (data point) is assigned to exactly one cluster. QT clustering is superior to K-means clustering in that the number of clusters is not given beforehand and it yields the same result in repeated runs. It requires more CPU time, though.&lt;br /&gt;
&lt;br /&gt;
QT (Quality Threshold) has its name from the user-determined threshold (distance) of the maximal diameter of the clusters that the method computes.&lt;br /&gt;
&lt;br /&gt;
===Input and output===&lt;br /&gt;
The input is a tab-separated file containing one data point on each line. Each data point is a vector of numbers, preceded by its name as in the example below. The program should handle any vector size, but the vector size is constant within a data file. Input file example:&lt;br /&gt;
&lt;br /&gt;
 ex01	8.76	3.29	1.05&lt;br /&gt;
 ex02	12.3	2.33	3.53&lt;br /&gt;
 ex03	-0.54	-3.56	1.45&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
&lt;br /&gt;
The output is all data points, partitioned into the clusters they belong to. Output example, where each cluster starts with the cluster name and is followed by the members of that cluster:&lt;br /&gt;
&lt;br /&gt;
 Cluster-1&lt;br /&gt;
 ex10	1.04	2.98	1.34&lt;br /&gt;
 ex12	1.23	2.34	1.69&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
 Cluster-2&lt;br /&gt;
 ex04	-0.34	3.51	9.02&lt;br /&gt;
 ex07	-8.56	5.12	12.5&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
&lt;br /&gt;
===Examples of program execution===&lt;br /&gt;
 cluster.py vectors.txt 500&lt;br /&gt;
The 500 is the maximum diameter for a cluster in the data set. An interesting twist could be to decide the maximum cluster diameter automatically as X % of the distance between the two data points furthest away from each other. It is called like this:&lt;br /&gt;
 cluster.py vectors.txt 30%&lt;br /&gt;
&lt;br /&gt;
===Details===&lt;br /&gt;
The method works for any type of data set where it is possible to calculate a &#039;&#039;distance&#039;&#039; between any two points. In this project we consider only Euclidean distances, as they are simple; see&lt;br /&gt;
[https://en.wikipedia.org/wiki/Pythagorean_theorem Pythagoras&#039;s theorem].&lt;br /&gt;
&lt;br /&gt;
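As a small illustration, the Euclidean distance between two vectors of equal size could be computed like this. The function name &#039;&#039;euclidean&#039;&#039; is chosen for this example only.&lt;br /&gt;

```python
import math

def euclidean(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

For example, the distance between (0, 0, 0) and (3, 4, 12) is the square root of 9 + 16 + 144 = 169, which is 13.&lt;br /&gt;
&lt;br /&gt;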
The algorithm works like this.&lt;br /&gt;
# For each point in the data set, calculate the &#039;&#039;candidate cluster&#039;&#039; with that point as the starting point. With &#039;&#039;&#039;n&#039;&#039;&#039; points in the data set, there are &#039;&#039;&#039;n&#039;&#039;&#039; candidate clusters, obviously.&lt;br /&gt;
# Choose the candidate cluster that contains most points as the primary cluster and remove those points from the data set. If two or more candidate clusters have equally most points, pick the cluster with the smallest diameter. If they are still equal, pick the first you found.&lt;br /&gt;
# Repeat steps 1 and 2 until there are no points left in the data set or a set limit has been reached, for example when all remaining candidate clusters have fewer than, say, 10 points and are therefore not true clusters, but noise.&lt;br /&gt;
# Print the resulting clusters.&lt;br /&gt;
&lt;br /&gt;
A &#039;&#039;candidate cluster&#039;&#039; for a point is calculated using &amp;quot;complete linkage&amp;quot; like this:&lt;br /&gt;
# Consider the starting point as the beginning of the candidate cluster for that point. This is trivially seen as a subset of your data set.&lt;br /&gt;
# Add one point from your data set at a time in such a way that you extend the candidate cluster diameter the least. Again, if two points would extend the diameter the least, pick the first one you find.&lt;br /&gt;
# Continue adding points - that is repeat step 2 - to your candidate cluster until the diameter exceeds the Quality Threshold (hence the name QT clustering). The point that makes the diameter exceed the QT is not part of the candidate cluster.&lt;br /&gt;
Important definition: The diameter of a data set (or candidate cluster) is the distance between the two points furthest from each other.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: Every point in the data set can participate in multiple candidate clusters. No point is permanently assigned to a cluster until you actually pick the largest candidate cluster and remove the &amp;quot;winner&#039;s&amp;quot; points from the data set.&amp;lt;br&amp;gt;&lt;br /&gt;
Note: Building a candidate cluster according to the method above is &#039;&#039;&#039;NOT&#039;&#039;&#039; the same as just adding the point nearest to the starting point - or to any point in the growing candidate cluster.&lt;br /&gt;
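&lt;br /&gt;
The definition of the diameter can be written down directly, as in the sketch below. The function name &#039;&#039;diameter&#039;&#039; is chosen for this example. It compares every pair of points, so it is meant to illustrate the definition, not to be called inside the inner loop.&lt;br /&gt;

```python
import math
from itertools import combinations

def diameter(points):
    """Largest distance between any two points; 0.0 for fewer than two points."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)),
               default=0.0)
```
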
&lt;br /&gt;
A fairly large part of this project is optimizing the algorithm just described. This is done by gaining insight into the algorithm: do not calculate what does not need to be calculated, and do not calculate the same thing again and again.&lt;br /&gt;
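&lt;br /&gt;
One possible way of not calculating the same thing again and again is to compute every pairwise distance once and look it up afterwards. The sketch below is just one such idea, with the function name &#039;&#039;distance_matrix&#039;&#039; made up for this example. It is not necessarily the optimization intended here, and for large data sets the memory use of the full matrix has to be considered.&lt;br /&gt;

```python
import math

def distance_matrix(points):
    """Compute each pairwise distance once and store it symmetrically."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.dist(points[i], points[j])
    return dist
```
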
&lt;br /&gt;
Various data sets: [https://teaching.healthtech.dtu.dk/material/22113/point100.lst 100 data points], [https://teaching.healthtech.dtu.dk/material/22113/point1000.lst 1000 data points],&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/point4169.lst 4169 data points], [https://teaching.healthtech.dtu.dk/material/22113/point5000.lst 5000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point6000.lst 6000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point10000.lst 10000 data points].&lt;br /&gt;
&lt;br /&gt;
Checking the correctness of your program.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/QT-small.lst Result] of clustering the small list (100 points) with QT being 30% of the diameter.&amp;lt;br&amp;gt; &lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/result1000.lst Result] of clustering 1000 points with QT being 30% of the diameter.&lt;br /&gt;
&lt;br /&gt;
The algorithm is deterministic - meaning that an implementation will yield the same result on the same data set every time. However, the description has two places where you &amp;quot;pick the first one you find&amp;quot;. This is implementation dependent, and therefore two different implementations of QT can give different results. The data sets given here are constructed in such a way that this will NOT happen for them, i.e. no matter how you implement your method, you should get the same result.&lt;br /&gt;
&lt;br /&gt;
===References===&lt;br /&gt;
# [http://genome.cshlp.org/content/9/11/1106.full.pdf+html Exploring expression data: identification and analysis of coexpressed genes. LJ Heyer, S Kruglyak, S Yooseph - Genome research, 1999 - genome.cshlp.org]&lt;br /&gt;
# [https://www.chem.agilent.com/cag/bsp/products/gsgx/Downloads/pdf/qt_clustering.pdf QT clustering in industry - Agilent]&lt;br /&gt;
Peter&#039;s speed reference using 30% as the threshold:&lt;br /&gt;
 Points          Time (seconds)        Improved&lt;br /&gt;
    100             0                         0&lt;br /&gt;
   1000             2                         2&lt;br /&gt;
   3000            32                        20&lt;br /&gt;
   4169            65                        44&lt;br /&gt;
   5000           115                        70&lt;br /&gt;
   6000           149                       106&lt;br /&gt;
  10000           505                       342&lt;br /&gt;
It is not required to achieve these numbers, but it is important to have a reference - and maybe a goal.&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Unit_test_-_start_of_reverse_polish_notation_class&amp;diff=121</id>
		<title>Unit test - start of reverse polish notation class</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Unit_test_-_start_of_reverse_polish_notation_class&amp;diff=121"/>
		<updated>2025-03-24T13:52:16Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;These are the two files, I made in class for unit test demo purposes.  My original class in the file: ReversePolishCalc.py &amp;lt;pre&amp;gt; class ReversePolishCalc:     def __init__(self):         self.stack = list()      def _checkstack(self, count):         if len(self.stack) &amp;lt; count:             raise IndexError(&amp;quot;Stack does not contain enough elements to perform operaation&amp;quot;)          def push(self, vector):         if isinstance(vector, (int, float, str)):             vector = [...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;These are the two files I made in class for unit test demo purposes.&lt;br /&gt;
&lt;br /&gt;
My original class in the file: ReversePolishCalc.py&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
class ReversePolishCalc:&lt;br /&gt;
    def __init__(self):&lt;br /&gt;
        self.stack = list()&lt;br /&gt;
&lt;br /&gt;
    def _checkstack(self, count):&lt;br /&gt;
        if len(self.stack) &amp;lt; count:&lt;br /&gt;
            raise IndexError(&amp;quot;Stack does not contain enough elements to perform operation&amp;quot;)&lt;br /&gt;
    &lt;br /&gt;
    def push(self, vector):&lt;br /&gt;
        if isinstance(vector, (int, float, str)):&lt;br /&gt;
            vector = [vector]&lt;br /&gt;
        if not isinstance(vector, (list, tuple)):&lt;br /&gt;
            raise ValueError(&amp;quot;Input can not be understood as numbers&amp;quot;)&lt;br /&gt;
        for number in vector:&lt;br /&gt;
            if isinstance(number, (int, float)):&lt;br /&gt;
                self.stack.append(number)&lt;br /&gt;
            elif isinstance(number, str):&lt;br /&gt;
                try:&lt;br /&gt;
                    self.stack.append(int(number))&lt;br /&gt;
                except ValueError:&lt;br /&gt;
                    try:&lt;br /&gt;
                        self.stack.append(float(number))&lt;br /&gt;
                    except ValueError:&lt;br /&gt;
                        raise ValueError(&amp;quot;Input can not be understood as numbers&amp;quot;)&lt;br /&gt;
            else:&lt;br /&gt;
                raise ValueError(&amp;quot;Input can not be understood as numbers&amp;quot;)&lt;br /&gt;
    &lt;br /&gt;
    def pop(self):&lt;br /&gt;
        self._checkstack(1)&lt;br /&gt;
        return self.stack.pop()&lt;br /&gt;
        &lt;br /&gt;
    def add(self):&lt;br /&gt;
        self._checkstack(2)&lt;br /&gt;
        self.stack[-2] += self.stack[-1]&lt;br /&gt;
        del self.stack[-1]&lt;br /&gt;
&lt;br /&gt;
    def subtract(self):&lt;br /&gt;
        self._checkstack(2)&lt;br /&gt;
        self.stack[-2] -= self.stack[-1]&lt;br /&gt;
        del self.stack[-1]&lt;br /&gt;
&lt;br /&gt;
    def multiply(self):&lt;br /&gt;
        self._checkstack(2)&lt;br /&gt;
        self.stack[-2] *= self.stack[-1]&lt;br /&gt;
        del self.stack[-1]&lt;br /&gt;
&lt;br /&gt;
    def divide(self):&lt;br /&gt;
        self._checkstack(2)&lt;br /&gt;
        if self.stack[-1] == 0:&lt;br /&gt;
            raise ZeroDivisionError&lt;br /&gt;
        self.stack[-2] /= self.stack[-1]&lt;br /&gt;
        del self.stack[-1]&lt;br /&gt;
&lt;br /&gt;
    def factorial(self):&lt;br /&gt;
        self._checkstack(1)&lt;br /&gt;
        no = int(self.stack[-1])&lt;br /&gt;
        if no != self.stack[-1]:&lt;br /&gt;
            raise ValueError(&amp;quot;Factorial with floats is invalid&amp;quot;)&lt;br /&gt;
        if no &amp;lt; 0:&lt;br /&gt;
            raise ValueError(&amp;quot;Factorial can not be calculated with negatives&amp;quot;)&lt;br /&gt;
        res = 1&lt;br /&gt;
        for i in range(2, no+1):&lt;br /&gt;
            res *= i        &lt;br /&gt;
        self.stack[-1] = res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
My test file of the class: test_ReversePolishCalc.py&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import pytest&lt;br /&gt;
from ReversePolishCalc import ReversePolishCalc as rpc&lt;br /&gt;
&lt;br /&gt;
def test_push1():&lt;br /&gt;
    # Arrange&lt;br /&gt;
    calc = rpc()&lt;br /&gt;
    # Act&lt;br /&gt;
    calc.push(1)&lt;br /&gt;
    # Assert&lt;br /&gt;
    assert calc.stack[-1] == 1, &amp;quot;Simple push of integer 1&amp;quot;&lt;br /&gt;
&lt;br /&gt;
def test_push2():&lt;br /&gt;
    # Arrange&lt;br /&gt;
    calc = rpc()&lt;br /&gt;
    # Act&lt;br /&gt;
    calc.push(1)&lt;br /&gt;
    # Assert&lt;br /&gt;
    assert calc.stack[0] == 1, &amp;quot;Simple push of integer 2&amp;quot;&lt;br /&gt;
&lt;br /&gt;
def test_push3():&lt;br /&gt;
    # Arrange&lt;br /&gt;
    calc = rpc()&lt;br /&gt;
    # Act&lt;br /&gt;
    calc.push(1.2)&lt;br /&gt;
    # Assert&lt;br /&gt;
    assert calc.stack[0] == 1.2, &amp;quot;Push of float&amp;quot;&lt;br /&gt;
&lt;br /&gt;
def test_push4():&lt;br /&gt;
    # Arrange&lt;br /&gt;
    calc = rpc()&lt;br /&gt;
    # Act&lt;br /&gt;
    calc.push(&amp;quot;1&amp;quot;)&lt;br /&gt;
    # Assert&lt;br /&gt;
    assert calc.stack[0] == 1, &amp;quot;Push of 1 as string&amp;quot;&lt;br /&gt;
&lt;br /&gt;
def test_push5():&lt;br /&gt;
    # Arrange&lt;br /&gt;
    calc = rpc()&lt;br /&gt;
    # Act&lt;br /&gt;
    calc.push([1, 1.5, &amp;quot;2.5&amp;quot;])&lt;br /&gt;
    # Assert&lt;br /&gt;
    assert calc.stack == [1, 1.5, 2.5], &amp;quot;Advanced list push of ints, floats and strings&amp;quot;&lt;br /&gt;
&lt;br /&gt;
@pytest.mark.parametrize(&amp;quot;x,y&amp;quot;, [(1,1), (1.2, 1.2), (&amp;quot;1.2&amp;quot;, 1.2), (&amp;quot;1&amp;quot;, 1)])&lt;br /&gt;
def test_push(x,y):&lt;br /&gt;
    # Arrange&lt;br /&gt;
    calc = rpc()&lt;br /&gt;
    # Act&lt;br /&gt;
    calc.push(x)&lt;br /&gt;
    # Assert&lt;br /&gt;
    assert calc.stack[0] == y, &amp;quot;Parametrized push of ints, floats and numeric strings&amp;quot;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def test_pop1():&lt;br /&gt;
    # Arrange&lt;br /&gt;
    calc = rpc()&lt;br /&gt;
    calc.push(1)&lt;br /&gt;
    # Act&lt;br /&gt;
    num = calc.pop()&lt;br /&gt;
    # Assert&lt;br /&gt;
    assert num == 1&lt;br /&gt;
&lt;br /&gt;
def test_failpop1():&lt;br /&gt;
    # Arrange&lt;br /&gt;
    calc = rpc()&lt;br /&gt;
    # Act&lt;br /&gt;
    with pytest.raises(IndexError):&lt;br /&gt;
        num = calc.pop()&lt;br /&gt;
    # Assert&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Unit_test&amp;diff=120</id>
		<title>Unit test</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Unit_test&amp;diff=120"/>
		<updated>2025-03-24T13:48:12Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Required course material for the lesson */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| width=500  style=&amp;quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&amp;quot;&lt;br /&gt;
|Previous: [[Classes]]&lt;br /&gt;
|Next: [[Scientific Libraries, Pandas, Numpy]]&lt;br /&gt;
|}&lt;br /&gt;
== Required course material for the lesson ==&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22113/22113_08-Testing.ppt Testing]&amp;lt;br&amp;gt;&lt;br /&gt;
Online: [https://docs.pytest.org/ pytest documentation]&amp;lt;br&amp;gt;&lt;br /&gt;
Resource: [[Example code - Unit test]]&amp;lt;br&amp;gt;&lt;br /&gt;
Resource: [[Unit test - start of reverse polish notation class]]&amp;lt;br&amp;gt;&lt;br /&gt;
Blog: [https://www.joelonsoftware.com/2000/04/30/top-five-wrong-reasons-you-dont-have-testers/ On testing], by a co-founder of Stack Exchange.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Subjects covered ==&lt;br /&gt;
Overview of test methods&amp;lt;br&amp;gt;&lt;br /&gt;
Unit test using pytest framework.&lt;br /&gt;
&lt;br /&gt;
== Exercises to be handed in ==&lt;br /&gt;
You should make a special folder for the exercises. I will refer to my special folder as &#039;&#039;unittest&#039;&#039; in these exercises. You will also see some &#039;&#039;__pycache__&#039;&#039; folders appear in places. This is Python&#039;s cache for &amp;quot;compiled&amp;quot; programs. It is safe to ignore and also to delete; Python regenerates it when needed.&lt;br /&gt;
# Use your factorial function from exercise 2 in [[Making Functions]]. If you did not do so already, change it to use exceptions instead of &#039;&#039;&#039;sys.exit()&#039;&#039;&#039;, when an error occurs. Now make simple unit tests for the following test cases: 12, 2, 1, 0, -1, 3.0, 3.4, &amp;quot;3&amp;quot;, &amp;quot;3.1.&amp;quot;, &amp;quot;ABC&amp;quot;. The factorial function and all test functions must be in one single file (&#039;&#039;factorial_test.py&#039;&#039; in &#039;&#039;unittest&#039;&#039;), which you can run &#039;&#039;pytest&#039;&#039; on.&lt;br /&gt;
# Now remove the factorial function from &#039;&#039;factorial_test.py&#039;&#039; and put it in its own file &#039;&#039;factorial.py&#039;&#039;. Import it from the &#039;&#039;factorial_test.py&#039;&#039; like &#039;&#039;&#039;from factorial import factorial&#039;&#039;&#039;. The first factorial is the name of the .py file, the second factorial is the name of your factorial function. Just run &#039;&#039;pytest&#039;&#039; (no file name) in the folder to check it works. It is more normal to have test and function separated.&lt;br /&gt;
# Above we removed test code from function code by creating two files. Next, put the files in their own folder in &#039;&#039;unittest&#039;&#039;. I would put my &#039;&#039;factorial.py&#039;&#039; in &#039;&#039;unittest/src&#039;&#039; and &#039;&#039;factorial_test.py&#039;&#039; in &#039;&#039;unittest/test&#039;&#039;. This way there is a very clear separation between function and test. The problem is making sure the test code loads the function code. Do it wrong a couple of times - it is very instructive.&lt;br /&gt;
# Follow the file structure of having a &#039;&#039;code&#039;&#039; (or &#039;&#039;src&#039;&#039;) folder for programs, a &#039;&#039;test&#039;&#039; folder for tests, and now a &#039;&#039;testdata&#039;&#039; folder for files containing test data. Now make unit tests and appropriate test data files for your &#039;&#039;&#039;fasta&#039;&#039;&#039; class from last week. In this exercise you just need to make unit test for the method &#039;&#039;&#039;load&#039;&#039;&#039;. You need to hand in both tests and test data. Maybe you should zip it all. Learn to zip :-)&lt;br /&gt;
# Add unit tests for the method &#039;&#039;&#039;save&#039;&#039;&#039; in your &#039;&#039;&#039;fasta&#039;&#039;&#039; class.&lt;br /&gt;
# Add unit tests for the method &#039;&#039;&#039;delete&#039;&#039;&#039; in your &#039;&#039;&#039;fasta&#039;&#039;&#039; class.&lt;br /&gt;
I would not be surprised if you find errors in your &#039;&#039;&#039;fasta&#039;&#039;&#039; class based on these tests. I found flaws in my code.&lt;br /&gt;
&lt;br /&gt;
== Exercises for extra practice ==&lt;br /&gt;
# Add unit tests for all methods in your &#039;&#039;&#039;fasta&#039;&#039;&#039; class. That will be a bit of work.&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Classes&amp;diff=119</id>
		<title>Classes</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Classes&amp;diff=119"/>
		<updated>2025-03-17T17:24:50Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Exercises to be handed in */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| width=500  style=&amp;quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&amp;quot;&lt;br /&gt;
|Previous: [[Comprehension, Generators, Functions and Methods]]&lt;br /&gt;
|Next: [[Unit test]]&lt;br /&gt;
|}&lt;br /&gt;
== Required course material for the lesson ==&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22113/22113_07-Classes.ppt Classes]&amp;lt;br&amp;gt;&lt;br /&gt;
Resource: [[Example code - Classes]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Subjects covered ==&lt;br /&gt;
Classes&amp;lt;br&amp;gt;&lt;br /&gt;
Object Oriented Programming&lt;br /&gt;
&lt;br /&gt;
== Exercises to be handed in ==&lt;br /&gt;
The exercises are about making a class. As the exercises go on, the class can do more and more. Remember to add code outside the class that tests the new methods as you make them. The class methods need some decent input control, and you need to think about what is required. Notice how versatile the class becomes as you progress through the exercises.&lt;br /&gt;
# Create a class called &#039;&#039;&#039;Fasta&#039;&#039;&#039;. The purpose of the class is to do various kinds of manipulation of fasta files and sequences. Start with adding 3 class methods:&amp;lt;br&amp;gt;&#039;&#039;&#039;load(filename)&#039;&#039;&#039;, which gets a file name/path and loads the fasta file into internal instance lists with header and sequence.&amp;lt;br&amp;gt;&#039;&#039;&#039;save(filename)&#039;&#039;&#039;, which writes the internal instance header/sequence lists into a fasta file.&amp;lt;br&amp;gt;&#039;&#039;&#039;content()&#039;&#039;&#039;, which returns two lists; the headers and the sequences.&amp;lt;br&amp;gt;You can likely reuse part of your functions in exercise 3 and 4 from [[Making Functions]].&amp;lt;br&amp;gt;Example use of the Fasta class:&amp;lt;br&amp;gt;&amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;myfasta = Fasta()&amp;lt;br&amp;gt;myfasta.load(&amp;quot;dna7.fsa&amp;quot;)&amp;lt;br&amp;gt;print(myfasta.content())&amp;lt;br&amp;gt;myfasta.save(&amp;quot;newfile.fsa&amp;quot;)&amp;lt;/div&amp;gt;&lt;br /&gt;
# Add a method &#039;&#039;&#039;delete([start[,end]])&#039;&#039;&#039;, which deletes entries in the headers and sequences. If called with no arguments like &#039;&#039;&#039;delete()&#039;&#039;&#039; it deletes all headers/sequences. If called with one argument like &#039;&#039;&#039;delete(start)&#039;&#039;&#039; it deletes the header and sequence at position &#039;&#039;start&#039;&#039;. If called with two arguments, it deletes headers and sequences from position &#039;&#039;start&#039;&#039; up to but not including position &#039;&#039;end&#039;&#039;. &#039;&#039;start&#039;&#039; and &#039;&#039;end&#039;&#039; can also be negative numbers, and in such case we count from the end of the lists, just like it works in normal list manipulations.&lt;br /&gt;
# Modify the &#039;&#039;&#039;content()&#039;&#039;&#039; method, so it works similarly to &#039;&#039;&#039;delete()&#039;&#039;&#039;. If called with no arguments like &#039;&#039;&#039;content()&#039;&#039;&#039; it returns 2 lists with all headers and sequences. If called with one argument like &#039;&#039;&#039;content(start)&#039;&#039;&#039; it returns the header and sequence at position &#039;&#039;start&#039;&#039;. If called with two arguments, it returns the headers and sequences from position &#039;&#039;start&#039;&#039; up to but not including position &#039;&#039;end&#039;&#039; as two lists. Be careful to return copies of the lists and not the lists themselves; otherwise the headers and sequences in the instance can be modified unintentionally from outside the instance.&lt;br /&gt;
# Add a method &#039;&#039;&#039;insert(header,sequence[,position])&#039;&#039;&#039;, which adds header and sequence to the instance lists. &#039;&#039;header&#039;&#039; and &#039;&#039;sequence&#039;&#039; can either be simple strings (single header and sequence) or lists of headers and sequences. If &#039;&#039;position&#039;&#039; is not given, then the addition takes place at the end of the existing headers/sequences. If &#039;&#039;position&#039;&#039; is given, then the insertion takes place at that position.&lt;br /&gt;
# Add a method &#039;&#039;&#039;verify(alphabet,[start[,end]])&#039;&#039;&#039;, which verifies sequence entries according to an alphabet. It works in a way similar to &#039;&#039;&#039;delete()&#039;&#039;&#039; and &#039;&#039;&#039;content()&#039;&#039;&#039; for the range of what should be verified. The method returns True if all entries in the range are verified (belonging to the alphabet), False otherwise.&amp;lt;br&amp;gt;The alphabet could be a DNA alphabet (ATCG), a protein alphabet or something derived from those. You should put your alphabets into class variables, as they are common for all instances, and not subject to change.&lt;br /&gt;
# Add a method &#039;&#039;&#039;discard(alphabet,[start[,end]])&#039;&#039;&#039;, which discards sequence entries and corresponding headers if they do not match the given alphabet. It works in a way similar to &#039;&#039;&#039;delete()&#039;&#039;&#039; and &#039;&#039;&#039;content()&#039;&#039;&#039; for the range of what should be verified.&lt;br /&gt;
# Add iteration and length evaluation (__len__ magic method) to the &#039;&#039;&#039;Fasta&#039;&#039;&#039; class using magic methods. Do this so you can write code like &amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;if len(MyFastaInstance) &amp;gt; 0:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;for header, sequence in MyFastaInstance:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;# Do something with header and sequence&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;print(header, sequence)&amp;lt;/div&amp;gt;The function &#039;&#039;&#039;len&#039;&#039;&#039; returns the number of sequences.&lt;br /&gt;
# Add the methods &#039;&#039;&#039;deletethis()&#039;&#039;&#039;, &#039;&#039;&#039;insertthis(header, sequence)&#039;&#039;&#039;, &#039;&#039;&#039;verifythis(alphabet)&#039;&#039;&#039; and &#039;&#039;&#039;discardthis(alphabet)&#039;&#039;&#039; to the &#039;&#039;&#039;Fasta&#039;&#039;&#039; class. The methods should only work when &#039;&#039;iterating over an instance at the current item&#039;&#039;, i.e. they work when you are iterating over the fasta sequences on the &#039;&#039;current&#039;&#039; sequence and header, like this:&amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;for header, sequence in MyFastaInstance:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;if not MyFastaInstance.verifythis(&amp;quot;DNA&amp;quot;):&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;MyFastaInstance.deletethis()&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;continue&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;# Do something with header and sequence&amp;lt;/div&amp;gt;As some may remember, a straightforward iteration through a list normally goes wrong if you delete and/or add elements to the list during the iteration. You have to make this possible, maybe by changing the way your iteration works in the previous exercise.&lt;br /&gt;
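The pitfall mentioned in the last exercise can be illustrated on a plain list. This is only a minimal sketch and not a solution to the exercise: the function names and the example data are made up for illustration, and a Fasta class would additionally have to keep its headers and sequences in step.

```python
# Hypothetical helper names and data, used only to illustrate the pitfall.
def remove_invalid_naive(sequences, alphabet):
    """Delete inside a plain for loop: the element after a deletion is skipped."""
    for seq in sequences:
        if not set(seq).issubset(alphabet):
            sequences.remove(seq)
    return sequences


def remove_invalid_indexed(sequences, alphabet):
    """Track the position explicitly and only advance when nothing was deleted."""
    pos = 0
    while len(sequences) > pos:
        if set(sequences[pos]).issubset(alphabet):
            pos += 1
        else:
            del sequences[pos]
    return sequences


dna = set("ATCG")
naive = remove_invalid_naive(["ATCG", "XXXX", "YYYY", "GGCC"], dna)
indexed = remove_invalid_indexed(["ATCG", "XXXX", "YYYY", "GGCC"], dna)
print(naive)    # the invalid entry "YYYY" survives
print(indexed)  # only the valid entries remain
```

The naive loop leaves an invalid entry behind because the list shrinks while the loop's internal position keeps advancing. An iterator that remembers its own position, and does not advance after a deletion, avoids this.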
&lt;br /&gt;
== Exercises for extra practice ==&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Classes&amp;diff=118</id>
		<title>Classes</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Classes&amp;diff=118"/>
		<updated>2025-03-17T17:22:56Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Exercises to be handed in */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| width=500  style=&amp;quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&amp;quot;&lt;br /&gt;
|Previous: [[Comprehension, Generators, Functions and Methods]]&lt;br /&gt;
|Next: [[Unit test]]&lt;br /&gt;
|}&lt;br /&gt;
== Required course material for the lesson ==&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22113/22113_07-Classes.ppt Classes]&amp;lt;br&amp;gt;&lt;br /&gt;
Resource: [[Example code - Classes]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Subjects covered ==&lt;br /&gt;
Classes&amp;lt;br&amp;gt;&lt;br /&gt;
Object Oriented Programming&lt;br /&gt;
&lt;br /&gt;
== Exercises to be handed in ==&lt;br /&gt;
The exercises are about making a class. As the exercises go on, the class can do more and more. Remember to add code outside the class that tests the new methods as you make them. The methods of the class need decent input control, so think about what is required. Notice how versatile the class becomes as you progress through the exercises.&lt;br /&gt;
# Create a class called &#039;&#039;&#039;Fasta&#039;&#039;&#039;. The purpose of the class is to do various kinds of manipulation of fasta files and sequences. Start with adding 3 class methods:&amp;lt;br&amp;gt;&#039;&#039;&#039;load(filename)&#039;&#039;&#039;, which gets a file name/path and loads the fasta file into internal instance lists with header and sequence.&amp;lt;br&amp;gt;&#039;&#039;&#039;save(filename)&#039;&#039;&#039;, which writes the internal instance header/sequence lists into a fasta file.&amp;lt;br&amp;gt;&#039;&#039;&#039;content()&#039;&#039;&#039;, which returns two lists; the headers and the sequences.&amp;lt;br&amp;gt;You can likely reuse part of your functions in exercise 3 and 4 from [[Making Functions]].&amp;lt;br&amp;gt;Example use of the Fasta class:&amp;lt;br&amp;gt;&amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;myfasta = Fasta()&amp;lt;br&amp;gt;myfasta.load(&amp;quot;dna7.fsa&amp;quot;)&amp;lt;br&amp;gt;print(myfasta.content())&amp;lt;br&amp;gt;myfasta.save(&amp;quot;newfile.fsa&amp;quot;)&amp;lt;/div&amp;gt;&lt;br /&gt;
# Add a method &#039;&#039;&#039;delete([start[,end]])&#039;&#039;&#039;, which deletes entries in the headers and sequences. If called with no arguments like &#039;&#039;&#039;delete()&#039;&#039;&#039; it deletes all headers/sequences. If called with one argument like &#039;&#039;&#039;delete(start)&#039;&#039;&#039; it deletes the header and sequence at position &#039;&#039;start&#039;&#039;. If called with two arguments, it deletes headers and sequences from position &#039;&#039;start&#039;&#039; up to but not including position &#039;&#039;end&#039;&#039;. &#039;&#039;start&#039;&#039; and &#039;&#039;end&#039;&#039; can also be negative numbers, and in such case we count from the end of the lists, just like it works in normal list manipulations.&lt;br /&gt;
# Modify the &#039;&#039;&#039;content()&#039;&#039;&#039; method, so it works similarly to &#039;&#039;&#039;delete()&#039;&#039;&#039;. If called with no arguments like &#039;&#039;&#039;content()&#039;&#039;&#039; it returns 2 lists with all headers and sequences. If called with one argument like &#039;&#039;&#039;content(start)&#039;&#039;&#039; it returns the header and sequence at position &#039;&#039;start&#039;&#039;. If called with two arguments, it returns the headers and sequences from position &#039;&#039;start&#039;&#039; up to but not including position &#039;&#039;end&#039;&#039; as two lists. Be careful to return copies of the lists and not the lists themselves; otherwise the headers and sequences in the instance can be modified unintentionally from outside the instance.&lt;br /&gt;
# Add a method &#039;&#039;&#039;insert(header,sequence[,position])&#039;&#039;&#039;, which adds header and sequence to the instance lists. &#039;&#039;header&#039;&#039; and &#039;&#039;sequence&#039;&#039; can either be simple strings (single header and sequence) or lists of headers and sequences. If &#039;&#039;position&#039;&#039; is not given, then the addition takes place at the end of the existing headers/sequences. If &#039;&#039;position&#039;&#039; is given, then the insertion takes place at that position.&lt;br /&gt;
# Add a method &#039;&#039;&#039;verify(alphabet,[start[,end]])&#039;&#039;&#039;, which verifies sequence entries according to an alphabet. It works in a way similar to &#039;&#039;&#039;delete()&#039;&#039;&#039; and &#039;&#039;&#039;content()&#039;&#039;&#039; for the range of what should be verified. The method returns True if all entries in the range are verified (belonging to the alphabet), False otherwise.&amp;lt;br&amp;gt;The alphabet could be a DNA alphabet (ATCG), a protein alphabet or something derived from those. You should put your alphabets into class variables, as they are common for all instances, and not subject to change.&lt;br /&gt;
# Add a method &#039;&#039;&#039;discard(alphabet,[start[,end]])&#039;&#039;&#039;, which discards sequence entries and corresponding headers if they do not match the given alphabet. It works in a way similar to &#039;&#039;&#039;delete()&#039;&#039;&#039; and &#039;&#039;&#039;content()&#039;&#039;&#039; for the range of what should be verified.&lt;br /&gt;
# Add iteration and length evaluation to the &#039;&#039;&#039;Fasta&#039;&#039;&#039; class using magic methods. Do this so you can write code like &amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;if len(MyFastaInstance) &amp;gt; 0:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;for header, sequence in MyFastaInstance:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;# Do something with header and sequence&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;print(header, sequence)&amp;lt;/div&amp;gt;The function &#039;&#039;&#039;len&#039;&#039;&#039; returns the number of sequences.&lt;br /&gt;
# Add the methods &#039;&#039;&#039;deletethis()&#039;&#039;&#039;, &#039;&#039;&#039;insertthis(header, sequence)&#039;&#039;&#039;, &#039;&#039;&#039;verifythis(alphabet)&#039;&#039;&#039; and &#039;&#039;&#039;discardthis(alphabet)&#039;&#039;&#039; to the &#039;&#039;&#039;Fasta&#039;&#039;&#039; class. The methods should only work when &#039;&#039;iterating over an instance at the current item&#039;&#039;, i.e. they work when you are iterating over the fasta sequences on the &#039;&#039;current&#039;&#039; sequence and header, like this:&amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;for header, sequence in MyFastaInstance:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;if not MyFastaInstance.verifythis(&amp;quot;DNA&amp;quot;):&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;MyFastaInstance.deletethis()&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;continue&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;# Do something with header and sequence&amp;lt;/div&amp;gt;As some may remember, it is normally impossible to successfully iterate straightforward through a list and delete and/or add elements to the list during the iteration. You have to make this possible, maybe by changing the way your iteration works in the previous exercise.&lt;br /&gt;
&lt;br /&gt;
== Exercises for extra practice ==&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Classes&amp;diff=117</id>
		<title>Classes</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Classes&amp;diff=117"/>
		<updated>2025-03-17T17:20:44Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Exercises to be handed in */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| width=500  style=&amp;quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&amp;quot;&lt;br /&gt;
|Previous: [[Comprehension, Generators, Functions and Methods]]&lt;br /&gt;
|Next: [[Unit test]]&lt;br /&gt;
|}&lt;br /&gt;
== Required course material for the lesson ==&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22113/22113_07-Classes.ppt Classes]&amp;lt;br&amp;gt;&lt;br /&gt;
Resource: [[Example code - Classes]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Subjects covered ==&lt;br /&gt;
Classes&amp;lt;br&amp;gt;&lt;br /&gt;
Object Oriented Programming&lt;br /&gt;
&lt;br /&gt;
== Exercises to be handed in ==&lt;br /&gt;
The exercises are about making a class. As the exercises go on, the class can do more and more. Remember to add code outside the class that tests the new methods as you make them. The methods of the class need decent input control, so think about what is required. Notice how versatile the class becomes as you progress through the exercises.&lt;br /&gt;
# Create a class called &#039;&#039;&#039;Fasta&#039;&#039;&#039;. The purpose of the class is to do various kinds of manipulation of fasta files and sequences. Start with adding 3 class methods:&amp;lt;br&amp;gt;&#039;&#039;&#039;load(filename)&#039;&#039;&#039;, which gets a file name/path and loads the fasta file into internal instance lists with header and sequence.&amp;lt;br&amp;gt;&#039;&#039;&#039;save(filename)&#039;&#039;&#039;, which writes the internal instance header/sequence lists into a fasta file.&amp;lt;br&amp;gt;&#039;&#039;&#039;content()&#039;&#039;&#039;, which returns two lists; the headers and the sequences.&amp;lt;br&amp;gt;You can likely reuse part of your functions in exercise 3 and 4 from [[Making Functions]].&amp;lt;br&amp;gt;Example use of the Fasta class:&amp;lt;br&amp;gt;&amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;myfasta = Fasta()&amp;lt;br&amp;gt;myfasta.load(&amp;quot;dna7.fsa&amp;quot;)&amp;lt;br&amp;gt;print(myfasta.content())&amp;lt;br&amp;gt;myfasta.save(&amp;quot;newfile.fsa&amp;quot;)&amp;lt;/div&amp;gt;&lt;br /&gt;
# Add a method &#039;&#039;&#039;delete([start[,end]])&#039;&#039;&#039;, which deletes entries in the headers and sequences. If called with no arguments like &#039;&#039;&#039;delete()&#039;&#039;&#039; it deletes all headers/sequences. If called with one argument like &#039;&#039;&#039;delete(start)&#039;&#039;&#039; it deletes the header and sequence at position &#039;&#039;start&#039;&#039;. If called with two arguments, it deletes headers and sequences from position &#039;&#039;start&#039;&#039; up to but not including position &#039;&#039;end&#039;&#039;. &#039;&#039;start&#039;&#039; and &#039;&#039;end&#039;&#039; can also be negative numbers, and in such case we count from the end of the lists, just like it works in normal list manipulations.&lt;br /&gt;
# Modify the &#039;&#039;&#039;content()&#039;&#039;&#039; method, so it works similarly to &#039;&#039;&#039;delete()&#039;&#039;&#039;. If called with no arguments like &#039;&#039;&#039;content()&#039;&#039;&#039; it returns 2 lists with all headers and sequences. If called with one argument like &#039;&#039;&#039;content(start)&#039;&#039;&#039; it returns the header and sequence at position &#039;&#039;start&#039;&#039;. If called with two arguments, it returns the headers and sequences from position &#039;&#039;start&#039;&#039; up to but not including position &#039;&#039;end&#039;&#039; as two lists. Be careful to return copies of the lists and not the lists themselves; otherwise the headers and sequences in the instance can be modified unintentionally from outside the instance.&lt;br /&gt;
# Add a method &#039;&#039;&#039;insert(header,sequence[,position])&#039;&#039;&#039;, which adds header and sequence to the instance lists. &#039;&#039;header&#039;&#039; and &#039;&#039;sequence&#039;&#039; can either be simple strings (single header and sequence) or lists of headers and sequences. If &#039;&#039;position&#039;&#039; is not given, then the addition takes place at the end of the existing headers/sequences. If &#039;&#039;position&#039;&#039; is given, then the insertion takes place at that position.&lt;br /&gt;
# Add a method &#039;&#039;&#039;verify(alphabet,[start[,end]])&#039;&#039;&#039;, which verifies sequence entries according to an alphabet. It works in a way similar to &#039;&#039;&#039;delete()&#039;&#039;&#039; and &#039;&#039;&#039;content()&#039;&#039;&#039; for the range of what should be verified. The method returns True if all entries in the range are verified (belonging to the alphabet), False otherwise.&amp;lt;br&amp;gt;The alphabet could be a DNA alphabet (ATCG), a protein alphabet or something derived from those. You should put your alphabets into class variables, as they are common for all instances, and not subject to change.&lt;br /&gt;
# Add a method &#039;&#039;&#039;discard(alphabet,[start[,end]])&#039;&#039;&#039;, which discards sequence entries and corresponding headers if they do not match the given alphabet. It works in a way similar to &#039;&#039;&#039;delete()&#039;&#039;&#039; and &#039;&#039;&#039;content()&#039;&#039;&#039; for the range of what should be verified.&lt;br /&gt;
# Add iteration and length evaluation to the &#039;&#039;&#039;Fasta&#039;&#039;&#039; class using magic methods. Do this so you can write code like &amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;if len(MyFastaInstance) &amp;gt; 0:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;for header, sequence in MyFastaInstance:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;# Do something with header and sequence&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;print(header, sequence)&amp;lt;/div&amp;gt;The function &#039;&#039;&#039;len&#039;&#039;&#039; returns the number of sequences.&lt;br /&gt;
# Add the methods &#039;&#039;&#039;deletethis()&#039;&#039;&#039;, &#039;&#039;&#039;insertthis(header, sequence)&#039;&#039;&#039;, &#039;&#039;&#039;verifythis(alphabet)&#039;&#039;&#039; and &#039;&#039;&#039;discardthis(alphabet)&#039;&#039;&#039; to the &#039;&#039;&#039;Fasta&#039;&#039;&#039; class. The methods should only work when &#039;&#039;iterating over an instance at the current item&#039;&#039;, i.e. they work when you are iterating over the fasta sequences on the &#039;&#039;current&#039;&#039; sequence and header, like this:&amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;for header, sequence in MyFastaInstance:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;if not MyFastaInstance.verifythis(&amp;quot;DNA&amp;quot;):&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;MyFastaInstance.deletethis()&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;continue&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;# Do something with header and sequence&amp;lt;/div&amp;gt;As some may remember, it is normally impossible to successfully iterate straightforward through a list and delete and/or add elements to the list during the iteration. You have to make this possible, maybe by changing the way your iteration works in the previous exercise.&lt;br /&gt;
&lt;br /&gt;
== Exercises for extra practice ==&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Classes&amp;diff=116</id>
		<title>Classes</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Classes&amp;diff=116"/>
		<updated>2025-03-17T17:17:38Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Exercises to be handed in */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| width=500  style=&amp;quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&amp;quot;&lt;br /&gt;
|Previous: [[Comprehension, Generators, Functions and Methods]]&lt;br /&gt;
|Next: [[Unit test]]&lt;br /&gt;
|}&lt;br /&gt;
== Required course material for the lesson ==&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22113/22113_07-Classes.ppt Classes]&amp;lt;br&amp;gt;&lt;br /&gt;
Resource: [[Example code - Classes]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Subjects covered ==&lt;br /&gt;
Classes&amp;lt;br&amp;gt;&lt;br /&gt;
Object Oriented Programming&lt;br /&gt;
&lt;br /&gt;
== Exercises to be handed in ==&lt;br /&gt;
The exercises are about making a class. As the exercises go on, the class can do more and more. Remember to add code outside the class that tests the new methods as you make them. The methods of the class need decent input control, so think about what is required. Notice how versatile the class becomes as you progress through the exercises.&lt;br /&gt;
# Create a class called &#039;&#039;&#039;Fasta&#039;&#039;&#039;. The purpose of the class is to do various kinds of manipulation of fasta files and sequences. Start with adding 3 class methods:&amp;lt;br&amp;gt;&#039;&#039;&#039;load(filename)&#039;&#039;&#039;, which gets a file name/path and loads the fasta file into internal instance lists with header and sequence.&amp;lt;br&amp;gt;&#039;&#039;&#039;save(filename)&#039;&#039;&#039;, which writes the internal instance header/sequence lists into a fasta file.&amp;lt;br&amp;gt;&#039;&#039;&#039;content()&#039;&#039;&#039;, which returns two lists; the headers and the sequences.&amp;lt;br&amp;gt;You can likely reuse part of your functions in exercise 3 and 4 from [[Making Functions]].&amp;lt;br&amp;gt;Example use of the Fasta class:&amp;lt;br&amp;gt;&amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;myfasta = Fasta()&amp;lt;br&amp;gt;myfasta.load(&amp;quot;dna7.fsa&amp;quot;)&amp;lt;br&amp;gt;print(myfasta.content())&amp;lt;br&amp;gt;myfasta.save(&amp;quot;newfile.fsa&amp;quot;)&amp;lt;/div&amp;gt;&lt;br /&gt;
# Add a method &#039;&#039;&#039;delete([start[,end]])&#039;&#039;&#039;, which deletes entries in the headers and sequences. If called with no arguments like &#039;&#039;&#039;delete()&#039;&#039;&#039; it deletes all headers/sequences. If called with one argument like &#039;&#039;&#039;delete(start)&#039;&#039;&#039; it deletes the header and sequence at position &#039;&#039;start&#039;&#039;. If called with two arguments, it deletes headers and sequences from position &#039;&#039;start&#039;&#039; up to but not including position &#039;&#039;end&#039;&#039;. &#039;&#039;start&#039;&#039; and &#039;&#039;end&#039;&#039; can also be negative numbers, and in such case we count from the end of the lists, just like it works in normal list manipulations.&lt;br /&gt;
# Modify the &#039;&#039;&#039;content()&#039;&#039;&#039; method, so it works similarly to &#039;&#039;&#039;delete()&#039;&#039;&#039;. If called with no arguments like &#039;&#039;&#039;content()&#039;&#039;&#039; it returns 2 lists with all headers and sequences. If called with one argument like &#039;&#039;&#039;content(start)&#039;&#039;&#039; it returns the header and sequence at position &#039;&#039;start&#039;&#039;. If called with two arguments, it returns the headers and sequences from position &#039;&#039;start&#039;&#039; up to but not including position &#039;&#039;end&#039;&#039; as two lists. Be careful to return copies of the lists and not the lists themselves; otherwise the headers and sequences in the instance can be modified unintentionally from outside the instance.&lt;br /&gt;
# Add a method &#039;&#039;&#039;insert(header,sequence[,position])&#039;&#039;&#039;, which adds header and sequence to the instance lists. &#039;&#039;header&#039;&#039; and &#039;&#039;sequence&#039;&#039; can either be simple strings (single header and sequence) or lists of headers and sequences. If &#039;&#039;position&#039;&#039; is not given, then the addition takes place at the end of the existing headers/sequences. If &#039;&#039;position&#039;&#039; is given, then the insertion takes place at that position.&lt;br /&gt;
# Add a method &#039;&#039;&#039;verify(alphabet,[start[,end]])&#039;&#039;&#039;, which verifies sequence entries according to an alphabet. It works in a way similar to &#039;&#039;&#039;delete()&#039;&#039;&#039; and &#039;&#039;&#039;content()&#039;&#039;&#039; for the range of what should be verified. The method returns True if all entries in the range are verified (belonging to the alphabet), False otherwise.&amp;lt;br&amp;gt;The alphabet could be a DNA alphabet (ATCG), a protein alphabet or something derived from those. You should put your alphabets into class variables, as they are common for all instances, and not subject to change.&lt;br /&gt;
# Add a method &#039;&#039;&#039;discard(alphabet,[start[,end]])&#039;&#039;&#039;, which discards sequence entries and corresponding headers if they do not match the given alphabet. It works in a way similar to &#039;&#039;&#039;delete()&#039;&#039;&#039; and &#039;&#039;&#039;content()&#039;&#039;&#039; for the range of what should be verified.&lt;br /&gt;
# Add iteration and length evaluation to the &#039;&#039;&#039;Fasta&#039;&#039;&#039; class using magic methods. Do this so you can write code like &amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;if len(MyFastaInstance) &amp;gt; 0:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;for header, sequence in MyFastaInstance:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;# Do something with header and sequence&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;print(header, sequence)&amp;lt;/div&amp;gt;&lt;br /&gt;
# Add the methods &#039;&#039;&#039;deletethis()&#039;&#039;&#039;, &#039;&#039;&#039;insertthis(header, sequence)&#039;&#039;&#039;, &#039;&#039;&#039;verifythis(alphabet)&#039;&#039;&#039; and &#039;&#039;&#039;discardthis(alphabet)&#039;&#039;&#039; to the &#039;&#039;&#039;Fasta&#039;&#039;&#039; class. The methods should only work when &#039;&#039;iterating over an instance at the current item&#039;&#039;, i.e. they work when you are iterating over the fasta sequences on the &#039;&#039;current&#039;&#039; sequence and header, like this:&amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;for header, sequence in MyFastaInstance:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;if not MyFastaInstance.verifythis(&amp;quot;DNA&amp;quot;):&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;MyFastaInstance.deletethis()&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;continue&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;# Do something with header and sequence&amp;lt;/div&amp;gt;As some may remember, it is normally impossible to successfully iterate straightforward through a list and delete and/or add elements to the list during the iteration. You have to make this possible, maybe by changing the way your iteration works in the previous exercise.&lt;br /&gt;
&lt;br /&gt;
== Exercises for extra practice ==&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Classes&amp;diff=115</id>
		<title>Classes</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Classes&amp;diff=115"/>
		<updated>2025-03-17T17:15:59Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Exercises to be handed in */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| width=500  style=&amp;quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&amp;quot;&lt;br /&gt;
|Previous: [[Comprehension, Generators, Functions and Methods]]&lt;br /&gt;
|Next: [[Unit test]]&lt;br /&gt;
|}&lt;br /&gt;
== Required course material for the lesson ==&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22113/22113_07-Classes.ppt Classes]&amp;lt;br&amp;gt;&lt;br /&gt;
Resource: [[Example code - Classes]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Subjects covered ==&lt;br /&gt;
Classes&amp;lt;br&amp;gt;&lt;br /&gt;
Object Oriented Programming&lt;br /&gt;
&lt;br /&gt;
== Exercises to be handed in ==&lt;br /&gt;
The exercises are about making a class. As the exercises go on, the class can do more and more. Remember to add code outside the class that tests the new methods as you make them. The class methods need some decent input control, and you need to think about what is required. Notice how versatile the class becomes as you progress through the exercises.&lt;br /&gt;
# Create a class called &#039;&#039;&#039;Fasta&#039;&#039;&#039;. The purpose of the class is to do various kinds of manipulation of fasta files and sequences. Start with adding 3 class methods:&amp;lt;br&amp;gt;&#039;&#039;&#039;load(filename)&#039;&#039;&#039;, which gets a file name/path and loads the fasta file into internal instance lists with header and sequence.&amp;lt;br&amp;gt;&#039;&#039;&#039;save(filename)&#039;&#039;&#039;, which writes the internal instance header/sequence lists into a fasta file.&amp;lt;br&amp;gt;&#039;&#039;&#039;content()&#039;&#039;&#039;, which returns two lists; the headers and the sequences.&amp;lt;br&amp;gt;You can likely reuse part of your functions in exercise 3 and 4 from [[Making Functions]].&amp;lt;br&amp;gt;Example use of the Fasta class:&amp;lt;br&amp;gt;&amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;myfasta = Fasta()&amp;lt;br&amp;gt;myfasta.load(&amp;quot;dna7.fsa&amp;quot;)&amp;lt;br&amp;gt;print(myfasta.content())&amp;lt;br&amp;gt;myfasta.save(&amp;quot;newfile.fsa&amp;quot;)&amp;lt;/div&amp;gt;&lt;br /&gt;
# Add a method &#039;&#039;&#039;delete([start[,end]])&#039;&#039;&#039;, which deletes entries in the headers and sequences. If called with no arguments like &#039;&#039;&#039;delete()&#039;&#039;&#039; it deletes all headers/sequences. If called with one argument like &#039;&#039;&#039;delete(start)&#039;&#039;&#039; it deletes the header and sequence at position &#039;&#039;start&#039;&#039;. If called with two arguments, it deletes headers and sequences from position &#039;&#039;start&#039;&#039; up to but not including position &#039;&#039;end&#039;&#039;. &#039;&#039;start&#039;&#039; and &#039;&#039;end&#039;&#039; can also be negative numbers, and in such case we count from the end of the lists, just like it works in normal list manipulations.&lt;br /&gt;
# Modify the &#039;&#039;&#039;content()&#039;&#039;&#039; method, so it works similarly to &#039;&#039;&#039;delete()&#039;&#039;&#039;. If called with no arguments like &#039;&#039;&#039;content()&#039;&#039;&#039; it returns 2 lists with all headers and sequences. If called with one argument like &#039;&#039;&#039;content(start)&#039;&#039;&#039; it returns the header and sequence at position &#039;&#039;start&#039;&#039;. If called with two arguments, it returns the headers and sequences from position &#039;&#039;start&#039;&#039; up to but not including position &#039;&#039;end&#039;&#039; as two lists. Be careful to return copies of the lists and not the lists themselves; otherwise the headers and sequences in the instance can be modified unintentionally from outside the instance.&lt;br /&gt;
# Add a method &#039;&#039;&#039;insert(header,sequence[,position])&#039;&#039;&#039;, which adds header and sequence to the instance lists. &#039;&#039;header&#039;&#039; and &#039;&#039;sequence&#039;&#039; can either be simple strings (single header and sequence) or lists of headers and sequences. If &#039;&#039;position&#039;&#039; is not given, then the addition takes place at the end of the existing headers/sequences. If &#039;&#039;position&#039;&#039; is given then insertion takes place at that position.&lt;br /&gt;
# Add a method &#039;&#039;&#039;verify(alphabet,[start[,end]])&#039;&#039;&#039;, which verifies sequence entries according to an alphabet. It works in a way similar to &#039;&#039;&#039;delete()&#039;&#039;&#039; and &#039;&#039;&#039;content()&#039;&#039;&#039; for the range of what should be verified. The method returns True if all entries in the range are verified, False otherwise.&amp;lt;br&amp;gt;The alphabet could be a DNA alphabet (ATCG), a protein alphabet or something derived from those. You should put your alphabets into class variables, as they are common for all instances, and not subject to change.&lt;br /&gt;
# Add a method &#039;&#039;&#039;discard(alphabet,[start[,end]])&#039;&#039;&#039;, which discards sequence entries and corresponding headers if they do not match the given alphabet. It works in a way similar to &#039;&#039;&#039;delete()&#039;&#039;&#039; and &#039;&#039;&#039;content()&#039;&#039;&#039; for the range of what should be verified.&lt;br /&gt;
# Add iteration and length evaluation to the &#039;&#039;&#039;Fasta&#039;&#039;&#039; class using magic methods. Do this so you can write code like &amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;if len(MyFastaInstance) &amp;gt; 0:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;for header, sequence in MyFastaInstance:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;# Do something with header and sequence&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;print(header, sequence)&amp;lt;/div&amp;gt;&lt;br /&gt;
# Add the methods &#039;&#039;&#039;deletethis()&#039;&#039;&#039;, &#039;&#039;&#039;insertthis(header, sequence)&#039;&#039;&#039;, &#039;&#039;&#039;verifythis(alphabet)&#039;&#039;&#039; and &#039;&#039;&#039;discardthis(alphabet)&#039;&#039;&#039; to the &#039;&#039;&#039;Fasta&#039;&#039;&#039; class. The methods should only work when &#039;&#039;iterating over an instance at the current item&#039;&#039;, i.e. while you are iterating over the fasta entries they act on the &#039;&#039;current&#039;&#039; header and sequence, like this:&amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;for header, sequence in MyFastaInstance:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;if not MyFastaInstance.verifythis(&amp;quot;DNA&amp;quot;):&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;MyFastaInstance.deletethis()&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;continue&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;# Do something with header and sequence&amp;lt;/div&amp;gt;As some may remember, it is normally not possible to iterate straight through a list while deleting and/or adding elements to it. You have to make this possible, maybe by changing the way your iteration works in the previous exercise.&lt;br /&gt;
&lt;br /&gt;
== Exercises for extra practice ==&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Classes&amp;diff=114</id>
		<title>Classes</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Classes&amp;diff=114"/>
		<updated>2025-03-17T17:13:18Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Exercises to be handed in */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| width=500  style=&amp;quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&amp;quot;&lt;br /&gt;
|Previous: [[Comprehension, Generators, Functions and Methods]]&lt;br /&gt;
|Next: [[Unit test]]&lt;br /&gt;
|}&lt;br /&gt;
== Required course material for the lesson ==&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22113/22113_07-Classes.ppt Classes]&amp;lt;br&amp;gt;&lt;br /&gt;
Resource: [[Example code - Classes]]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Subjects covered ==&lt;br /&gt;
Classes&amp;lt;br&amp;gt;&lt;br /&gt;
Object Oriented Programming&lt;br /&gt;
&lt;br /&gt;
== Exercises to be handed in ==&lt;br /&gt;
The exercises are about making a class. As the exercises go on, the class can do more and more. Remember to add code outside the class that tests the new methods as you make them. The class methods need some decent input control, and you need to think about what is required. Notice how versatile the class becomes as you progress through the exercises.&lt;br /&gt;
# Create a class called &#039;&#039;&#039;Fasta&#039;&#039;&#039;. The purpose of the class is to do various kinds of manipulation of fasta files and sequences. Start with adding 3 class methods:&amp;lt;br&amp;gt;&#039;&#039;&#039;load(filename)&#039;&#039;&#039;, which gets a file name/path and loads the fasta file into internal instance lists with header and sequence.&amp;lt;br&amp;gt;&#039;&#039;&#039;save(filename)&#039;&#039;&#039;, which writes the internal instance header/sequence lists into a fasta file.&amp;lt;br&amp;gt;&#039;&#039;&#039;content()&#039;&#039;&#039;, which returns two lists; the headers and the sequences.&amp;lt;br&amp;gt;You can likely reuse part of your functions in exercise 3 and 4 from [[Making Functions]].&amp;lt;br&amp;gt;Example use of the Fasta class:&amp;lt;br&amp;gt;&amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;myfasta = Fasta()&amp;lt;br&amp;gt;myfasta.load(&amp;quot;dna7.fsa&amp;quot;)&amp;lt;br&amp;gt;print(myfasta.content())&amp;lt;br&amp;gt;myfasta.save(&amp;quot;newfile.fsa&amp;quot;)&amp;lt;/div&amp;gt;&lt;br /&gt;
# Add a method &#039;&#039;&#039;delete([start[,end]])&#039;&#039;&#039;, which deletes entries in the headers and sequences. If called with no arguments like &#039;&#039;&#039;delete()&#039;&#039;&#039; it deletes all headers/sequences. If called with one argument like &#039;&#039;&#039;delete(start)&#039;&#039;&#039; it deletes the header and sequence at position &#039;&#039;start&#039;&#039;. If called with two arguments, it deletes headers and sequences from position &#039;&#039;start&#039;&#039; up to but not including position &#039;&#039;end&#039;&#039;. &#039;&#039;start&#039;&#039; and &#039;&#039;end&#039;&#039; can also be negative numbers, and in such case we count from the end of the lists, just like it works in normal list manipulations.&lt;br /&gt;
# Modify the &#039;&#039;&#039;content()&#039;&#039;&#039; method, so it works similarly to &#039;&#039;&#039;delete()&#039;&#039;&#039;. If called with no arguments like &#039;&#039;&#039;content()&#039;&#039;&#039; it returns 2 lists with all headers and sequences. If called with one argument like &#039;&#039;&#039;content(start)&#039;&#039;&#039; it returns the header and sequence at position &#039;&#039;start&#039;&#039;. If called with two arguments, it returns the headers and sequences from position &#039;&#039;start&#039;&#039; up to but not including position &#039;&#039;end&#039;&#039; as two lists. Be careful to return copies of the lists and not the lists themselves; otherwise the headers and sequences in the instance can be modified unintentionally from outside the instance.&lt;br /&gt;
# Add a method &#039;&#039;&#039;insert(header,sequence[,position])&#039;&#039;&#039;, which adds header and sequence to the instance lists. &#039;&#039;header&#039;&#039; and &#039;&#039;sequence&#039;&#039; can either be simple strings (single header and sequence) or lists of headers and sequences. If &#039;&#039;position&#039;&#039; is not given, then the addition takes place at the end of the existing headers/sequences. If &#039;&#039;position&#039;&#039; is given then insertion takes place at that position.&lt;br /&gt;
# Add a method &#039;&#039;&#039;verify(alphabet,[start[,end]])&#039;&#039;&#039;, which verifies sequence entries according to an alphabet. It works in a way similar to &#039;&#039;&#039;delete()&#039;&#039;&#039; and &#039;&#039;&#039;content()&#039;&#039;&#039; for the range of what should be verified. The method returns True if all entries in the range are verified, False otherwise.&amp;lt;br&amp;gt;The alphabet could be a DNA alphabet (ATCG), a protein alphabet or something derived from those. You should put your alphabets into class variables, as they are common for all instances, and not subject to change.&lt;br /&gt;
# Add a method &#039;&#039;&#039;discard(alphabet,[start[,end]])&#039;&#039;&#039;, which discards sequence entries and corresponding headers if they do not match the given alphabet. It works in a way similar to &#039;&#039;&#039;delete()&#039;&#039;&#039; and &#039;&#039;&#039;content()&#039;&#039;&#039; for the range of what should be verified.&lt;br /&gt;
# Add iteration and length evaluation to the &#039;&#039;&#039;Fasta&#039;&#039;&#039; class using magic methods. Do this so you can write code like &amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;if len(MyFastaInstance) &amp;gt; 0:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;for header, sequence in MyFastaInstance:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;# Do something with header and sequence&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;print(header, sequence)&amp;lt;/div&amp;gt;&lt;br /&gt;
# Add the methods &#039;&#039;&#039;deletethis()&#039;&#039;&#039;, &#039;&#039;&#039;insertthis(header, sequence)&#039;&#039;&#039;, &#039;&#039;&#039;verifythis(alphabet)&#039;&#039;&#039; and &#039;&#039;&#039;discardthis(alphabet)&#039;&#039;&#039; to the &#039;&#039;&#039;Fasta&#039;&#039;&#039; class. The methods should only work when &#039;&#039;iterating over an instance at the current item&#039;&#039;, i.e. while you are iterating over the fasta entries they act on the &#039;&#039;current&#039;&#039; header and sequence, like this:&amp;lt;div style=&amp;quot;font-family:&#039;Courier New&#039;&amp;quot;&amp;gt;for header, sequence in MyFastaInstance:&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;if not MyFastaInstance.verifythis(&amp;quot;DNA&amp;quot;):&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;MyFastaInstance.deletethis()&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;continue&amp;lt;br&amp;gt;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;&amp;amp;nbsp;# Do something with sequence and header&amp;lt;/div&amp;gt;As some may remember, it is normally not possible to iterate straight through a list while deleting and/or adding elements to it. You have to make this possible, maybe by changing the way your iteration works in the previous exercise.&lt;br /&gt;
&lt;br /&gt;
== Exercises for extra practice ==&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Regular_Expressions&amp;diff=113</id>
		<title>Regular Expressions</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Regular_Expressions&amp;diff=113"/>
		<updated>2025-03-17T17:06:11Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Exercises to be handed in */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| width=500  style=&amp;quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&amp;quot;&lt;br /&gt;
|Previous: [[Python Recap and Objects]]&lt;br /&gt;
|Next: [[Making Functions]]&lt;br /&gt;
|}&lt;br /&gt;
== Required course material for the lesson ==&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22113/22113_03-Regex.ppt Regular expressions in Python]&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=04cb2b80-d941-42a5-a632-af27012cd0d7 Regular Expressions] Monday&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=738d525b-a6c1-4973-b6cc-af27012ca86e An (unfortunately) true story] Monday&amp;lt;br&amp;gt;&lt;br /&gt;
Resource: [[Example code - Regex]]&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=2952d382-7059-4691-8170-af27012bc9a1 Live Coding]&amp;lt;br&amp;gt;&lt;br /&gt;
PDF: [https://teaching.healthtech.dtu.dk/material/22113/regular-expressions-cheat-sheet-v2.pdf Regular Expressions Cheat Sheet]&amp;lt;br&amp;gt;&lt;br /&gt;
WWW: [http://regex101.com/ Web page where you can test your regular expressions]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Subjects covered ==&lt;br /&gt;
* Regular expressions, duh.&lt;br /&gt;
* Patterns, how to design and use them.&lt;br /&gt;
&lt;br /&gt;
== Exercises to be handed in ==&lt;br /&gt;
&#039;&#039;&#039;You might recognize some of these exercises. You must ONLY use regex for your pattern recognition and extraction of single data points (like an accession number).&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
If you don&#039;t know what &#039;&#039;&#039;stateful parsing&#039;&#039;&#039; is, [https://teaching.healthtech.dtu.dk/22101/index.php/Stateful_Parsing look here]&lt;br /&gt;
# Make a program that accepts a string as input from the keyboard. Use regular expressions (RE) to determine if the input is a number. The goal is to do this with a SINGLE regex.&amp;lt;br&amp;gt; These should all be considered as numbers: &amp;quot;4&amp;quot;   &amp;quot;-7&amp;quot;   &amp;quot;0.656&amp;quot;   &amp;quot;-67.35555&amp;quot;&amp;lt;br&amp;gt; These are not numbers: &amp;quot;5.&amp;quot;  &amp;quot;56F&amp;quot;  &amp;quot;.32&amp;quot;  &amp;quot;-.04&amp;quot;  &amp;quot;1+1&amp;quot;&amp;lt;br&amp;gt; Note: The program is very simple, but it is likely the most difficult regular expression, you will have to make in this set of exercises. Perhaps you should do the following exercises before attempting this one - just to get some experience first.&lt;br /&gt;
# Make a program that can read and verify a fasta file. Test with &#039;&#039;dna7.fsa&#039;&#039; and &#039;&#039;dnanoise.fsa&#039;&#039;. Verification here means that the program prints &amp;quot;DNA fasta&amp;quot; or &amp;quot;Protein fasta&amp;quot; if the file is successfully verified as either DNA or protein sequence, and &amp;quot;Not fasta&amp;quot; if verification fails. You can find a description of fasta format in [[Biological knowledge needed in the course]]. You are expected to know which symbols are used for DNA and protein sequence - or to be able to look them up. Hint: If you have made a program before (previous course) that reads a fasta file, this and the following exercise are not too hard, but otherwise you can consider doing them last.&lt;br /&gt;
# Change exercise 2 in the following way: Make the program discard entries that do not conform to DNA or protein sequence, and rewrite the acceptable entries to the output file &#039;&#039;fastaout.fsa&#039;&#039;, formatted with the normal 60 chars per line and no spaces in between. The program must inform the user how many entries were kept and how many were discarded. Test on &#039;&#039;dnanoise.fsa&#039;&#039;, which contains 3 entries that should be discarded - this is a strong hint.&lt;br /&gt;
# The last exercises will all have to do with the files &#039;&#039;data1-4.gb&#039;&#039;, which are various Genbank entries of genes. First you should study the files, notice the structure of the data. In all exercises you will have to parse (read and find the wanted data) the files using RE&#039;s which are very well designed for that purpose. This is a build-up process, so every exercise is added to the previous ones, so the final program can do a lot. Your program should be able to handle all files (so test them), but just one at a time.&lt;br /&gt;
# Extract the accession number, the definition and the organism (and print it).&lt;br /&gt;
# Extract and print all MEDLINE article numbers which are mentioned in the entries.&lt;br /&gt;
# Extract and print the translated gene (the amino acid sequence). Look for the line starting with /translation=. Generalize: an amino acid sequence can be short, i.e. only one line in the feature table, or long, i.e. more than one line in the feature table. Use stateful parsing.&lt;br /&gt;
# Extract and print the DNA (whole base sequence in the end of the file). Use stateful parsing.&lt;br /&gt;
# &amp;lt;font color=&amp;quot;#AA00FF&amp;quot;&amp;gt;Extract and print ONLY the coding DNA. That is described in FEATURES - CDS (Coding DNA Sequence). As an example, the line in &#039;&#039;data1.gb&#039;&#039; says &#039;join(2424..2610,3397..3542)&#039; and means that the coding sequence consists of bases 2424-2610 followed by bases 3397-3542. The bases in between are an intron and not a part of the coding DNA. Remember to generalize; there can be more (or fewer) than two exons, and the &#039;join&#039; line can continue on the next line. Use stateful parsing.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Exercises for extra practice ==&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Comprehension,_Generators,_Functions_and_Methods&amp;diff=112</id>
		<title>Comprehension, Generators, Functions and Methods</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Comprehension,_Generators,_Functions_and_Methods&amp;diff=112"/>
		<updated>2025-03-16T06:12:51Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Exercises to be handed in */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| width=500  style=&amp;quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&amp;quot;&lt;br /&gt;
|Previous: [[Advanced Data Structures and New Data Types]]&lt;br /&gt;
|Next: [[Classes]]&lt;br /&gt;
|}&lt;br /&gt;
== Required course material for the lesson ==&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22113/22113_06-Comprehension.ppt Comprehension, Generators, Functions and Methods]&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=ef476c74-cc82-4478-afd2-af270128c92f Comprehension] Monday&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=ad2b345c-4c9f-4fbb-8593-af270128ae40 Generators] Monday&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=ff23a72f-a1ee-46fc-a434-af2701281405 Iteration in detail, use of lambda function, libraries] Monday&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=83c5ef87-3ed8-4069-b691-af130047ec9f How to parse bio files with many entries]&amp;lt;br&amp;gt;&lt;br /&gt;
Resource: [[Example code - Comprehension]]&amp;lt;br&amp;gt;&lt;br /&gt;
Resource: [[Example code - Misc]]&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Video: [https://video.dtu.dk/media/22110-lesson11-LiveCoding/0_0l5ifl0e Live Coding 1]&amp;lt;br&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=56f5c2c8-a3fe-4847-bb94-af2701286138 Live Coding 2]&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Subjects covered ==&lt;br /&gt;
&#039;&#039;Comprehension&#039;&#039;, which is a way of manipulating/selecting data with a hidden loop.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;Lambda&#039;&#039;, the small anonymous function.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;Generators&#039;&#039;, which are like functions with memory of previous calls.&amp;lt;br&amp;gt;&lt;br /&gt;
More theoretical iteration.&amp;lt;br&amp;gt;&lt;br /&gt;
New functions and methods.&lt;br /&gt;
&lt;br /&gt;
== Exercises to be handed in ==&lt;br /&gt;
# &amp;lt;font color=&amp;quot;#AA00FF&amp;quot;&amp;gt;Make a program that calculates the product of two matrices and prints it on the screen (which is STDOUT, remember unix). The matrices are in the files &#039;&#039;mat1.dat&#039;&#039; and &#039;&#039;mat2.dat&#039;&#039;. Numbers in the files are tab separated. A matrix should be stored as a list of lists.&amp;lt;br&amp;gt;Advice: The program should have a function that reads a matrix from a given file (to be used twice), a function that calculates the product, and a function that prints a matrix. This ensures that your program is easy to adapt to other forms of matrix calculations. Here are two links to the definition of matrix multiplication.&amp;lt;br&amp;gt;[https://www.mathsisfun.com/algebra/matrix-multiplying.html Math is Fun]&amp;lt;br&amp;gt;[http://mathworld.wolfram.com/MatrixMultiplication.html Math world]&amp;lt;/font&amp;gt;&lt;br /&gt;
# The purpose of this exercise is to find the 10 genes that have the biggest difference in expression between cancer and control patients in the &#039;&#039;dna-array.dat&#039;&#039; file after a linear transformation of the numbers in the columns. In order to not start from the beginning, use the file &#039;&#039;dna-array-norm.dat&#039;&#039; created in exercise 4 in [[Advanced Data Structures and New Data Types]] as input. The other tab-separated input file &#039;&#039;lineartransform.dat&#039;&#039; has an &#039;&#039;&#039;A&#039;&#039;&#039; (slope) and a &#039;&#039;&#039;B&#039;&#039;&#039; (intercept) - one AB pair for each number column in the &#039;&#039;dna-array-norm.dat&#039;&#039; file. For each line in &#039;&#039;dna-array-norm.dat&#039;&#039; you first linearly transform the numbers according to the A &amp;amp; B in &#039;&#039;lineartransform.dat&#039;&#039; - first number uses first AB pair, second number uses second AB pair, and so forth. If your number is X, then the transformed number is A*X+B. When the entire line is transformed, you calculate the average of the cancer patients and the average of the controls. From that, find the 10 genes with the biggest difference in expression. There are a number of ways, but a simple one is to create a list of tuples with every tuple consisting of (gene name, cancer average, control average), and then sort the list according to the difference between cancer and control average. Using a lambda function when sorting springs to mind. Display the top 10 in the sorted list.&lt;br /&gt;
# Make a moving average generator: &#039;&#039;&#039;moving_avg(List_of_numbers, Window_size)&#039;&#039;&#039;. The generator calculates the average number in a window moving across the list. Try it on the numbers in &#039;&#039;ex1.dat&#039;&#039;; load the numbers column by column into a single list first, i.e. first all the numbers in column 1, then the numbers in column 2, and so forth.&lt;br /&gt;
# Make a trend discoverer generator: &#039;&#039;&#039;trend(List_of_numbers)&#039;&#039;&#039;. It looks at a list of numbers in a moving window way and emits 1, if the next number is higher than the previous, and 0 otherwise. Any longer sequence of 0&#039;s or 1&#039;s in the generator output is a trend in the data. Check with &#039;&#039;ex1.dat&#039;&#039; (loaded the same way as in the previous exercise) or another file of your choosing.&lt;br /&gt;
# Changing the previous exercise: Make a &#039;&#039;&#039;find_trend(List_of_numbers, Minimum_trend_size)&#039;&#039;&#039; generator, which yields a tuple &#039;&#039;&#039;(Position_Start, Size, Direction)&#039;&#039;&#039; for each trend, telling where and how big the trends in List_of_numbers are. &#039;&#039;&#039;Direction&#039;&#039;&#039; is 0 or 1 as you want to know which direction the trend is going. &#039;&#039;&#039;Position_Start&#039;&#039;&#039; is the position in the (zero-based) list, where the trend starts. &#039;&#039;&#039;Size&#039;&#039;&#039; is how long the trend of ascending/descending numbers is. This is surprisingly difficult. Test with a simple file of your own making to check your results.&lt;br /&gt;
# Make a generator &#039;&#039;&#039;combinations()&#039;&#039;&#039;, that takes a list of strings as input, e.g. &#039;&#039;&#039;combinations([&amp;quot;GAVIL&amp;quot;, &amp;quot;ST&amp;quot;, &amp;quot;NQ&amp;quot;, &amp;quot;FWY&amp;quot;, &amp;quot;D&amp;quot;, &amp;quot;HKR&amp;quot;])&#039;&#039;&#039;, and generates all possible combinations. A combination is formed by choosing 1 letter from the first string, 1 letter from the second string, and so forth, in that order, until a letter from all strings is chosen. The input list can have any number of strings and the strings can have any length (greater than 0). There must be NO REPEATS - random is not an acceptable library to use. As is obvious, the example has 5*2*2*3*1*3 = 180 different combinations, the first being GSNFDH. Print them all on the screen. If your input is [&#039;0123456789&#039;, &#039;0123456789&#039;, &#039;0123456789&#039;], then you will print the numbers from 000 to 999. Hint: A list of counters, 1 per string, could be useful in iterating through the combinations.&amp;lt;br&amp;gt;When can such a generator be useful? If you want to generate a list of antigens, which needs certain amino acids to be in certain positions.&lt;br /&gt;
&lt;br /&gt;
== Exercises for extra practice ==&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Python_Recap_and_Objects&amp;diff=111</id>
		<title>Python Recap and Objects</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Python_Recap_and_Objects&amp;diff=111"/>
		<updated>2025-03-16T06:04:50Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Exercises to be handed in */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| width=500  style=&amp;quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&amp;quot;&lt;br /&gt;
|Previous: [[Unix]]&lt;br /&gt;
|Next: [[Regular Expressions]]&lt;br /&gt;
|}&lt;br /&gt;
== Required course material for the lesson ==&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22113/22113_02-RecapObjects.ppt Python Recap and Objects]&amp;lt;br&amp;gt;&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22113/22113_02-Random.ppt Random numbers]&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;!-- Resource: [[Example code - File Reading]]&amp;lt;br&amp;gt; --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Subjects covered ==&lt;br /&gt;
* A short essential recap of the Python learned in course 22101.&lt;br /&gt;
* F-string formatting&lt;br /&gt;
* Command line arguments with Python&lt;br /&gt;
* The Python Object Model, and how it influences Python.&lt;br /&gt;
* Identity versus Equality&lt;br /&gt;
&lt;br /&gt;
== Exercises to be handed in ==&lt;br /&gt;
&#039;&#039;&#039;In exercises 2-9 the job is to &#039;&#039;select&#039;&#039; the lines in the input file, i.e. the exercises are about which lines to select in various ways. The output is just a few of the lines in the input file.&#039;&#039;&#039;&lt;br /&gt;
# Make a handy little calculator, &#039;&#039;&#039;calc.py&#039;&#039;&#039;, which takes 2 numbers and an operation from the command line and displays the result. Here is an example of how it works:&amp;lt;pre&amp;gt;./calc.py 3 + 6&amp;lt;/pre&amp;gt;The output should just be the result. You need to have at least the basic 4 operators working (+,-,/,*). If you want to, you can extend it to have more numbers than 2, like&amp;lt;pre&amp;gt;./calc.py 5 + 12 / 4&amp;lt;/pre&amp;gt;An issue with operator precedence might appear. You might have difficulty with *; remember what you were just taught about unix.&lt;br /&gt;
# &amp;lt;font color=&amp;quot;#AA00FF&amp;quot;&amp;gt;The input file &#039;&#039;scores.txt&#039;&#039; is a tab-separated file with an accession number in first column followed by 6 numbers (scores) between 0 and 1. You must find the accession numbers and scores (that means the entire line) of the 10 highest and 10 lowest &amp;quot;combined scores&amp;quot; (combined score is the metric for selection) and save the output in the file &#039;&#039;scoresextreme.txt&#039;&#039;.&amp;lt;br&amp;gt;The combined score is simply the 6 numbers added together. The order of the output must be from high to low. Take the name of input file and the output file from the command line, so the program is flexible.&amp;lt;/font&amp;gt;&lt;br /&gt;
# Change exercise 2 in the following way: There is an input file, &#039;&#039;negative_list.txt&#039;&#039;, which is a list of genes which can NOT be part of the output. They are banned from your analysis. As can be seen, the genes are identified by their swissprot id. In order to translate from swissprot id to accession number so you can relate it to the &#039;&#039;scores.txt&#039;&#039;, you must use the input file &#039;&#039;translation.txt&#039;&#039;, where the first item on the line is a accession number, second item is the corresponding swissprot id.&lt;br /&gt;
# Change exercise 2 in the following way: Make the program work no matter how many numbers there are on every line. It must be the same number of numbers, i.e. in one file it could be 10 numbers on every line, in another file it could be 7 numbers per line. &lt;br /&gt;
# Change exercise 2 in the following way: Instead of using the combined score as the metric for selecting the accession numbers and scores, then use the average score as the metric. That will allow for having a different number of numbers on each line in the input file.&lt;br /&gt;
# Change exercise 2 in the following way: When you calculate the combined score the first number should weigh 50% more than the other numbers and the last should weigh 50% less.&lt;br /&gt;
# Change exercise 2 in the following way: When you calculate the combined score the numbers should be weighted on a linear sliding scale, with the first number counting for 50% more than its real value, sliding linearly down to the last number which is weighted 50% less. The weight is thus dynamically calculated according to how many numbers there are on the line and which position the number is on the line.&amp;lt;br&amp;gt;&#039;&#039;&#039;N&#039;&#039;&#039; is the number of numbers, &#039;&#039;&#039;P&#039;&#039;&#039; is the position of the number, then the weight &#039;&#039;&#039;W&#039;&#039;&#039; is calculated as: W = 1.5 - (P-1)/(N-1)&amp;lt;br&amp;gt;A more generic expression for the sliding weighting scale - &#039;&#039;&#039;B&#039;&#039;&#039; is the beginning weight, &#039;&#039;&#039;E&#039;&#039;&#039; is the ending weight: W = B - (B-E)*(P-1)/(N-1)&lt;br /&gt;
# Change exercise 2 in the following way: Just find the 10 lines in the input file with the highest combined scores. This should be easier than the original exercise.&lt;br /&gt;
# This is the same exercise as the previous (ex 8), however imagine that the input file is enormous - so big that you cannot have it in memory. You still need to solve the problem, and this is done by keeping a running list of the best scores.&lt;br /&gt;
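The sliding weight formula in exercise 7 can be sketched as follows. This is a minimal illustration, not a reference solution: the function names &#039;&#039;sliding_weights&#039;&#039; and &#039;&#039;weighted_score&#039;&#039; are made up for this example, and returning the beginning weight when a line holds a single number is an assumption, since the formula divides by N-1.&lt;br /&gt;

```python
def sliding_weights(count, begin=1.5, end=0.5):
    """Return one weight per position, sliding linearly from begin to end."""
    if count == 1:
        # The formula divides by N-1, so one number needs a special case.
        # Using the begin weight here is an assumption of this sketch.
        return [begin]
    # W = B - (B-E)*(P-1)/(N-1) for positions P = 1..N
    return [begin - (begin - end) * (pos - 1) / (count - 1)
            for pos in range(1, count + 1)]


def weighted_score(numbers, begin=1.5, end=0.5):
    """Sum the numbers after multiplying each by its positional weight."""
    weights = sliding_weights(len(numbers), begin, end)
    return sum(weight * value for weight, value in zip(weights, numbers))


print(sliding_weights(5))
```

With the defaults B = 1.5 and E = 0.5, five numbers get the weights 1.5, 1.25, 1.0, 0.75 and 0.5.&lt;br /&gt;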
&lt;br /&gt;
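One possible way to keep the running list of best scores mentioned in exercise 9 is a small heap that never holds more than 10 entries. The sketch below assumes the tab-separated layout of &#039;&#039;scores.txt&#039;&#039;; the function name &#039;&#039;top_lines&#039;&#039; and the tie-breaking rule (the earlier line wins) are choices made for this example only.&lt;br /&gt;

```python
import heapq


def top_lines(lines, keep=10):
    """Return the `keep` lines with the highest combined score, best first."""
    best = []  # min-heap of (score, -line_number, line); best[0] is the worst kept
    for number, line in enumerate(lines):
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        # First field is the accession number, the rest are scores.
        score = sum(float(value) for value in fields[1:])
        entry = (score, -number, line)
        if len(best) == keep:
            # Push the new entry and drop the smallest in one step.
            heapq.heappushpop(best, entry)
        else:
            heapq.heappush(best, entry)
    return [line for _, _, line in sorted(best, reverse=True)]
```

Passing an open file object as &#039;&#039;lines&#039;&#039; reads the file one line at a time, so memory use depends on the number of kept lines and not on the file size.&lt;br /&gt;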
== Exercises for extra practice ==&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=22113/22163_-_Unix_%26_Python_Programming_for_Bioinformaticians&amp;diff=110</id>
		<title>22113/22163 - Unix &amp; Python Programming for Bioinformaticians</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=22113/22163_-_Unix_%26_Python_Programming_for_Bioinformaticians&amp;diff=110"/>
		<updated>2025-01-20T10:43:01Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Prepare for the course */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
== Prepare for the course ==&lt;br /&gt;
You must read and follow the [[Course preparation]] before you show up on the first day of the course.&amp;lt;br&amp;gt;&lt;br /&gt;
You are &#039;&#039;&#039;required&#039;&#039;&#039; to read at least the first part of [[Aligning expectations]] when the course starts and whenever you have a question related to the conduct of the course.&amp;lt;br&amp;gt;&lt;br /&gt;
Resources can be good to check out during the course, or when you need something more.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Teacher:&#039;&#039;&#039; [https://www.inside.dtu.dk/da/dtuinside/generelt/telefonbog/person?id=816&amp;amp;cpid=214027&amp;amp;tab=2&amp;amp;qt=dtupublicationquery Peter Wad Sackett], pwsa@dtu.dk&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Language:&#039;&#039;&#039; The course is taught in English.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; See [[Course preparation]].&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Textbooks:&#039;&#039;&#039; There are no textbooks for the course. I will make do with PowerPoint slides and references to online resources. You can find the material under the individual lessons in the [[programme]].&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Location:&#039;&#039;&#039; Building 116, aud. 83 &amp;lt;!-- span style=&amp;quot;color:red&amp;quot;&amp;gt;NOTICE THIS LOCATION CHANGE&amp;lt;/span --&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; Monday 13:00 - 17:00, Thursday 9:00 - 12:00, module F2-A and F2-B.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Deadlines for project work and exam:&#039;&#039;&#039; See the [[Programme]].&lt;br /&gt;
&lt;br /&gt;
== Course details ==&lt;br /&gt;
There are no plans to stream the lectures, as recorded video lectures already exist for the first half of the course. Discord is used for online help and discussion, if necessary.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
| [[Programme]] || Spring 2024&lt;br /&gt;
|-&lt;br /&gt;
| [[Aligning expectations]] || Required reading&lt;br /&gt;
|-&lt;br /&gt;
| [[Code construction]] || Required reading for peer evaluation&lt;br /&gt;
|-&lt;br /&gt;
| [[Project list]] || Of projects to do&lt;br /&gt;
|-&lt;br /&gt;
| [[Mini projects]] || For practicing programming&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
[https://docs.google.com/spreadsheets/d/1wEs2xS-7DmpMvtosweTzYw0Fx3_XCbz4k5cuN5tJVK8/edit?usp=sharing Put yourself on the Get Help list]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&#039;&#039;&#039;Unix/Linux&#039;&#039;&#039;&lt;br /&gt;
* Youtube: [https://www.youtube.com/watch?v=HbgzrKJvDRw Linux File System/Structure Explained]&lt;br /&gt;
* Youtube: [https://www.youtube.com/watch?v=wBp0Rb-ZJak The Complete Linux Course: Beginner to Power User]&lt;br /&gt;
* Youtube: [https://www.youtube.com/playlist?list=PLIhvC56v63IJIujb5cyE13oLuyORZpdkL Linux series] by the very entertaining NetworkChuck&lt;br /&gt;
* Online: [http://www.oliverelliott.org/article/computing/tut_unix/ Online tutorial on unix]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Python 3&#039;&#039;&#039;&lt;br /&gt;
* Online: [https://www.coursera.org/learn/python Coursera course: Programming for Everybody] is a beginner course in Python. Everyone who wants to prepare more for course 22113 can start here. The [https://teaching.healthtech.dtu.dk/material/22113/CourseraPythonBook_270.pdf Coursera textbook] is available as a PDF.&lt;br /&gt;
* Online: [https://pynative.com/ PYnative] Good site for learning about Python. Information, tutorials, exercises and even online editor, all well explained in an accessible way.&lt;br /&gt;
* Youtube: [https://www.youtube.com/watch?v=rfscVS0vtbw Python beginner course]&lt;br /&gt;
* Online: [https://teaching.healthtech.dtu.dk/material/22113/clean_code.html Clean Code] by Lukasz Dynowski. An amazing read that is mandatory. Read it once around lesson 3 and once more around lesson 6.&lt;br /&gt;
* Online: [https://rosalind.info/problems/locations/ Rosalind project] Python exercises at different levels for practice.&lt;br /&gt;
* Book: &#039;&#039;Learning Python&#039;&#039;, 5th ed. by Mark Lutz (O&#039;Reilly) ISBN: 978-1-449-35573-9. This is the best Python book I have read. It covers all the basics and then some. All from the perspective of being a novice programmer. However, it is a brick; big, heavy and unwieldy. If you only want one Python book, then this should be the one. The course will not be taught from this book, but it could be good to have as a Python reference manual.&lt;br /&gt;
* Book: &#039;&#039;Python Crash Course: A Hands-On, Project-Based Introduction to Programming&#039;&#039; by Eric Matthes (No Starch Press) ISBN: 1593276036, 9781593276034. A pretty OK book which leads you into the Python world without too many distracting points and theoretical contemplation.&lt;br /&gt;
* Online: [https://docs.python.org/3/tutorial/ Official Python 3 tutorial]&lt;br /&gt;
* Online: [https://docs.python.org/3/reference/index.html Python 3 reference manual]&lt;br /&gt;
* Online: [https://docs.python.org/3/library/index.html Python 3 standard library]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Biological&#039;&#039;&#039;&lt;br /&gt;
* Info: [[Biological knowledge needed in the course]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Writing reports, articles, thesis at university level&#039;&#039;&#039;&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/Vejledning_i_opbygning_og_skrivning_af_rapporter_v.2013.3.pdf Vejledning i opbygning og skrivning af rapporter]&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/how_to_write_reports.pdf Guide on how to write a report]&lt;br /&gt;
* Online: [https://www.elsevier.com/connect/11-steps-to-structuring-a-science-paper-editors-will-take-seriously How to structure a science paper]&lt;br /&gt;
* PDF: [https://www.imm.dtu.dk/~janba/MastersThesisAdvice.pdf Master Thesis Advice]&lt;br /&gt;
* Online: [https://thereader.mitpress.mit.edu/umberto-eco-how-to-write-a-thesis/ How to write a thesis]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Interesting but less teaching oriented material&#039;&#039;&#039;&lt;br /&gt;
* Blog: [https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/ Why rewriting software projects can be bad]&lt;br /&gt;
* Online: [http://ivory.idyll.org/blog/big-data-biology.html Top 12 reasons you know you are a Big Data biologist]&lt;br /&gt;
* Online: [http://lifehacker.com/six-life-lessons-ive-learned-from-programming-1502077380 How programming and your life is similar]&lt;br /&gt;
* Youtube: [http://www.youtube.com/watch?v=nKIu9yen5nc What most schools don&#039;t teach - how to think]&lt;br /&gt;
&lt;br /&gt;
== Archive of old course programmes ==&lt;br /&gt;
[[Programme - Spring 2023]]&amp;lt;br&amp;gt;&lt;br /&gt;
[[Programme - Spring 2024]]&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Programme&amp;diff=109</id>
		<title>Programme</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Programme&amp;diff=109"/>
		<updated>2025-01-17T15:37:29Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;[[Collection of files]]&#039;&#039;&#039; used in the exercises and lessons - all gathered here.&lt;br /&gt;
&lt;br /&gt;
* M 03/02 Lesson 1: [[Unix]]&lt;br /&gt;
* T 06/02 More lecture on [[Unix]]&lt;br /&gt;
* M 10/02 Lesson 2: [[Python Recap and Objects]]&lt;br /&gt;
* T 13/02 Official talk about Random numbers&lt;br /&gt;
* M 17/02 Lesson 3: [[Regular Expressions]]&lt;br /&gt;
* T 20/02 Continuing lesson&lt;br /&gt;
* M 24/02 Lesson 4: [[Making Functions]]&lt;br /&gt;
* T 27/02 Unofficial talk about Garbage Collection in Python&lt;br /&gt;
* M 03/03 Lesson 5: [[Advanced Data Structures and New Data Types]]&lt;br /&gt;
* T 06/03 Unofficial talk about how Data Structures work in Python. [https://evaluering.dtu.dk/ Midterm evaluation] - part 1 (this is an evaluation of the course, not of you)&lt;br /&gt;
* M 10/03 Lesson 6: [[Comprehension, Generators, Functions and Methods]]&lt;br /&gt;
* T 13/03 [https://evaluering.dtu.dk/ Midterm evaluation] - part 2&lt;br /&gt;
* M 17/03 Lesson 7: [[Classes]]&lt;br /&gt;
* T 20/03 More lecture on [[Classes]], Project introduction&lt;br /&gt;
* M 24/03 Lesson 8: [[Unit test]] and start of project&lt;br /&gt;
* T 27/03 More lecture on [[Unit test]], project work&lt;br /&gt;
* M 31/03 Lesson 9: [[Scientific Libraries, Pandas, Numpy]]&lt;br /&gt;
* T 03/04 More lecture on [[Scientific Libraries, Pandas, Numpy]], project work&lt;br /&gt;
* M 07/04 Lesson 11: [[Scientific Libraries, Statistics]] SciPy&lt;br /&gt;
* T 10/04 Continuing lesson, project work&lt;br /&gt;
* Easter holidays - &amp;lt;span style=&amp;quot;color: red;&amp;quot;&amp;gt;Easter messes with the lesson order. Notice the switch between 10 &amp;amp; 11&amp;lt;/span&amp;gt;&lt;br /&gt;
* T 24/04 Lesson 10: [[Runtime evaluation of algorithms]]&lt;br /&gt;
* M 28/04 Lesson 12: [[Scientific Libraries, Plotting]] Matplotlib, Seaborn&lt;br /&gt;
* T 01/05 Continuing lesson, project work&lt;br /&gt;
* M 05/05 Lesson 13, [[Last words]], Biopython&lt;br /&gt;
* T 08/05 Q/A session, project work&lt;br /&gt;
* M 12/05 The project is handed in at 15.00 on DTU Learn&lt;br /&gt;
* T 13/05 Start of project evaluation&lt;br /&gt;
* M 20/05 The project evaluation is handed in at 15.00 on DTU Learn&lt;br /&gt;
* T 16/05 Exam - [http://eksamensplan.dtu.dk/ the official DTU exam plan]&lt;br /&gt;
&lt;br /&gt;
== Curious about the exam ==&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Passing_the_course Read this], and if you fail or are afraid of failing [https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Failing_the_course read here].&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Programme&amp;diff=108</id>
		<title>Programme</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Programme&amp;diff=108"/>
		<updated>2025-01-17T15:35:35Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;[[Collection of files]]&#039;&#039;&#039; used in the exercises and lessons - all gathered here.&lt;br /&gt;
&lt;br /&gt;
* M 03/02 Lesson 1: [[Unix]]&lt;br /&gt;
* T 06/02 More lecture on [[Unix]]&lt;br /&gt;
* M 10/02 Lesson 2: [[Python Recap and Objects]]&lt;br /&gt;
* T 13/02 Official talk about Random numbers&lt;br /&gt;
* M 17/02 Lesson 3: [[Regular Expressions]]&lt;br /&gt;
* T 20/02 Continuing lesson&lt;br /&gt;
* M 24/02 Lesson 4: [[Making Functions]]&lt;br /&gt;
* T 27/02 Unofficial talk about Garbage Collection in Python&lt;br /&gt;
* M 03/03 Lesson 5: [[Advanced Data Structures and New Data Types]]&lt;br /&gt;
* T 06/03 Unofficial talk about how Data Structures work in Python. [https://evaluering.dtu.dk/ Midterm evaluation] - part 1 (this is an evaluation of the course, not of you)&lt;br /&gt;
* M 10/03 Lesson 6: [[Comprehension, Generators, Functions and Methods]]&lt;br /&gt;
* T 13/03 [https://evaluering.dtu.dk/ Midterm evaluation] - part 2&lt;br /&gt;
* M 17/03 Lesson 7: [[Classes]]&lt;br /&gt;
* T 20/03 More lecture on [[Classes]], Project introduction&lt;br /&gt;
* M 24/03 Lesson 8: [[Unit test]] and start of project&lt;br /&gt;
* T 27/03 More lecture on [[Unit test]], project work&lt;br /&gt;
* M 31/03 Lesson 9: [[Scientific Libraries, Pandas, Numpy]]&lt;br /&gt;
* T 03/04 More lecture on [[Scientific Libraries, Pandas, Numpy]], project work&lt;br /&gt;
* M 07/04 Lesson 11: [[Scientific Libraries, Statistics]] SciPy&lt;br /&gt;
* T 10/04 Continuing lesson, project work&lt;br /&gt;
* Easter holidays - &amp;lt;span style=&amp;quot;color: red;&amp;quot;&amp;gt;Easter messes with the lesson order&amp;lt;/span&amp;gt;&lt;br /&gt;
* T 24/04 Lesson 10: [[Runtime evaluation of algorithms]]&lt;br /&gt;
* M 28/04 Lesson 12: [[Scientific Libraries, Plotting]] Matplotlib, Seaborn&lt;br /&gt;
* T 01/05 Continuing lesson, project work&lt;br /&gt;
* M 05/05 Lesson 13, [[Last words]], Biopython&lt;br /&gt;
* T 08/05 Q/A session, project work&lt;br /&gt;
* M 12/05 The project is handed in at 15.00 on DTU Learn&lt;br /&gt;
* T 13/05 Start of project evaluation&lt;br /&gt;
* M 20/05 The project evaluation is handed in at 15.00 on DTU Learn&lt;br /&gt;
* T 16/05 Exam - [http://eksamensplan.dtu.dk/ the official DTU exam plan]&lt;br /&gt;
&lt;br /&gt;
== Curious about the exam ==&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Passing_the_course Read this], and if you fail or are afraid of failing [https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Failing_the_course read here].&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Programme&amp;diff=107</id>
		<title>Programme</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Programme&amp;diff=107"/>
		<updated>2025-01-17T15:34:27Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;[[Collection of files]]&#039;&#039;&#039; used in the exercises and lessons - all gathered here.&lt;br /&gt;
&lt;br /&gt;
* M 03/02 Lesson 1: [[Unix]]&lt;br /&gt;
* T 06/02 More lecture on [[Unix]]&lt;br /&gt;
* M 10/02 Lesson 2: [[Python Recap and Objects]]&lt;br /&gt;
* T 13/02 Official talk about Random numbers&lt;br /&gt;
* M 17/02 Lesson 3: [[Regular Expressions]]&lt;br /&gt;
* T 20/02 Continuing lesson&lt;br /&gt;
* M 24/02 Lesson 4: [[Making Functions]]&lt;br /&gt;
* T 27/02 Unofficial talk about Garbage Collection in Python&lt;br /&gt;
* M 03/03 Lesson 5: [[Advanced Data Structures and New Data Types]]&lt;br /&gt;
* T 06/03 Unofficial talk about how Data Structures work in Python. [https://evaluering.dtu.dk/ Midterm evaluation] - part 1 (this is an evaluation of the course, not of you)&lt;br /&gt;
* M 10/03 Lesson 6: [[Comprehension, Generators, Functions and Methods]]&lt;br /&gt;
* T 13/03 [https://evaluering.dtu.dk/ Midterm evaluation] - part 2&lt;br /&gt;
* M 17/03 Lesson 7: [[Classes]]&lt;br /&gt;
* T 20/03 More lecture on [[Classes]], Project introduction&lt;br /&gt;
* M 24/03 Lesson 8: [[Unit test]] and start of project&lt;br /&gt;
* T 27/03 More lecture on [[Unit test]], project work&lt;br /&gt;
* M 31/03 Lesson 9: [[Scientific Libraries, Pandas, Numpy]]&lt;br /&gt;
* T 03/04 More lecture on [[Scientific Libraries, Pandas, Numpy]], project work&lt;br /&gt;
* M 07/04 Lesson 11: [[Scientific Libraries, Statistics]] SciPy&lt;br /&gt;
* T 10/04 Continuing lesson, project work&lt;br /&gt;
* Easter holidays - &amp;lt;span style=&amp;quot;color: red;&amp;quot;&amp;gt;Easter messes with the lesson order&amp;lt;/span&amp;gt;&lt;br /&gt;
* T 24/04 Lesson 10: [[Runtime evaluation of algorithms]]&lt;br /&gt;
* M 28/04 Lesson 12: [[Scientific Libraries, Plotting]] Matplotlib, Seaborn&lt;br /&gt;
* T 01/05 Continuing lesson, project work&lt;br /&gt;
* M 05/05 Lesson 13, [[Last words]], Biopython&lt;br /&gt;
* T 08/05 Q/A session, project work&lt;br /&gt;
* M 12/05 The project is handed in at 15.00 on DTU Learn&lt;br /&gt;
* T 13/05 Start of project evaluation&lt;br /&gt;
* M 20/05 The project evaluation is handed in at 15.00 on DTU Learn&lt;br /&gt;
* T 15/05 Exam - [http://eksamensplan.dtu.dk/ the official DTU exam plan]&lt;br /&gt;
&lt;br /&gt;
== Curious about the exam ==&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Passing_the_course Read this], and if you fail or are afraid of failing [https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Failing_the_course read here].&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Programme_-_Spring_2024&amp;diff=106</id>
		<title>Programme - Spring 2024</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Programme_-_Spring_2024&amp;diff=106"/>
		<updated>2025-01-17T15:12:07Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;&amp;#039;&amp;#039;&amp;#039;Collection of files&amp;#039;&amp;#039;&amp;#039; used in the exercises and lessons - all gathered here.  * M 29/01 Lesson 1: Unix * T 01/02 More lecture on Unix * M 05/02 Lesson 2: Python Recap and Objects * T 08/02 Official talk about Random numbers * M 12/02 Lesson 3: Regular Expressions * T 15/02 Continuing lesson * M 19/02 Lesson 4: Making Functions * T 22/02 Unofficial talk about Garbage Collection in Python * M 26/02 Lesson 5: Advanced Data Structures and New Da...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;[[Collection of files]]&#039;&#039;&#039; used in the exercises and lessons - all gathered here.&lt;br /&gt;
&lt;br /&gt;
* M 29/01 Lesson 1: [[Unix]]&lt;br /&gt;
* T 01/02 More lecture on [[Unix]]&lt;br /&gt;
* M 05/02 Lesson 2: [[Python Recap and Objects]]&lt;br /&gt;
* T 08/02 Official talk about Random numbers&lt;br /&gt;
* M 12/02 Lesson 3: [[Regular Expressions]]&lt;br /&gt;
* T 15/02 Continuing lesson&lt;br /&gt;
* M 19/02 Lesson 4: [[Making Functions]]&lt;br /&gt;
* T 22/02 Unofficial talk about Garbage Collection in Python&lt;br /&gt;
* M 26/02 Lesson 5: [[Advanced Data Structures and New Data Types]]&lt;br /&gt;
* T 29/02 Unofficial talk about how Data Structures work in Python. [https://evaluering.dtu.dk/ Midterm evaluation] - part 1 (this is an evaluation of the course, not of you)&lt;br /&gt;
* M 04/03 Lesson 6: [[Comprehension, Generators, Functions and Methods]]&lt;br /&gt;
* T 07/03 [https://evaluering.dtu.dk/ Midterm evaluation] - part 2&lt;br /&gt;
* M 11/03 Lesson 7: [[Classes]]&lt;br /&gt;
* T 14/03 More lecture on [[Classes]], Project introduction&lt;br /&gt;
* M 18/03 Lesson 8: [[Unit test]] and start of project&lt;br /&gt;
* T 21/03 More lecture on [[Unit test]], project work&lt;br /&gt;
* Easter holidays - &amp;lt;span style=&amp;quot;color: red;&amp;quot;&amp;gt;Easter messes with the lesson order&amp;lt;/span&amp;gt;&lt;br /&gt;
* T 04/04 Lesson 10: [[Runtime evaluation of algorithms]]&lt;br /&gt;
* M 08/04 Lesson 9: [[Scientific Libraries, Pandas, Numpy]]&lt;br /&gt;
* T 11/04 More lecture on [[Scientific Libraries, Pandas, Numpy]], project work&lt;br /&gt;
* M 15/04 Lesson 11: [[Scientific Libraries, Statistics]] SciPy&lt;br /&gt;
* T 18/04 Continuing lesson, project work&lt;br /&gt;
* M 22/04 Lesson 12: [[Scientific Libraries, Plotting]] Matplotlib, Seaborn&lt;br /&gt;
* T 25/04 Continuing lesson, project work&lt;br /&gt;
* M 29/04 Lesson 13, [[Last words]], Biopython&lt;br /&gt;
* T 02/05 Q/A session, project work&lt;br /&gt;
* M 06/05 The project is handed in at 15.00 on DTU Learn&lt;br /&gt;
* T 07/05 Start of project evaluation&lt;br /&gt;
* M 13/05 The project evaluation is handed in at 15.00 on DTU Learn&lt;br /&gt;
* T 15/05 Exam - [http://eksamensplan.dtu.dk/ the official DTU exam plan]&lt;br /&gt;
&lt;br /&gt;
== Curious about the exam ==&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Passing_the_course Read this], and if you fail or are afraid of failing [https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Failing_the_course read here].&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=22113/22163_-_Unix_%26_Python_Programming_for_Bioinformaticians&amp;diff=105</id>
		<title>22113/22163 - Unix &amp; Python Programming for Bioinformaticians</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=22113/22163_-_Unix_%26_Python_Programming_for_Bioinformaticians&amp;diff=105"/>
		<updated>2025-01-17T15:11:37Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Archive of old course programmes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
== Prepare for the course ==&lt;br /&gt;
You must read and follow the [[Course preparation]] before you show up on the first day of the course.&amp;lt;br&amp;gt;&lt;br /&gt;
You are &#039;&#039;&#039;required&#039;&#039;&#039; to read at least the first part of [[Aligning expectations]] when the course starts and whenever you have a question related to the conduct of the course.&amp;lt;br&amp;gt;&lt;br /&gt;
Resources can be good to check out during the course, or when you need something more.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Teacher:&#039;&#039;&#039; [https://www.inside.dtu.dk/da/dtuinside/generelt/telefonbog/person?id=816&amp;amp;cpid=214027&amp;amp;tab=2&amp;amp;qt=dtupublicationquery Peter Wad Sackett], pwsa@dtu.dk&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Language:&#039;&#039;&#039; The course is taught in English.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; See [[Course preparation]].&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Textbooks:&#039;&#039;&#039; There are no textbooks for the course. I will make do with PowerPoint slides and references to online resources. You can find the material under the individual lessons in the [[programme]].&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Location:&#039;&#039;&#039; Building 116, aud. 82 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;NOTICE THIS LOCATION CHANGE&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; Monday 13:00 - 17:00, Thursday 9:00 - 12:00, module F2-A and F2-B.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Deadlines for project work and exam:&#039;&#039;&#039; See the [[Programme]].&lt;br /&gt;
&lt;br /&gt;
== Course details ==&lt;br /&gt;
There are no plans to stream the lectures, as recorded video lectures already exist for the first half of the course. Discord is used for online help and discussion, if necessary.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
| [[Programme]] || Spring 2024&lt;br /&gt;
|-&lt;br /&gt;
| [[Aligning expectations]] || Required reading&lt;br /&gt;
|-&lt;br /&gt;
| [[Code construction]] || Required reading for peer evaluation&lt;br /&gt;
|-&lt;br /&gt;
| [[Project list]] || Of projects to do&lt;br /&gt;
|-&lt;br /&gt;
| [[Mini projects]] || For practicing programming&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
[https://docs.google.com/spreadsheets/d/1wEs2xS-7DmpMvtosweTzYw0Fx3_XCbz4k5cuN5tJVK8/edit?usp=sharing Put yourself on the Get Help list]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&#039;&#039;&#039;Unix/Linux&#039;&#039;&#039;&lt;br /&gt;
* Youtube: [https://www.youtube.com/watch?v=HbgzrKJvDRw Linux File System/Structure Explained]&lt;br /&gt;
* Youtube: [https://www.youtube.com/watch?v=wBp0Rb-ZJak The Complete Linux Course: Beginner to Power User]&lt;br /&gt;
* Youtube: [https://www.youtube.com/playlist?list=PLIhvC56v63IJIujb5cyE13oLuyORZpdkL Linux series] by the very entertaining NetworkChuck&lt;br /&gt;
* Online: [http://www.oliverelliott.org/article/computing/tut_unix/ Online tutorial on unix]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Python 3&#039;&#039;&#039;&lt;br /&gt;
* Online: [https://www.coursera.org/learn/python Coursera course: Programming for Everybody] is a beginner course in Python. Everyone who wants to prepare more for course 22113 can start here. The [https://teaching.healthtech.dtu.dk/material/22113/CourseraPythonBook_270.pdf Coursera textbook] is available as a PDF.&lt;br /&gt;
* Online: [https://pynative.com/ PYnative] Good site for learning about Python. Information, tutorials, exercises and even online editor, all well explained in an accessible way.&lt;br /&gt;
* Youtube: [https://www.youtube.com/watch?v=rfscVS0vtbw Python beginner course]&lt;br /&gt;
* Online: [https://teaching.healthtech.dtu.dk/material/22113/clean_code.html Clean Code] by Lukasz Dynowski. An amazing read that is mandatory. Read it once around lesson 3 and once more around lesson 6.&lt;br /&gt;
* Online: [https://rosalind.info/problems/locations/ Rosalind project] Python exercises at different levels for practice.&lt;br /&gt;
* Book: &#039;&#039;Learning Python&#039;&#039;, 5th ed. by Mark Lutz (O&#039;Reilly) ISBN: 978-1-449-35573-9. This is the best Python book I have read. It covers all the basics and then some. All from the perspective of being a novice programmer. However, it is a brick; big, heavy and unwieldy. If you only want one Python book, then this should be the one. The course will not be taught from this book, but it could be good to have as a Python reference manual.&lt;br /&gt;
* Book: &#039;&#039;Python Crash Course: A Hands-On, Project-Based Introduction to Programming&#039;&#039; by Eric Matthes (No Starch Press) ISBN: 1593276036, 9781593276034. A pretty OK book which leads you into the Python world without too many distracting points and theoretical contemplation.&lt;br /&gt;
* Online: [https://docs.python.org/3/tutorial/ Official Python 3 tutorial]&lt;br /&gt;
* Online: [https://docs.python.org/3/reference/index.html Python 3 reference manual]&lt;br /&gt;
* Online: [https://docs.python.org/3/library/index.html Python 3 standard library]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Biological&#039;&#039;&#039;&lt;br /&gt;
* Info: [[Biological knowledge needed in the course]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Writing reports, articles, thesis at university level&#039;&#039;&#039;&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/Vejledning_i_opbygning_og_skrivning_af_rapporter_v.2013.3.pdf Vejledning i opbygning og skrivning af rapporter]&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/how_to_write_reports.pdf Guide on how to write a report]&lt;br /&gt;
* Online: [https://www.elsevier.com/connect/11-steps-to-structuring-a-science-paper-editors-will-take-seriously How to structure a science paper]&lt;br /&gt;
* PDF: [https://www.imm.dtu.dk/~janba/MastersThesisAdvice.pdf Master Thesis Advice]&lt;br /&gt;
* Online: [https://thereader.mitpress.mit.edu/umberto-eco-how-to-write-a-thesis/ How to write a thesis]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Interesting but less teaching oriented material&#039;&#039;&#039;&lt;br /&gt;
* Blog: [https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/ Why rewriting software projects can be bad]&lt;br /&gt;
* Online: [http://ivory.idyll.org/blog/big-data-biology.html Top 12 reasons you know you are a Big Data biologist]&lt;br /&gt;
* Online: [http://lifehacker.com/six-life-lessons-ive-learned-from-programming-1502077380 How programming and your life is similar]&lt;br /&gt;
* Youtube: [http://www.youtube.com/watch?v=nKIu9yen5nc What most schools don&#039;t teach - how to think]&lt;br /&gt;
&lt;br /&gt;
== Archive of old course programmes ==&lt;br /&gt;
[[Programme - Spring 2023]]&amp;lt;br&amp;gt;&lt;br /&gt;
[[Programme - Spring 2024]]&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=22113/22163_-_Unix_%26_Python_Programming_for_Bioinformaticians&amp;diff=104</id>
		<title>22113/22163 - Unix &amp; Python Programming for Bioinformaticians</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=22113/22163_-_Unix_%26_Python_Programming_for_Bioinformaticians&amp;diff=104"/>
		<updated>2025-01-17T15:11:23Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Archive of old course programmes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
== Prepare for the course ==&lt;br /&gt;
You must read and follow the [[Course preparation]] before you show up on the first day of the course.&amp;lt;br&amp;gt;&lt;br /&gt;
You are &#039;&#039;&#039;required&#039;&#039;&#039; to read at least the first part of [[Aligning expectations]] when the course starts and whenever you have a question related to the conduct of the course.&amp;lt;br&amp;gt;&lt;br /&gt;
Resources can be good to check out during the course, or when you need something more.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Teacher:&#039;&#039;&#039; [https://www.inside.dtu.dk/da/dtuinside/generelt/telefonbog/person?id=816&amp;amp;cpid=214027&amp;amp;tab=2&amp;amp;qt=dtupublicationquery Peter Wad Sackett], pwsa@dtu.dk&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Language:&#039;&#039;&#039; The course is taught in English.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; See [[Course preparation]].&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Textbooks:&#039;&#039;&#039; There are no textbooks for the course. I will make do with PowerPoint slides and references to online resources. You can find the material under the individual lessons in the [[programme]].&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Location:&#039;&#039;&#039; Building 116, aud. 82 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;NOTICE THIS LOCATION CHANGE&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; Monday 13:00 - 17:00, Thursday 9:00 - 12:00, module F2-A and F2-B.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Deadlines for project work and exam:&#039;&#039;&#039; See the [[Programme]].&lt;br /&gt;
&lt;br /&gt;
== Course details ==&lt;br /&gt;
There are no plans for streaming the lectures, as there are already recorded video lectures for the first half of the course. Discord is used for online help and discussion, if necessary.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
| [[Programme]] || Spring 2024&lt;br /&gt;
|-&lt;br /&gt;
| [[Aligning expectations]] || Required reading&lt;br /&gt;
|-&lt;br /&gt;
| [[Code construction]] || Required reading for peer evaluation&lt;br /&gt;
|-&lt;br /&gt;
| [[Project list]] || Of projects to do&lt;br /&gt;
|-&lt;br /&gt;
| [[Mini projects]] || For practicing programming&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
[https://docs.google.com/spreadsheets/d/1wEs2xS-7DmpMvtosweTzYw0Fx3_XCbz4k5cuN5tJVK8/edit?usp=sharing Put yourself on the Get Help list]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&#039;&#039;&#039;Unix/Linux&#039;&#039;&#039;&lt;br /&gt;
* Youtube: [https://www.youtube.com/watch?v=HbgzrKJvDRw Linux File System/Structure Explained]&lt;br /&gt;
* Youtube: [https://www.youtube.com/watch?v=wBp0Rb-ZJak The Complete Linux Course: Beginner to Power User]&lt;br /&gt;
* Youtube: [https://www.youtube.com/playlist?list=PLIhvC56v63IJIujb5cyE13oLuyORZpdkL Linux series] by the very entertaining NetworkChuck&lt;br /&gt;
* Online: [http://www.oliverelliott.org/article/computing/tut_unix/ Online tutorial on unix]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Python 3&#039;&#039;&#039;&lt;br /&gt;
* Online: [https://www.coursera.org/learn/python Coursera course: Programming for Everybody] is a beginner course in Python. Everyone who wants to prepare more for course 22113 can start here. The [https://teaching.healthtech.dtu.dk/material/22113/CourseraPythonBook_270.pdf Coursera textbook]&lt;br /&gt;
* Online: [https://pynative.com/ PYnative] Good site for learning about Python. Information, tutorials, exercises and even online editor, all well explained in an accessible way.&lt;br /&gt;
* Youtube: [https://www.youtube.com/watch?v=rfscVS0vtbw Python beginner course]&lt;br /&gt;
* Online: [https://teaching.healthtech.dtu.dk/material/22113/clean_code.html Clean Code] by Lukasz Dynowski. An amazing read that is mandatory. Read it once around lesson 3 and once more around lesson 6.&lt;br /&gt;
* Online: [https://rosalind.info/problems/locations/ Rosalind project] Python exercises at different levels for practicing &lt;br /&gt;
* Book: &#039;&#039;Learning Python&#039;&#039;, 5th ed. by Mark Lutz (O&#039;Reilly) ISBN: 978-1-449-35573-9. This is the best Python book I have read. It covers all the basics and then some. All from the perspective of being a novice programmer. However, it is a brick; big, heavy and unwieldy. If you only want one Python book, then this should be the one. The course will not be taught from this book, but it could be good to have as a Python reference manual.&lt;br /&gt;
* Book: &#039;&#039;Python Crash Course: A Hands-On, Project-Based Introduction to Programming&#039;&#039; by Eric Matthes (No Starch Press) ISBN: 1593276036, 9781593276034. A pretty OK book which leads you into the Python world without too many distracting points and theoretical contemplation.&lt;br /&gt;
* Online: [https://docs.python.org/3/tutorial/ Official Python 3 tutorial]&lt;br /&gt;
* Online: [https://docs.python.org/3/reference/index.html Python 3 reference manual]&lt;br /&gt;
* Online: [https://docs.python.org/3/library/index.html Python 3 standard library]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Biological&#039;&#039;&#039;&lt;br /&gt;
* Info: [[Biological knowledge needed in the course]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Writing reports, articles, thesis at university level&#039;&#039;&#039;&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/Vejledning_i_opbygning_og_skrivning_af_rapporter_v.2013.3.pdf Vejledning i opbygning og skrivning af rapporter]&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/how_to_write_reports.pdf Guide on how to write a report]&lt;br /&gt;
* Online: [https://www.elsevier.com/connect/11-steps-to-structuring-a-science-paper-editors-will-take-seriously How to structure a science paper]&lt;br /&gt;
* PDF: [https://www.imm.dtu.dk/~janba/MastersThesisAdvice.pdf Master Thesis Advice]&lt;br /&gt;
* Online: [https://thereader.mitpress.mit.edu/umberto-eco-how-to-write-a-thesis/ How to write a thesis]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Interesting but less teaching oriented material&#039;&#039;&#039;&lt;br /&gt;
* Blog: [https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/ Why rewriting software projects can be bad]&lt;br /&gt;
* Online: [http://ivory.idyll.org/blog/big-data-biology.html Top 12 reasons you know you are a Big Data biologist]&lt;br /&gt;
* Online: [http://lifehacker.com/six-life-lessons-ive-learned-from-programming-1502077380 How programming and your life is similar]&lt;br /&gt;
* Youtube: [http://www.youtube.com/watch?v=nKIu9yen5nc What most schools don&#039;t teach - how to think]&lt;br /&gt;
&lt;br /&gt;
== Archive of old course programmes ==&lt;br /&gt;
[[Programme - Spring 2023]]&lt;br /&gt;
[[Programme - Spring 2024]]&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Programme&amp;diff=103</id>
		<title>Programme</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Programme&amp;diff=103"/>
		<updated>2024-11-19T10:10:21Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Curious about the exam */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;[[Collection of files]]&#039;&#039;&#039; used in the exercises and lessons - all gathered here.&lt;br /&gt;
&lt;br /&gt;
* M 29/01 Lesson 1: [[Unix]]&lt;br /&gt;
* T 01/02 More lecture on [[Unix]]&lt;br /&gt;
* M 05/02 Lesson 2: [[Python Recap and Objects]]&lt;br /&gt;
* T 08/02 Official talk about Random numbers&lt;br /&gt;
* M 12/02 Lesson 3: [[Regular Expressions]]&lt;br /&gt;
* T 15/02 Continuing lesson&lt;br /&gt;
* M 19/02 Lesson 4: [[Making Functions]]&lt;br /&gt;
* T 22/02 Unofficial talk about Garbage Collection in Python&lt;br /&gt;
* M 26/02 Lesson 5: [[Advanced Data Structures and New Data Types]]&lt;br /&gt;
* T 29/02 Unofficial talk about how Data Structures work in Python. [https://evaluering.dtu.dk/ Midterm evaluation] - part 1 (this is an evaluation of the course, not of you)&lt;br /&gt;
* M 04/03 Lesson 6: [[Comprehension, Generators, Functions and Methods]]&lt;br /&gt;
* T 07/03 [https://evaluering.dtu.dk/ Midterm evaluation] - part 2&lt;br /&gt;
* M 11/03 Lesson 7: [[Classes]]&lt;br /&gt;
* T 14/03 More lecture on [[Classes]], Project introduction&lt;br /&gt;
* M 18/03 Lesson 8: [[Unit test]] and start of project&lt;br /&gt;
* T 21/03 More lecture on [[Unit test]], project work&lt;br /&gt;
* Easter holidays - &amp;lt;span style=&amp;quot;color: red;&amp;quot;&amp;gt;Easter messes with the lesson order&amp;lt;/span&amp;gt;&lt;br /&gt;
* T 04/04 Lesson 10: [[Runtime evaluation of algorithms]]&lt;br /&gt;
* M 08/04 Lesson 9: [[Scientific Libraries, Pandas, Numpy]]&lt;br /&gt;
* T 11/04 More lecture on [[Scientific Libraries, Pandas, Numpy]], project work&lt;br /&gt;
* M 15/04 Lesson 11: [[Scientific Libraries, Statistics]] SciPy&lt;br /&gt;
* T 18/04 Continuing lesson, project work&lt;br /&gt;
* M 22/04 Lesson 12: [[Scientific Libraries, Plotting]] Matplotlib, Seaborn&lt;br /&gt;
* T 25/04 Continuing lesson, project work&lt;br /&gt;
* M 29/04 Lesson 13, [[Last words]], Biopython&lt;br /&gt;
* T 02/05 Q/A session, project work&lt;br /&gt;
* M 06/05 The project is handed in at 15.00 on DTU Learn&lt;br /&gt;
* T 07/05 Start of project evaluation&lt;br /&gt;
* M 13/05 The project evaluation is handed in at 15.00 on DTU Learn&lt;br /&gt;
* T 15/05 Exam - [http://eksamensplan.dtu.dk/ the official DTU exam plan]&lt;br /&gt;
&lt;br /&gt;
== Curious about the exam ==&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Passing_the_course Read this], and if you fail or are afraid of failing [https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Failing_the_course read here].&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Programme&amp;diff=102</id>
		<title>Programme</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Programme&amp;diff=102"/>
		<updated>2024-11-19T10:09:40Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;[[Collection of files]]&#039;&#039;&#039; used in the exercises and lessons - all gathered here.&lt;br /&gt;
&lt;br /&gt;
* M 29/01 Lesson 1: [[Unix]]&lt;br /&gt;
* T 01/02 More lecture on [[Unix]]&lt;br /&gt;
* M 05/02 Lesson 2: [[Python Recap and Objects]]&lt;br /&gt;
* T 08/02 Official talk about Random numbers&lt;br /&gt;
* M 12/02 Lesson 3: [[Regular Expressions]]&lt;br /&gt;
* T 15/02 Continuing lesson&lt;br /&gt;
* M 19/02 Lesson 4: [[Making Functions]]&lt;br /&gt;
* T 22/02 Unofficial talk about Garbage Collection in Python&lt;br /&gt;
* M 26/02 Lesson 5: [[Advanced Data Structures and New Data Types]]&lt;br /&gt;
* T 29/02 Unofficial talk about how Data Structures work in Python. [https://evaluering.dtu.dk/ Midterm evaluation] - part 1 (this is an evaluation of the course, not of you)&lt;br /&gt;
* M 04/03 Lesson 6: [[Comprehension, Generators, Functions and Methods]]&lt;br /&gt;
* T 07/03 [https://evaluering.dtu.dk/ Midterm evaluation] - part 2&lt;br /&gt;
* M 11/03 Lesson 7: [[Classes]]&lt;br /&gt;
* T 14/03 More lecture on [[Classes]], Project introduction&lt;br /&gt;
* M 18/03 Lesson 8: [[Unit test]] and start of project&lt;br /&gt;
* T 21/03 More lecture on [[Unit test]], project work&lt;br /&gt;
* Easter holidays - &amp;lt;span style=&amp;quot;color: red;&amp;quot;&amp;gt;Easter messes with the lesson order&amp;lt;/span&amp;gt;&lt;br /&gt;
* T 04/04 Lesson 10: [[Runtime evaluation of algorithms]]&lt;br /&gt;
* M 08/04 Lesson 9: [[Scientific Libraries, Pandas, Numpy]]&lt;br /&gt;
* T 11/04 More lecture on [[Scientific Libraries, Pandas, Numpy]], project work&lt;br /&gt;
* M 15/04 Lesson 11: [[Scientific Libraries, Statistics]] SciPy&lt;br /&gt;
* T 18/04 Continuing lesson, project work&lt;br /&gt;
* M 22/04 Lesson 12: [[Scientific Libraries, Plotting]] Matplotlib, Seaborn&lt;br /&gt;
* T 25/04 Continuing lesson, project work&lt;br /&gt;
* M 29/04 Lesson 13, [[Last words]], Biopython&lt;br /&gt;
* T 02/05 Q/A session, project work&lt;br /&gt;
* M 06/05 The project is handed in at 15.00 on DTU Learn&lt;br /&gt;
* T 07/05 Start of project evaluation&lt;br /&gt;
* M 13/05 The project evaluation is handed in at 15.00 on DTU Learn&lt;br /&gt;
* T 15/05 Exam - [http://eksamensplan.dtu.dk/ the official DTU exam plan]&lt;br /&gt;
&lt;br /&gt;
== Curious about the exam ==&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Passing_the_course Read this] and if you fail or are afraid of failing [https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Failing_the_course read here].&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Programme&amp;diff=101</id>
		<title>Programme</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Programme&amp;diff=101"/>
		<updated>2024-11-19T10:08:25Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;[[Collection of files]]&#039;&#039;&#039; used in the exercises and lessons - all gathered here.&lt;br /&gt;
&lt;br /&gt;
* M 29/01 Lesson 1: [[Unix]]&lt;br /&gt;
* T 01/02 More lecture on [[Unix]]&lt;br /&gt;
* M 05/02 Lesson 2: [[Python Recap and Objects]]&lt;br /&gt;
* T 08/02 Official talk about Random numbers&lt;br /&gt;
* M 12/02 Lesson 3: [[Regular Expressions]]&lt;br /&gt;
* T 15/02 Continuing lesson&lt;br /&gt;
* M 19/02 Lesson 4: [[Making Functions]]&lt;br /&gt;
* T 22/02 Unofficial talk about Garbage Collection in Python&lt;br /&gt;
* M 26/02 Lesson 5: [[Advanced Data Structures and New Data Types]]&lt;br /&gt;
* T 29/02 Unofficial talk about how Data Structures work in Python. [https://evaluering.dtu.dk/ Midterm evaluation] - part 1 (this is an evaluation of the course, not of you)&lt;br /&gt;
* M 04/03 Lesson 6: [[Comprehension, Generators, Functions and Methods]]&lt;br /&gt;
* T 07/03 [https://evaluering.dtu.dk/ Midterm evaluation] - part 2&lt;br /&gt;
* M 11/03 Lesson 7: [[Classes]]&lt;br /&gt;
* T 14/03 More lecture on [[Classes]], Project introduction&lt;br /&gt;
* M 18/03 Lesson 8: [[Unit test]] and start of project&lt;br /&gt;
* T 21/03 More lecture on [[Unit test]], project work&lt;br /&gt;
* Easter holidays - &amp;lt;span style=&amp;quot;color: red;&amp;quot;&amp;gt;Easter messes with the lesson order&amp;lt;/span&amp;gt;&lt;br /&gt;
* T 04/04 Lesson 10: [[Runtime evaluation of algorithms]]&lt;br /&gt;
* M 08/04 Lesson 9: [[Scientific Libraries, Pandas, Numpy]]&lt;br /&gt;
* T 11/04 More lecture on [[Scientific Libraries, Pandas, Numpy]], project work&lt;br /&gt;
* M 15/04 Lesson 11: [[Scientific Libraries, Statistics]] SciPy&lt;br /&gt;
* T 18/04 Continuing lesson, project work&lt;br /&gt;
* M 22/04 Lesson 12: [[Scientific Libraries, Plotting]] Matplotlib, Seaborn&lt;br /&gt;
* T 25/04 Continuing lesson, project work&lt;br /&gt;
* M 29/04 Lesson 13, [[Last words]], Biopython&lt;br /&gt;
* T 02/05 Q/A session, project work&lt;br /&gt;
* M 06/05 The project is handed in at 15.00 on DTU Learn&lt;br /&gt;
* T 07/05 Start of project evaluation&lt;br /&gt;
* M 13/05 The project evaluation is handed in at 15.00 on DTU Learn&lt;br /&gt;
* T 15/05 Exam - [http://eksamensplan.dtu.dk/ the official DTU exam plan]&lt;br /&gt;
&lt;br /&gt;
== Curious about the exam ==&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Passing_the_course Read this] and if you fail or are afraid of failing [https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Failing_the_exam read here].&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Course_preparation&amp;diff=100</id>
		<title>Course preparation</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Course_preparation&amp;diff=100"/>
		<updated>2024-09-02T17:04:06Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* WSL/WSL2, Windows Subsystem for Linux */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Before you meet up on the first day of the course, you will &#039;&#039;&#039;have&#039;&#039;&#039; to be able to start a terminal in a &#039;&#039;&#039;unix&#039;&#039;&#039; environment, and to &#039;&#039;&#039;edit&#039;&#039;&#039; files in the same environment with a text editor.&lt;br /&gt;
Here is what you should do, and how, on various operating systems (OS). Everything mentioned here is free software.&lt;br /&gt;
&lt;br /&gt;
== General remarks about programming environments, IDE&#039;s ==&lt;br /&gt;
Some people have discovered tools like Spyder, Jupyter or PyCharm, which create an environment for making and running programs. The use of these tools is somewhat discouraged in this course. Part of what the course/teacher wishes to teach is how to &#039;&#039;&#039;confidently and with experience&#039;&#039;&#039; use Unix as a working environment, which means working with the Unix file system and the shell, using text editors to write programs, and using Unix commands to test/execute them. Using an IDE removes this element of the course.&amp;lt;br&amp;gt;&lt;br /&gt;
Why is Unix as a work environment important?&amp;lt;br&amp;gt;&lt;br /&gt;
* Servers (big computers) do not always offer an IDE.&lt;br /&gt;
* When starting to use big data, the IDE will hamper you - even prevent you from succeeding.&lt;br /&gt;
* Unix is the de-facto work environment in Life Science.&lt;br /&gt;
If you already know how to work with Unix (i.e. years of experience), then you can use an IDE as you please.&lt;br /&gt;
&lt;br /&gt;
== Windows ==&lt;br /&gt;
There are a number of solutions for Windows, but you should really go for the first one described, WSL2. Otherwise you are on your own.&lt;br /&gt;
&lt;br /&gt;
=== WSL/WSL2, Windows Subsystem for Linux ===&lt;br /&gt;
I have made a rather comprehensive guide for WSL2:&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/22113_01-WSL2Install.ppt Guide for WSL2 and Linux]&lt;br /&gt;
&lt;br /&gt;
Now install the Anaconda Python in your WSL2. The WSL2 linux you installed already has a Python, but we need some more libraries later in the course.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/22113_01-AnacondaInstall.ppt Anaconda install] for WSL2 and Linux&lt;br /&gt;
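&lt;br /&gt;
Once Anaconda is installed, a quick check like the sketch below may help confirm that the libraries are visible to your Python. It is only an illustration: the function name and the list of libraries are assumptions based on the lesson titles in the programme, not an official requirement list.&lt;br /&gt;
&lt;br /&gt;
```python
import importlib.util

# Assumed list of libraries, taken from the lesson titles in the programme.
COURSE_LIBRARIES = ["numpy", "pandas", "scipy", "matplotlib", "seaborn"]


def missing_libraries(names):
    """Return the top-level module names the current Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_libraries(COURSE_LIBRARIES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed libraries were found.")
```
&lt;br /&gt;
Save it under any name you like and run it with the Anaconda Python from the terminal. If something is reported missing, you are probably running the Python that came with the system instead of the Anaconda one.&lt;br /&gt;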
&lt;br /&gt;
The rest here is just other guides, which may help you if stuff does not work out for you.&lt;br /&gt;
&lt;br /&gt;
You can also install WSL2 using [https://pureinfotech.com/install-windows-subsystem-linux-2-windows-10/ this guide] ([https://www.youtube.com/watch?v=n-J9438Mv-s video guide]) for a different experience. If you already have WSL (the old version), then it will work, too.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://learn.microsoft.com/en-us/windows/wsl/install Official Windows guide to installing WSL2]. [https://www.omgubuntu.co.uk/how-to-install-wsl2-on-windows-10 Another guide].&amp;lt;br&amp;gt;&lt;br /&gt;
I have seen some changes on newer systems. If in trouble, [https://learn.microsoft.com/en-us/windows/wsl/install-manual see here].&lt;br /&gt;
&lt;br /&gt;
When successfully done, follow these two guides to make life convenient (the first one is mandatory).&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=88cbc7c3-846c-42b3-a14e-af270126ce25 Sharing files between Ubuntu (WSL2) and Windows]&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=dacb6779-7dfb-430a-a9e6-af270126ac74 Making X11 (Linux graphics) work on WSL2]&lt;br /&gt;
&lt;br /&gt;
=== MobaXterm, not really recommended ===&lt;br /&gt;
Download [http://mobaxterm.mobatek.net/download-home-edition.html MobaXterm]. Warning: You are likely to get problems with Python and/or libraries.&amp;lt;br&amp;gt;&lt;br /&gt;
Install MobaXterm - it will put itself in the Windows&lt;br /&gt;
program folder. This location differs depending on the version of Windows you have, but it is &amp;quot;findable&amp;quot;.&lt;br /&gt;
One such place would be: &#039;&#039;Computer -&amp;gt; C: -&amp;gt; Programs -&amp;gt; Mobatek -&amp;gt; MobaXterm Home Edition&#039;&#039;. Instead of&lt;br /&gt;
&#039;&#039;Programs&#039;&#039; it could be &#039;&#039;Programmer&#039;&#039;, &#039;&#039;Programs (x86)&#039;&#039; or other.&lt;br /&gt;
The point of knowing this location is that you should drag and drop (move)&lt;br /&gt;
the plugins you downloaded to this folder from your download folder. That act will make the plugin available.&lt;br /&gt;
Windows will ask if you want to do this - yes, you do. Windows Firewall will block some features of MobaXterm&lt;br /&gt;
and ask if they should be allowed. Just block - it does not matter for what we do.&lt;br /&gt;
&lt;br /&gt;
A problem with MobaXterm is that part of the installation is put in a folder that is on OneDrive, if you have OneDrive enabled.&lt;br /&gt;
This means that MobaXterm only works when you have an internet connection and is slow also because part of it is loaded from OneDrive.&lt;br /&gt;
&lt;br /&gt;
The first time MobaXterm starts takes a while - get coffee - be patient.&amp;lt;br&amp;gt;&lt;br /&gt;
If MobaXterm does not run properly, it is likely NOT an installation problem and re-installing won&#039;t help. Instead, delete the &amp;quot;MobaXterm&amp;quot; folder in your standard &amp;quot;Documents&amp;quot; folder. This is where files and settings are stored.&lt;br /&gt;
&lt;br /&gt;
=== VirtualBox, Virtualizing Linux, not recommended for the course ===&lt;br /&gt;
VirtualBox from Oracle is a wonderful tool. It installs a package that will allow you to run one or more virtual machines on your computer. On these virtual machines you can install any OS you want, such as Linux or Windows. Examples: You have a Mac, you want to run Windows - use VirtualBox and you can run Windows in a Mac application window. You run Windows, but would like to run Linux for some specific purpose - same answer.&lt;br /&gt;
If you don&#039;t need your virtual machine (VM) anymore - throw it away and release the disk space for some other purpose. A virtual machine does not need much disk space (5-10 GB), since it can access the disk on the native machine. You can simply share files between your host machine and virtual machine. You can even copy/paste between them, once you have installed the VBoxGuest additions. There is approximately a 10% performance loss when running virtual, but it is worth it for the ease of use. There are other free virtualization packages, like VMware Player (one of the first products on that market and still very strong), but VirtualBox has proved itself to be a small package that is very easy to use and install. No support from CBS will be available on anything but VirtualBox.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://www.virtualbox.org/wiki/Downloads Download VirtualBox for Windows, Mac and Linux versions].&amp;lt;br&amp;gt;&lt;br /&gt;
Installation: Do a standard install. There will be several warnings from Windows about using drivers that have not gone through a Microsoft approval step - these can safely (and must) be ignored - just click Continue. When creating a new virtual machine, you must first decide what to install as a guest OS. The recommended choice is 32 bit Ubuntu Linux. In any case you should probably go for 32 bit OSes. Secondly, you must decide how large a disk you should use - the default 8 GB is fine. You must also decide how much memory you should allocate to the VM; if you only have 2 GB RAM on your &amp;quot;real&amp;quot; machine, then allocate 768 MB, if you have more real RAM then allocate 1024-1536 MB. Before you launch your new VM, you must insert the installation image for the OS (Ubuntu) you downloaded into the VM&#039;s CD-ROM drive. This is done under &#039;Storage&#039; for the VM - it can be a bit tricky to find the small icon for the CD drive, but when the standard &amp;quot;choose file&amp;quot; menu opens, you have found the right one.&lt;br /&gt;
After installation of your virtual OS, you must also install the &#039;Guest additions&#039;. These can be found under &#039;Devices&#039; when your guest OS is running. It will give you much better screen control (resizing), faster screen updates, the ability to cut/paste text and share folders between your host OS and guest OS.&lt;br /&gt;
&lt;br /&gt;
== Mac ==&lt;br /&gt;
A Mac has a BSD Unix underneath all the fancy graphics. This means you are mostly ready for the course once you figure out how to use it.&lt;br /&gt;
[https://www.youtube.com/watch?v=8OFD_F5L_vk Here is a very basic video on how to find the terminal], through which we access the Unix operating system. Watch and test on your Mac.&lt;br /&gt;
&lt;br /&gt;
If you took [https://teaching.healthtech.dtu.dk/22101 course 22101/22161] you are done. Otherwise you must install Anaconda.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/22101/index.php/Install_Jupyter_Notebook My guide for Windows, but the steps are very similar].&amp;lt;br&amp;gt;&lt;br /&gt;
[https://docs.anaconda.com/anaconda/install/mac-os/ Official Anaconda guide].&lt;br /&gt;
&lt;br /&gt;
== Linux, any flavour ==&lt;br /&gt;
You are already set and ready for the course. You should be able to find the terminal (Term, Xterm, Console),&lt;br /&gt;
as this is a basic integrated part of Linux.&amp;lt;br&amp;gt;&lt;br /&gt;
There are many editors you can use: gedit, jedit, nedit, emacs, vim and a dozen more; see also the Editors section.&amp;lt;br&amp;gt;&lt;br /&gt;
Python is also built-in, but you must install the Anaconda version to get the libraries used in this course if you did not take [https://teaching.healthtech.dtu.dk/22101 course 22101].&amp;lt;br&amp;gt;&lt;br /&gt;
Youtube: [https://www.youtube.com/watch?v=dGm10q_y3xw Installing Anaconda on Ubuntu]&amp;lt;br&amp;gt;&lt;br /&gt;
PowerPoint: [https://teaching.healthtech.dtu.dk/material/22101/22101_01-InstallingJupyter.ppt Installing Jupyter/Anaconda] - made for Windows, but the steps are very similar&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Editors ==&lt;br /&gt;
It is vitally important that you have a good programming editor on your system. You should REALLY pick one of the first two choices below.&lt;br /&gt;
&lt;br /&gt;
=== Sublime Text ===&lt;br /&gt;
Extremely popular multi platform editor. Download at [https://www.sublimetext.com/download https://www.sublimetext.com/download].&amp;lt;br&amp;gt;&lt;br /&gt;
[https://www.sublimetext.com/docs/index.html Documentation] and [https://www.youtube.com/c/OdatNurd Youtube tutorials].&lt;br /&gt;
&lt;br /&gt;
=== VScode ===&lt;br /&gt;
Extremely popular multi platform editor. Download at [https://code.visualstudio.com/download https://code.visualstudio.com/download].&amp;lt;br&amp;gt;&lt;br /&gt;
[https://code.visualstudio.com/docs Documentation].&lt;br /&gt;
&lt;br /&gt;
=== Nano/Pico ===&lt;br /&gt;
The &#039;&#039;&#039;nano&#039;&#039;&#039; and the &#039;&#039;&#039;pico&#039;&#039;&#039; editors only work in the Unix terminal window. They are very basic, but fairly intuitive - at least compared to other terminal text editors on Unix. They are good to know and use, if you just want to do something small, or are on a bad/slow network connection. They are very similar and usually only one is installed. To find which one is present, just type:&lt;br /&gt;
 nano myfile.txt&lt;br /&gt;
 pico somefile.txt&lt;br /&gt;
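&lt;br /&gt;
If you would rather not try both commands, the same lookup can be sketched in Python. This is only an illustration: the function name available_editors is made up here, and it relies on shutil.which from the standard library to search your PATH.&lt;br /&gt;
&lt;br /&gt;
```python
import shutil


def available_editors(candidates=("nano", "pico")):
    """Return the candidate editors that are found on the PATH."""
    return [name for name in candidates if shutil.which(name) is not None]


if __name__ == "__main__":
    found = available_editors()
    print("Found:", ", ".join(found) if found else "neither nano nor pico")
```
&lt;br /&gt;
An empty result simply means that neither editor is installed (or that it is not on your PATH), in which case you can install one with the package manager of your system.&lt;br /&gt;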
&lt;br /&gt;
=== Other editors ===&lt;br /&gt;
These can be found via Google and downloaded for free.&lt;br /&gt;
* PSpad (Windows)&lt;br /&gt;
* Notepad++ (Windows)&lt;br /&gt;
* Jedit (Multi platform, requires Java)&lt;br /&gt;
* Komodo Edit (Multi platform)&lt;br /&gt;
* Unix GUI editors: gedit, nedit, kate&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Course_preparation&amp;diff=99</id>
		<title>Course preparation</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Course_preparation&amp;diff=99"/>
		<updated>2024-09-02T17:03:40Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* WSL/WSL2, Windows Subsystem for Linux */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Before you meet up on the first day of the course, you will &#039;&#039;&#039;have&#039;&#039;&#039; to be able to start a terminal in a &#039;&#039;&#039;unix&#039;&#039;&#039; environment, and to &#039;&#039;&#039;edit&#039;&#039;&#039; files in the same environment with a text editor.&lt;br /&gt;
Here is what you should do, and how, on various operating systems (OS). Everything mentioned here is free software.&lt;br /&gt;
&lt;br /&gt;
== General remarks about programming environments, IDE&#039;s ==&lt;br /&gt;
Some people have discovered tools like Spyder, Jupyter or PyCharm, which create an environment for making and running programs. The use of these tools is somewhat discouraged in this course. Part of what the course/teacher wishes to teach is how to &#039;&#039;&#039;confidently and with experience&#039;&#039;&#039; use Unix as a working environment, which means working with the Unix file system and the shell, using text editors to write programs, and using Unix commands to test/execute them. Using an IDE removes this element of the course.&amp;lt;br&amp;gt;&lt;br /&gt;
Why is Unix as a work environment important?&amp;lt;br&amp;gt;&lt;br /&gt;
* Servers (big computers) do not always offer an IDE.&lt;br /&gt;
* When starting to use big data, the IDE will hamper you - even prevent you from succeeding.&lt;br /&gt;
* Unix is the de-facto work environment in Life Science.&lt;br /&gt;
If you already know how to work with Unix (i.e. years of experience), then you can use an IDE as you please.&lt;br /&gt;
&lt;br /&gt;
== Windows ==&lt;br /&gt;
There are a number of solutions for Windows, but you should really go for the first one described, WSL2. Otherwise you are on your own.&lt;br /&gt;
&lt;br /&gt;
=== WSL/WSL2, Windows Subsystem for Linux ===&lt;br /&gt;
I have made a rather comprehensive guide for WSL2:&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/22113_01-WSL2Install.ppt Guide for WSL2 and Linux]&lt;br /&gt;
&lt;br /&gt;
Now install the Anaconda Python in your WSL2. The WSL2 linux you installed already has a Python, but we need some more libraries later in the course.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/22113_01-AnacondaInstall.ppt Anaconda install] for WSL2 and Linux&lt;br /&gt;
&lt;br /&gt;
The rest here is just other guides, which may help you if stuff does not work out for you.&lt;br /&gt;
&lt;br /&gt;
You can also install WSL2 using [https://pureinfotech.com/install-windows-subsystem-linux-2-windows-10/ this guide] ([https://www.youtube.com/watch?v=n-J9438Mv-s video guide]) for a better experience. If you already have WSL (the old version), then it will work, too.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[https://learn.microsoft.com/en-us/windows/wsl/install Official Windows guide to installing WSL2]. [https://www.omgubuntu.co.uk/how-to-install-wsl2-on-windows-10 Another guide].&amp;lt;br&amp;gt;&lt;br /&gt;
I have seen some changes on newer systems. If in trouble, [https://learn.microsoft.com/en-us/windows/wsl/install-manual see here].&lt;br /&gt;
&lt;br /&gt;
When successfully done, follow these two guides to make life convenient (the first one is mandatory).&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=88cbc7c3-846c-42b3-a14e-af270126ce25 Sharing files between Ubuntu (WSL2) and Windows]&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=dacb6779-7dfb-430a-a9e6-af270126ac74 Making X11 (Linux graphics) work on WSL2]&lt;br /&gt;
&lt;br /&gt;
=== MobaXterm, not really recommended ===&lt;br /&gt;
Download [http://mobaxterm.mobatek.net/download-home-edition.html MobaXterm]. Warning: You are likely to get problems with Python and/or libraries.&amp;lt;br&amp;gt;&lt;br /&gt;
Install MobaXterm - it will put itself in the Windows&lt;br /&gt;
program folder. This location differs depending on the version of Windows you have, but it is &amp;quot;findable&amp;quot;.&lt;br /&gt;
One such place would be: &#039;&#039;Computer -&amp;gt; C: -&amp;gt; Programs -&amp;gt; Mobatek -&amp;gt; MobaXterm Home Edition&#039;&#039;. Instead of&lt;br /&gt;
&#039;&#039;Programs&#039;&#039; it could be &#039;&#039;Programmer&#039;&#039;, &#039;&#039;Programs (x86)&#039;&#039; or other.&lt;br /&gt;
The point of knowing this location is that you should drag and drop (move)&lt;br /&gt;
the plugins you downloaded to this folder from your download folder. That act will make the plugin available.&lt;br /&gt;
Windows will ask if you want to do this - yes, you do. Windows Firewall will block some features of MobaXterm&lt;br /&gt;
and ask if they should be allowed. Just block - it does not matter for what we do.&lt;br /&gt;
&lt;br /&gt;
A problem with MobaXterm is that part of the installation is put in a folder that is on OneDrive, if you have OneDrive enabled.&lt;br /&gt;
This means that MobaXterm only works when you have an internet connection, and it is also slow because part of it is loaded from OneDrive.&lt;br /&gt;
&lt;br /&gt;
The first time MobaXterm starts takes a while - get coffee - be patient.&amp;lt;br&amp;gt;&lt;br /&gt;
If MobaXterm does not run properly - it is likely NOT an installation problem and re-installing won&#039;t help. Instead delete the &amp;quot;MobaXterm&amp;quot; folder in your standard &amp;quot;Documents&amp;quot; folder. This is where files and settings are stored.&lt;br /&gt;
&lt;br /&gt;
=== VirtualBox, Virtualizing Linux, not recommended for the course ===&lt;br /&gt;
VirtualBox from Oracle is a wonderful tool. It installs a package that will allow you to run one or more virtual machines on your computer. On these virtual machines you can install any OS you want, e.g. Linux or Windows. Examples: You have a Mac, you want to run Windows - use VirtualBox and you can run Windows in a Mac application window. You run Windows, but would like to run Linux for some specific purpose - same answer.&lt;br /&gt;
If you don&#039;t need your virtual machine (VM) anymore - throw it away and release the disk space for some other purpose. A virtual machine does not need much disk space (5-10 GB), since it can access the disk on the native machine. You can simply share files between your host machine and virtual machine. You can even copy/paste between them, once you have installed the VBoxGuest additions. There is approximately a 10% performance loss when running virtual, but it is worth it for the ease of use. There are other free virtualization packages, like VMware Player (one of the first products on that market and still very strong), but VirtualBox has proved itself to be a small package that is very easy to use and install. No support from CBS will be available on anything but VirtualBox.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://www.virtualbox.org/wiki/Downloads Download VirtualBox for Windows, Mac and Linux versions].&amp;lt;br&amp;gt;&lt;br /&gt;
Installation: Do a standard install. There will be several warnings from Windows about using drivers that have not gone through a Microsoft approval step - these can safely (and must) be ignored - just click Continue. When creating a new virtual machine, you must first decide what to install as a guest OS. The recommended choice is 32 bit Ubuntu Linux. In any case you should probably go for 32 bit OSes. Secondly, you must decide how large a disk you should use - the default 8 GB is fine. You must also decide how much memory you should allocate to the VM; if you only have 2 GB RAM on your &amp;quot;real&amp;quot; machine, then allocate 768 MB, if you have more real RAM then allocate 1024-1536 MB. Before you launch your new VM, you must insert the installation image for the OS (Ubuntu) you downloaded into the VM&#039;s CD-rom drive. This is done under &#039;Storage&#039; for the VM - it can be a bit tricky to find the small icon for the CD-drive, but when the standard &amp;quot;choose file&amp;quot; menu opens, you have found the right one.&lt;br /&gt;
After installation of your virtual OS, you must also install the &#039;Guest additions&#039;. These can be found under &#039;Devices&#039; when your guest OS is running. It will give you much better screen control (resizing), faster screen updates, the ability to cut/paste text and share folders between your host OS and guest OS.&lt;br /&gt;
&lt;br /&gt;
== Mac ==&lt;br /&gt;
A Mac has a BSD Unix underneath all the fancy graphics. This means you are mostly ready for the course once you figure out how to use it.&lt;br /&gt;
[https://www.youtube.com/watch?v=8OFD_F5L_vk Here is a very basic video on how to find the terminal], through which we access the Unix operating system. Watch and test on your Mac.&lt;br /&gt;
&lt;br /&gt;
If you took [https://teaching.healthtech.dtu.dk/22101 course 22101/22161] you are done. Otherwise you must install Anaconda.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/22101/index.php/Install_Jupyter_Notebook My guide for windows, but very similar].&amp;lt;br&amp;gt;&lt;br /&gt;
[https://docs.anaconda.com/anaconda/install/mac-os/ Official Anaconda guide].&lt;br /&gt;
&lt;br /&gt;
== Linux, any flavour ==&lt;br /&gt;
You are already set and ready for the course. You should be able to find the terminal; Term, Xterm, Console,&lt;br /&gt;
as this is a basic integrated part of linux.&amp;lt;br&amp;gt;&lt;br /&gt;
There are many editors you can use; gedit, jedit, nedit, emacs, vim and a dozen more, also see later.&amp;lt;br&amp;gt;&lt;br /&gt;
Python is also built-in, but you must install the Anaconda version to get the libraries used in this course if you did not take [https://teaching.healthtech.dtu.dk/22101 course 22101].&amp;lt;br&amp;gt;&lt;br /&gt;
Youtube: [https://www.youtube.com/watch?v=dGm10q_y3xw Installing Anaconda on Ubuntu]&amp;lt;br&amp;gt;&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22101/22101_01-InstallingJupyter.ppt Installing Jupyter/Anaconda] - made for windows but strong similarities&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Editors ==&lt;br /&gt;
It is vitally important that you have a good programming editor on your system. You should REALLY pick one of the 2 top choices.&lt;br /&gt;
&lt;br /&gt;
=== Sublime Text ===&lt;br /&gt;
Extremely popular multi platform editor. Download at [https://www.sublimetext.com/download https://www.sublimetext.com/download].&amp;lt;br&amp;gt;&lt;br /&gt;
[https://www.sublimetext.com/docs/index.html Documentation] and [https://www.youtube.com/c/OdatNurd Youtube tutorials].&lt;br /&gt;
&lt;br /&gt;
=== VScode ===&lt;br /&gt;
Extremely popular multi platform editor. Download at [https://code.visualstudio.com/download https://code.visualstudio.com/download].&amp;lt;br&amp;gt;&lt;br /&gt;
[https://code.visualstudio.com/docs Documentation].&lt;br /&gt;
&lt;br /&gt;
=== Nano/Pico ===&lt;br /&gt;
The &#039;&#039;&#039;nano&#039;&#039;&#039; and the &#039;&#039;&#039;pico&#039;&#039;&#039; editors only work in the Unix terminal window. They are very basic, but fairly intuitive - at least compared to other terminal text editors on Unix. They are good to know and use, if you just want to do something small, or are on a bad/slow network connection. They are very similar and usually only one is installed. To find which one is present, just type:&lt;br /&gt;
 nano myfile.txt&lt;br /&gt;
 pico somefile.txt&lt;br /&gt;
&lt;br /&gt;
=== Other editors ===&lt;br /&gt;
These can be found via Google and downloaded for free.&lt;br /&gt;
* PSpad (Windows)&lt;br /&gt;
* Notepad++ (Windows)&lt;br /&gt;
* Jedit (Multi platform, requires Java)&lt;br /&gt;
* Komodo Edit (Multi platform)&lt;br /&gt;
* Unix GUI editors: gedit, nedit, kate&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Course_preparation&amp;diff=98</id>
		<title>Course preparation</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Course_preparation&amp;diff=98"/>
		<updated>2024-09-02T17:02:24Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* WSL/WSL2, Windows Subsystem for Linux */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Before you meet up on the first day of the course, you will &#039;&#039;&#039;have&#039;&#039;&#039; to be able to start a terminal in a &#039;&#039;&#039;unix&#039;&#039;&#039; environment, and to &#039;&#039;&#039;edit&#039;&#039;&#039; files in the same environment with a text editor.&lt;br /&gt;
Here is how and what you should do on various Operating Systems (OS). Anything mentioned here is free software.&lt;br /&gt;
&lt;br /&gt;
== General remarks about programming environments, IDE&#039;s ==&lt;br /&gt;
Some people have discovered tools like Spyder, Jupyter or PyCharm, which create an environment for making and running programs. The use of these tools is somewhat discouraged in this course. Part of what the course/teacher wishes to teach is how to &#039;&#039;&#039;confidently and with experience&#039;&#039;&#039; use Unix as a working environment, which means how to work with the Unix file system, the shell, using text editors to write programs and the Unix commands to test/execute them. Using an IDE removes this element of the course.&amp;lt;br&amp;gt;&lt;br /&gt;
Why is Unix as a work environment important?&amp;lt;br&amp;gt;&lt;br /&gt;
* Servers (big computers) do not always offer an IDE.&lt;br /&gt;
* When starting to use big data, the IDE will hamper you - even prevent you from succeeding.&lt;br /&gt;
* Unix is the de-facto work environment in Life Science.&lt;br /&gt;
If you already know how to work with Unix (i.e. years of experience), then you can use an IDE as you please.&lt;br /&gt;
&lt;br /&gt;
== Windows ==&lt;br /&gt;
There are a number of solutions for Windows, but you should really go for the first one described, WSL2. Otherwise you are on your own.&lt;br /&gt;
&lt;br /&gt;
=== WSL/WSL2, Windows Subsystem for Linux ===&lt;br /&gt;
You should really choose to install WSL2 using [https://pureinfotech.com/install-windows-subsystem-linux-2-windows-10/ this guide] ([https://www.youtube.com/watch?v=n-J9438Mv-s video guide]) for a better experience. If you already have WSL (the old version), then it will work, too.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I have made a rather comprehensive guide for WSL2:&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/22113_01-WSL2Install.ppt Guide for WSL2 and Linux]&lt;br /&gt;
&lt;br /&gt;
Now install the Anaconda Python in your WSL2. The WSL2 linux you installed already has a Python, but we need some more libraries later in the course.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/22113_01-AnacondaInstall.ppt Anaconda install] for WSL2 and Linux&lt;br /&gt;
&lt;br /&gt;
The rest here is just other guides, which may help you if stuff does not work out for you.&lt;br /&gt;
&lt;br /&gt;
[https://learn.microsoft.com/en-us/windows/wsl/install Official Windows guide to installing WSL2]. [https://www.omgubuntu.co.uk/how-to-install-wsl2-on-windows-10 Another guide].&amp;lt;br&amp;gt;&lt;br /&gt;
I have seen some changes to the procedure on newer systems. If you run into trouble, [https://learn.microsoft.com/en-us/windows/wsl/install-manual see here].&lt;br /&gt;
&lt;br /&gt;
When successfully done, follow these two guides, to make life convenient (first one is mandatory).&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=88cbc7c3-846c-42b3-a14e-af270126ce25 Sharing files between Ubuntu (WSL2) and Windows]&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=dacb6779-7dfb-430a-a9e6-af270126ac74 Making X11 (Linux graphics) work on WSL2]&lt;br /&gt;
&lt;br /&gt;
=== MobaXterm, not really recommended ===&lt;br /&gt;
Download [http://mobaxterm.mobatek.net/download-home-edition.html MobaXterm]. Warning: You are likely to get problems with Python and/or libraries.&amp;lt;br&amp;gt;&lt;br /&gt;
Install MobaXterm - it will put itself in the Windows&lt;br /&gt;
program folder. This location differs depending on the version of Windows you have, but it is &amp;quot;findable&amp;quot;.&lt;br /&gt;
A typical location would be: &#039;&#039;Computer -&amp;gt; C: -&amp;gt; Programs -&amp;gt; Mobatek -&amp;gt; MobaXterm Home Edition&#039;&#039;. Instead of&lt;br /&gt;
&#039;&#039;Programs&#039;&#039; it could be &#039;&#039;Programmer&#039;&#039;, &#039;&#039;Programs (x86)&#039;&#039; or other.&lt;br /&gt;
The point of knowing this location is that you should drag and drop (move)&lt;br /&gt;
the plugins you downloaded to this folder from your download folder. That act will make the plugin available.&lt;br /&gt;
Windows will ask if you want to do this - yes, you do. Windows Firewall will block some features of MobaXterm&lt;br /&gt;
and ask if they should be allowed. Just block - it does not matter for what we do.&lt;br /&gt;
&lt;br /&gt;
A problem with MobaXterm is that part of the installation is put in a folder that is on OneDrive, if you have OneDrive enabled.&lt;br /&gt;
This means that MobaXterm only works when you have an internet connection, and it is also slow because part of it is loaded from OneDrive.&lt;br /&gt;
&lt;br /&gt;
The first time MobaXterm starts takes a while - get coffee - be patient.&amp;lt;br&amp;gt;&lt;br /&gt;
If MobaXterm does not run properly - it is likely NOT an installation problem and re-installing won&#039;t help. Instead delete the &amp;quot;MobaXterm&amp;quot; folder in your standard &amp;quot;Documents&amp;quot; folder. This is where files and settings are stored.&lt;br /&gt;
&lt;br /&gt;
=== VirtualBox, Virtualizing Linux, not recommended for the course ===&lt;br /&gt;
VirtualBox from Oracle is a wonderful tool. It installs a package that will allow you to run one or more virtual machines on your computer. On these virtual machines you can install any OS you want, e.g. Linux or Windows. Examples: You have a Mac, you want to run Windows - use VirtualBox and you can run Windows in a Mac application window. You run Windows, but would like to run Linux for some specific purpose - same answer.&lt;br /&gt;
If you don&#039;t need your virtual machine (VM) anymore - throw it away and release the disk space for some other purpose. A virtual machine does not need much disk space (5-10 GB), since it can access the disk on the native machine. You can simply share files between your host machine and virtual machine. You can even copy/paste between them, once you have installed the VBoxGuest additions. There is approximately a 10% performance loss when running virtual, but it is worth it for the ease of use. There are other free virtualization packages, like VMware Player (one of the first products on that market and still very strong), but VirtualBox has proved itself to be a small package that is very easy to use and install. No support from CBS will be available on anything but VirtualBox.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://www.virtualbox.org/wiki/Downloads Download VirtualBox for Windows, Mac and Linux versions].&amp;lt;br&amp;gt;&lt;br /&gt;
Installation: Do a standard install. There will be several warnings from Windows about using drivers that have not gone through a Microsoft approval step - these can safely (and must) be ignored - just click Continue. When creating a new virtual machine, you must first decide what to install as a guest OS. The recommended choice is 32 bit Ubuntu Linux. In any case you should probably go for 32 bit OSes. Secondly, you must decide how large a disk you should use - the default 8 GB is fine. You must also decide how much memory you should allocate to the VM; if you only have 2 GB RAM on your &amp;quot;real&amp;quot; machine, then allocate 768 MB, if you have more real RAM then allocate 1024-1536 MB. Before you launch your new VM, you must insert the installation image for the OS (Ubuntu) you downloaded into the VM&#039;s CD-rom drive. This is done under &#039;Storage&#039; for the VM - it can be a bit tricky to find the small icon for the CD-drive, but when the standard &amp;quot;choose file&amp;quot; menu opens, you have found the right one.&lt;br /&gt;
After installation of your virtual OS, you must also install the &#039;Guest additions&#039;. These can be found under &#039;Devices&#039; when your guest OS is running. It will give you much better screen control (resizing), faster screen updates, the ability to cut/paste text and share folders between your host OS and guest OS.&lt;br /&gt;
&lt;br /&gt;
== Mac ==&lt;br /&gt;
A Mac has a BSD Unix underneath all the fancy graphics. This means you are mostly ready for the course once you figure out how to use it.&lt;br /&gt;
[https://www.youtube.com/watch?v=8OFD_F5L_vk Here is a very basic video on how to find the terminal], through which we access the Unix operating system. Watch and test on your Mac.&lt;br /&gt;
&lt;br /&gt;
If you took [https://teaching.healthtech.dtu.dk/22101 course 22101/22161] you are done. Otherwise you must install Anaconda.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/22101/index.php/Install_Jupyter_Notebook My guide for windows, but very similar].&amp;lt;br&amp;gt;&lt;br /&gt;
[https://docs.anaconda.com/anaconda/install/mac-os/ Official Anaconda guide].&lt;br /&gt;
&lt;br /&gt;
== Linux, any flavour ==&lt;br /&gt;
You are already set and ready for the course. You should be able to find the terminal; Term, Xterm, Console,&lt;br /&gt;
as this is a basic integrated part of linux.&amp;lt;br&amp;gt;&lt;br /&gt;
There are many editors you can use; gedit, jedit, nedit, emacs, vim and a dozen more, also see later.&amp;lt;br&amp;gt;&lt;br /&gt;
Python is also built-in, but you must install the Anaconda version to get the libraries used in this course if you did not take [https://teaching.healthtech.dtu.dk/22101 course 22101].&amp;lt;br&amp;gt;&lt;br /&gt;
Youtube: [https://www.youtube.com/watch?v=dGm10q_y3xw Installing Anaconda on Ubuntu]&amp;lt;br&amp;gt;&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22101/22101_01-InstallingJupyter.ppt Installing Jupyter/Anaconda] - made for windows but strong similarities&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Editors ==&lt;br /&gt;
It is vitally important that you have a good programming editor on your system. You should REALLY pick one of the 2 top choices.&lt;br /&gt;
&lt;br /&gt;
=== Sublime Text ===&lt;br /&gt;
Extremely popular multi platform editor. Download at [https://www.sublimetext.com/download https://www.sublimetext.com/download].&amp;lt;br&amp;gt;&lt;br /&gt;
[https://www.sublimetext.com/docs/index.html Documentation] and [https://www.youtube.com/c/OdatNurd Youtube tutorials].&lt;br /&gt;
&lt;br /&gt;
=== VScode ===&lt;br /&gt;
Extremely popular multi platform editor. Download at [https://code.visualstudio.com/download https://code.visualstudio.com/download].&amp;lt;br&amp;gt;&lt;br /&gt;
[https://code.visualstudio.com/docs Documentation].&lt;br /&gt;
&lt;br /&gt;
=== Nano/Pico ===&lt;br /&gt;
The &#039;&#039;&#039;nano&#039;&#039;&#039; and the &#039;&#039;&#039;pico&#039;&#039;&#039; editors only work in the Unix terminal window. They are very basic, but fairly intuitive - at least compared to other terminal text editors on Unix. They are good to know and use, if you just want to do something small, or are on a bad/slow network connection. They are very similar and usually only one is installed. To find which one is present, just type:&lt;br /&gt;
 nano myfile.txt&lt;br /&gt;
 pico somefile.txt&lt;br /&gt;
&lt;br /&gt;
=== Other editors ===&lt;br /&gt;
These can be found via Google and downloaded for free.&lt;br /&gt;
* PSpad (Windows)&lt;br /&gt;
* Notepad++ (Windows)&lt;br /&gt;
* Jedit (Multi platform, requires Java)&lt;br /&gt;
* Komodo Edit (Multi platform)&lt;br /&gt;
* Unix GUI editors: gedit, nedit, kate&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Course_preparation&amp;diff=97</id>
		<title>Course preparation</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Course_preparation&amp;diff=97"/>
		<updated>2024-09-02T17:02:11Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* WSL/WSL2, Windows Subsystem for Linux */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Before you meet up on the first day of the course, you will &#039;&#039;&#039;have&#039;&#039;&#039; to be able to start a terminal in a &#039;&#039;&#039;unix&#039;&#039;&#039; environment, and to &#039;&#039;&#039;edit&#039;&#039;&#039; files in the same environment with a text editor.&lt;br /&gt;
Here is how and what you should do on various Operating Systems (OS). Anything mentioned here is free software.&lt;br /&gt;
&lt;br /&gt;
== General remarks about programming environments, IDE&#039;s ==&lt;br /&gt;
Some people have discovered tools like Spyder, Jupyter or PyCharm, which create an environment for making and running programs. The use of these tools is somewhat discouraged in this course. Part of what the course/teacher wishes to teach is how to &#039;&#039;&#039;confidently and with experience&#039;&#039;&#039; use Unix as a working environment, which means how to work with the Unix file system, the shell, using text editors to write programs and the Unix commands to test/execute them. Using an IDE removes this element of the course.&amp;lt;br&amp;gt;&lt;br /&gt;
Why is Unix as a work environment important?&amp;lt;br&amp;gt;&lt;br /&gt;
* Servers (big computers) do not always offer an IDE.&lt;br /&gt;
* When starting to use big data, the IDE will hamper you - even prevent you from succeeding.&lt;br /&gt;
* Unix is the de-facto work environment in Life Science.&lt;br /&gt;
If you already know how to work with Unix (i.e. years of experience), then you can use an IDE as you please.&lt;br /&gt;
&lt;br /&gt;
== Windows ==&lt;br /&gt;
There are a number of solutions for Windows, but you should really go for the first one described, WSL2. Otherwise you are on your own.&lt;br /&gt;
&lt;br /&gt;
=== WSL/WSL2, Windows Subsystem for Linux ===&lt;br /&gt;
You should really choose to install WSL2 using [https://pureinfotech.com/install-windows-subsystem-linux-2-windows-10/ this guide] ([https://www.youtube.com/watch?v=n-J9438Mv-s video guide]) for a better experience. If you already have WSL (the old version), then it will work, too.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
I have made a rather comprehensive guide for WSL2:&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/22113_01-WSL2Install.ppt Guide for WSL2 and Linux]&lt;br /&gt;
&lt;br /&gt;
Now install the Anaconda Python in your WSL2. The WSL2 linux you installed already has a Python, but we need some more libraries later in the course.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/22113_01-AnacondaInstall.ppt Anaconda install] for WSL2 and Linux&lt;br /&gt;
&lt;br /&gt;
The rest here is just other guides, which may help you if stuff does not work out for you.&lt;br /&gt;
&lt;br /&gt;
[https://learn.microsoft.com/en-us/windows/wsl/install Official Windows guide to installing WSL2]. [https://www.omgubuntu.co.uk/how-to-install-wsl2-on-windows-10 Another guide].&amp;lt;br&amp;gt;&lt;br /&gt;
I have seen some changes to the procedure on newer systems. If you run into trouble, [https://learn.microsoft.com/en-us/windows/wsl/install-manual see here].&lt;br /&gt;
&lt;br /&gt;
When successfully done, follow these two guides, to make life convenient (first one is mandatory).&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=88cbc7c3-846c-42b3-a14e-af270126ce25 Sharing files between Ubuntu (WSL2) and Windows]&amp;lt;br&amp;gt;&lt;br /&gt;
Video: [https://panopto.dtu.dk/Panopto/Pages/Viewer.aspx?id=dacb6779-7dfb-430a-a9e6-af270126ac74 Making X11 (Linux graphics) work on WSL2]&lt;br /&gt;
&lt;br /&gt;
=== MobaXterm, not really recommended ===&lt;br /&gt;
Download [http://mobaxterm.mobatek.net/download-home-edition.html MobaXterm]. Warning: You are likely to get problems with Python and/or libraries.&amp;lt;br&amp;gt;&lt;br /&gt;
Install MobaXterm - it will put itself in the Windows&lt;br /&gt;
program folder. This location differs depending on the version of Windows you have, but it is &amp;quot;findable&amp;quot;.&lt;br /&gt;
A typical location would be: &#039;&#039;Computer -&amp;gt; C: -&amp;gt; Programs -&amp;gt; Mobatek -&amp;gt; MobaXterm Home Edition&#039;&#039;. Instead of&lt;br /&gt;
&#039;&#039;Programs&#039;&#039; it could be &#039;&#039;Programmer&#039;&#039;, &#039;&#039;Programs (x86)&#039;&#039; or other.&lt;br /&gt;
The point of knowing this location is that you should drag and drop (move)&lt;br /&gt;
the plugins you downloaded to this folder from your download folder. That act will make the plugin available.&lt;br /&gt;
Windows will ask if you want to do this - yes, you do. Windows Firewall will block some features of MobaXterm&lt;br /&gt;
and ask if they should be allowed. Just block - it does not matter for what we do.&lt;br /&gt;
&lt;br /&gt;
A problem with MobaXterm is that part of the installation is put in a folder that is on OneDrive, if you have OneDrive enabled.&lt;br /&gt;
This means that MobaXterm only works when you have an internet connection, and it is also slow because part of it is loaded from OneDrive.&lt;br /&gt;
&lt;br /&gt;
The first time MobaXterm starts takes a while - get coffee - be patient.&amp;lt;br&amp;gt;&lt;br /&gt;
If MobaXterm does not run properly - it is likely NOT an installation problem and re-installing won&#039;t help. Instead delete the &amp;quot;MobaXterm&amp;quot; folder in your standard &amp;quot;Documents&amp;quot; folder. This is where files and settings are stored.&lt;br /&gt;
&lt;br /&gt;
=== VirtualBox, Virtualizing Linux, not recommended for the course ===&lt;br /&gt;
VirtualBox from Oracle is a wonderful tool. It installs a package that will allow you to run one or more virtual machines on your computer. On these virtual machines you can install any OS you want, e.g. Linux or Windows. Examples: You have a Mac, you want to run Windows - use VirtualBox and you can run Windows in a Mac application window. You run Windows, but would like to run Linux for some specific purpose - same answer.&lt;br /&gt;
If you don&#039;t need your virtual machine (VM) anymore - throw it away and release the disk space for some other purpose. A virtual machine does not need much disk space (5-10 GB), since it can access the disk on the native machine. You can simply share files between your host machine and virtual machine. You can even copy/paste between them, once you have installed the VBoxGuest additions. There is approximately a 10% performance loss when running virtual, but it is worth it for the ease of use. There are other free virtualization packages, like VMware Player (one of the first products on that market and still very strong), but VirtualBox has proved itself to be a small package that is very easy to use and install. No support from CBS will be available on anything but VirtualBox.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://www.virtualbox.org/wiki/Downloads Download VirtualBox for Windows, Mac and Linux versions].&amp;lt;br&amp;gt;&lt;br /&gt;
Installation: Do a standard install. There will be several warnings from Windows about using drivers that have not gone through a Microsoft approval step - these can safely (and must) be ignored - just click Continue. When creating a new virtual machine, you must first decide what to install as a guest OS. The recommended choice is 32 bit Ubuntu Linux. In any case you should probably go for 32 bit OSes. Secondly, you must decide how large a disk you should use - the default 8 GB is fine. You must also decide how much memory you should allocate to the VM; if you only have 2 GB RAM on your &amp;quot;real&amp;quot; machine, then allocate 768 MB, if you have more real RAM then allocate 1024-1536 MB. Before you launch your new VM, you must insert the installation image for the OS (Ubuntu) you downloaded into the VM&#039;s CD-rom drive. This is done under &#039;Storage&#039; for the VM - it can be a bit tricky to find the small icon for the CD-drive, but when the standard &amp;quot;choose file&amp;quot; menu opens, you have found the right one.&lt;br /&gt;
After installation of your virtual OS, you must also install the &#039;Guest additions&#039;. These can be found under &#039;Devices&#039; when your guest OS is running. It will give you much better screen control (resizing), faster screen updates, the ability to cut/paste text and share folders between your host OS and guest OS.&lt;br /&gt;
&lt;br /&gt;
== Mac ==&lt;br /&gt;
A Mac has a BSD Unix underneath all the fancy graphics. This means you are mostly ready for the course once you figure out how to use it.&lt;br /&gt;
[https://www.youtube.com/watch?v=8OFD_F5L_vk Here is a very basic video on how to find the terminal], through which we access the Unix operating system. Watch and test on your Mac.&lt;br /&gt;
&lt;br /&gt;
If you took [https://teaching.healthtech.dtu.dk/22101 course 22101/22161] you are done. Otherwise you must install Anaconda.&amp;lt;br&amp;gt;&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/22101/index.php/Install_Jupyter_Notebook My guide for windows, but very similar].&amp;lt;br&amp;gt;&lt;br /&gt;
[https://docs.anaconda.com/anaconda/install/mac-os/ Official Anaconda guide].&lt;br /&gt;
&lt;br /&gt;
== Linux, any flavour ==&lt;br /&gt;
You are already set and ready for the course. You should be able to find the terminal; Term, Xterm, Console,&lt;br /&gt;
as this is a basic integrated part of linux.&amp;lt;br&amp;gt;&lt;br /&gt;
There are many editors you can use; gedit, jedit, nedit, emacs, vim and a dozen more, also see later.&amp;lt;br&amp;gt;&lt;br /&gt;
Python is also built-in, but you must install the Anaconda version to get the libraries used in this course if you did not take [https://teaching.healthtech.dtu.dk/22101 course 22101].&amp;lt;br&amp;gt;&lt;br /&gt;
Youtube: [https://www.youtube.com/watch?v=dGm10q_y3xw Installing Anaconda on Ubuntu]&amp;lt;br&amp;gt;&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22101/22101_01-InstallingJupyter.ppt Installing Jupyter/Anaconda] - made for windows but strong similarities&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Editors ==&lt;br /&gt;
It is vitally important that you have a good programming editor on your system. You should REALLY pick one of the 2 top choices.&lt;br /&gt;
&lt;br /&gt;
=== Sublime Text ===&lt;br /&gt;
Extremely popular multi platform editor. Download at [https://www.sublimetext.com/download https://www.sublimetext.com/download].&amp;lt;br&amp;gt;&lt;br /&gt;
[https://www.sublimetext.com/docs/index.html Documentation] and [https://www.youtube.com/c/OdatNurd Youtube tutorials].&lt;br /&gt;
&lt;br /&gt;
=== VScode ===&lt;br /&gt;
Extremely popular multi platform editor. Download at [https://code.visualstudio.com/download https://code.visualstudio.com/download].&amp;lt;br&amp;gt;&lt;br /&gt;
[https://code.visualstudio.com/docs Documentation].&lt;br /&gt;
&lt;br /&gt;
=== Nano/Pico ===&lt;br /&gt;
The &#039;&#039;&#039;nano&#039;&#039;&#039; and the &#039;&#039;&#039;pico&#039;&#039;&#039; editors only work in the Unix terminal window. They are very basic, but fairly intuitive - at least compared to other terminal text editors on Unix. They are good to know and use, if you just want to do something small, or are on a bad/slow network connection. They are very similar and usually only one is installed. To find which one is present, just type:&lt;br /&gt;
 nano myfile.txt&lt;br /&gt;
 pico somefile.txt&lt;br /&gt;
&lt;br /&gt;
=== Other editors ===&lt;br /&gt;
These can be found via Google and downloaded for free.&lt;br /&gt;
* PSpad (Windows)&lt;br /&gt;
* Notepad++ (Windows)&lt;br /&gt;
* Jedit (Multi platform, requires Java)&lt;br /&gt;
* Komodo Edit (Multi platform)&lt;br /&gt;
* Unix GUI editors: gedit, nedit, kate&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Resistance_to_antibiotics&amp;diff=96</id>
		<title>Resistance to antibiotics</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Resistance_to_antibiotics&amp;diff=96"/>
		<updated>2024-05-15T14:34:38Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Optimization */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
Bacterial resistance to antibiotics is a growing problem, and there are many real-life scenarios where it is important to understand what type of antibiotics will be effective on a patient, what kind of resistance can be expected in certain bacterial populations, or even which antibiotics a farmer has been feeding his pigs.&amp;lt;br&amp;gt;&lt;br /&gt;
Given a &amp;quot;database&amp;quot; of bacterial resistance genes as a FASTA file and a FASTQ file with a sequenced metagenomic sample (from stool, blood, sewage, pig feces or similar), you must identify which resistance genes are in the sample.&lt;br /&gt;
It must be understood that the sample is a mix of many different bacteria, with each species present in a greater or lesser amount. You do NOT sample a single bacterium.&lt;br /&gt;
&lt;br /&gt;
===Method===&lt;br /&gt;
[[File:ngs-coverage-depth.jpg|frame|Coverage and Depth]]&lt;br /&gt;
The resistance genes are &amp;quot;just&amp;quot; DNA sequences, and so are the reads in the FASTQ file. A k-mer is defined as a piece of sequence of length k. Normally k is set to a certain size depending on the problem. By making a database/data structure of all k-mers of the resistance genes, one can cut out a k-mer from a read in the sample and see if it matches. In case of a match, the read might be from bacterial DNA which has a resistance gene.&lt;br /&gt;
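&lt;br /&gt;
The k-mer database described above could be sketched in Python as follows. This is only one possible layout, assuming the genes have already been read into a dict from gene name to sequence; the names build_kmer_index and genes, and the toy sequences, are illustrative and not part of the project material.&lt;br /&gt;

```python
from collections import defaultdict


def build_kmer_index(genes, k=19):
    """Map every k-mer to the (gene name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in genes.items():
        seq = seq.upper()
        # range() is empty when the sequence is shorter than k
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


# Tiny made-up example with k=3 so the result is easy to inspect
genes = {"geneA": "ACGTAC", "geneB": "GTACGG"}
index = build_kmer_index(genes, k=3)
print(index["GTA"])
```

Storing the position together with the gene name makes it possible to record later which parts of a gene were hit. Use the in operator for lookups, since indexing a defaultdict with an unknown k-mer silently adds an empty entry.&lt;br /&gt;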
&lt;br /&gt;
In the project, set the size of the k-mer to 19. That is fairly reasonable, as we want the k-mer to be unique, and that happens at around k = 17 and above for human-sized genomes. You can experiment with other sizes if you feel like it.&lt;br /&gt;
&lt;br /&gt;
There is no way of knowing if the reads come from one strand or the other - so try both.&lt;br /&gt;
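&lt;br /&gt;
Trying both strands means looking at a read and at its reverse complement. A minimal sketch, assuming the reads contain only the letters A, C, G, T and N; the function name is hypothetical.&lt;br /&gt;

```python
# Translation table for complementing bases; N (unknown base) stays N
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


read = "AACGTN"
print(reverse_complement(read))
```

Whether you reverse-complement the reads or store both orientations of the gene k-mers is a design choice that affects running time and memory use.&lt;br /&gt;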
&lt;br /&gt;
There are several difficulties in identifying the resistance genes accurately, which must be dealt with. This project must handle the following two.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1)&#039;&#039;&#039; Some genes are very similar in sequence. The difference might be just one or two SNPs. How do you differentiate between the genes?&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2)&#039;&#039;&#039; NGS is notoriously unreliable. You cannot be sure that the read is actually correct. This is also why the longer the k-mer you use, the greater the chance that it contains an error. How do you know that you can trust that you have found a resistance gene in the sample?&lt;br /&gt;
&lt;br /&gt;
The answers to both problems are in the sampling. Every base in the sample has been sequenced to a depth of around 50, i.e. sampled 50 times. Since the genomic material is fairly abundant and is cut randomly before sequencing, the reads are (mostly) overlapping. So when you check the k-mers of the sample against the k-mers in the resistance database, the entire gene must be covered for the gene to count as present, thus solving 1). If this coverage is also at a good depth, then 2) is solved, as the match is then not caused by random mis-sequencing. K-mers which occur only a few times are most likely due to sequencing errors.&lt;br /&gt;
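&lt;br /&gt;
One possible way to turn this into numbers is sketched below. It assumes that you keep, for each gene, a list with one hit count per k-mer start position; the definitions used here (coverage as the fraction of positions hit at least once, depth as the mean count) are assumptions made for illustration, and other definitions may suit your method better.&lt;br /&gt;

```python
def coverage_and_depth(counts):
    """Summarise per-position hit counts for one gene.

    counts[i] is how many times the k-mer starting at position i
    was seen in the reads.
    """
    if not counts:
        return 0.0, 0.0
    covered = sum(1 for c in counts if c > 0)
    coverage = covered / len(counts)
    mean_depth = sum(counts) / len(counts)
    return coverage, mean_depth


counts = [12, 15, 0, 9, 14]  # made-up numbers
coverage, depth = coverage_and_depth(counts)
print(f"coverage={coverage:.0%} depth={depth:.1f}")
```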
&lt;br /&gt;
The sequencing depth can vary, as you get what you pay for - more depth is more expensive. Within a sample, the depth also varies depending on whether the gene is on the genome, a low-copy plasmid or a high-copy plasmid. A high depth is highly desirable, as that makes the results more reliable.&lt;br /&gt;
&lt;br /&gt;
It may happen that a k-mer in a read accidentally matches a k-mer in the resistance gene database and nothing else from the read matches. That is simply a freak accident (or an occurrence with a certain probability) and does not count for anything. The biology is of course that (part of) a read must align to (part of) a resistance gene; however, alignment is another project. We are dealing with k-mers here.&amp;lt;br&amp;gt;&lt;br /&gt;
You are not required to make certain that these freak accidents do not interfere with your computation, but ... maybe you could consider handling them in some (maybe partial) way as part of your method. Remember, NGS is inherently unreliable, and building something big and certain on top of unreliable data is an exercise in futility.&lt;br /&gt;
&lt;br /&gt;
===Optimization===&lt;br /&gt;
Before starting to code, you should consider your method. NGS files are large and any analysis done on them will take time. You must realize that the resistance genes are a very small subset of the DNA represented in an NGS sample. The wall clock running time for a well-considered program is about 5 minutes for the samples used in this project, and I have seen students use just half of that. I have solved it in 33 seconds on my i7-8700 CPU running at 3.20GHz. If your program takes 1 hour or more - go back to the drawing board. It is a great idea to test your program with a small excerpt from the FASTQ sample.&lt;br /&gt;
&lt;br /&gt;
===Input and output===&lt;br /&gt;
The program must take as input a FASTA file with [https://teaching.healthtech.dtu.dk/material/22113/resistance_genes.fsa resistance genes] and a number of FASTQ files.&lt;br /&gt;
The FASTQ files are gzipped and the program must deal with that using the Python &#039;&#039;&#039;gzip&#039;&#039;&#039; library.&amp;lt;br&amp;gt;&lt;br /&gt;
FASTQ: [https://teaching.healthtech.dtu.dk/material/22113/Unknown3_raw_reads_1.txt.gz Sample 1] and [https://teaching.healthtech.dtu.dk/material/22113/Unknown3_raw_reads_2.txt.gz Sample 2]&amp;lt;br&amp;gt;&lt;br /&gt;
The FASTQ files are paired-end reads. This can be ignored, and the files can be considered to be one big sample. If you want to make something out of them being paired-end reads, you are welcome to.&lt;br /&gt;
&lt;br /&gt;
The FASTQ files consist of reads like these:&lt;br /&gt;
 @ILLUMINA-3BDE4F_0027:2:1:9216:1663#GCCAAT/1&lt;br /&gt;
 CAGCCCGCCGATGGCTCCCACAAGNTGTATCTCTGTACAGGTGTTATCGGGAGAATACTTGATTGCATTGGAAAGCAGGTTACTGAAAGCACGTCGGAGCA&lt;br /&gt;
 +ILLUMINA-3BDE4F_0027:2:1:9216:1663#GCCAAT/1&lt;br /&gt;
 ZSGDVEMEQHNGGEGGSGJESSWVBO_]_]\DYV\\`SZ_aXaa`ccccccacc[Wcccaaccccacc[cca_cacc_cccaWccc[cccacc[ccb[ca_&lt;br /&gt;
 @ILLUMINA-3BDE4F_0027:2:1:10636:1724#GCCAAT/1&lt;br /&gt;
 TGCACAGGTAGCCCCTACGCCGCGNATGAACGACCGGAAACGCCGTCACATGATGGCGAAACCAGCCGACAAACTCCGCTTCCAGCCGCTTCAACACCGCC&lt;br /&gt;
 +ILLUMINA-3BDE4F_0027:2:1:10636:1724#GCCAAT/1&lt;br /&gt;
 a^DG_DPGGDQDFFFFQFKD\\Y\Bab]ac[NY[acaXa]cc_ccYccccccbccYcccaccac^acbWccccc_cHabb_aaa[[_Y__a^^Iaab[RY_&lt;br /&gt;
Every line which is not sequence can be ignored, and the sequence line is between the line starting with &#039;&#039;&#039;@&#039;&#039;&#039; and the line starting with &#039;&#039;&#039;+&#039;&#039;&#039;.&lt;br /&gt;
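&lt;br /&gt;
Reading the sequence lines from a gzipped FASTQ file can be sketched as below. The sketch assumes that every record is exactly four lines, as in the excerpt above, and picks the sequence by position rather than by looking for @ and +, because a quality line may itself start with either character. The function name and the demo file are made up for illustration.&lt;br /&gt;

```python
import gzip
import os
import tempfile


def read_fastq_sequences(path):
    """Yield the sequence line of every 4-line record in a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Records are 4 lines each: header, sequence, separator, qualities
            if line_number % 4 == 1:
                yield line.strip()


# Demo on a tiny made-up file; note the quality line that starts with @
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt.gz")
    with gzip.open(path, "wt") as out:
        out.write("@read1\nACGT\n+read1\naaaa\n@read2\nGGCC\n+read2\n@@@@\n")
    sequences = list(read_fastq_sequences(path))
print(sequences)
```

Because the function is a generator, the reads are processed one at a time and the whole file never has to be held in memory.&lt;br /&gt;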
&lt;br /&gt;
The output is the gene names with their resistances, and the coverage and depth of each gene in the samples. Genes which have less than 95% coverage and/or a depth of less than 10 should not be shown. The genes must be sorted by coverage and depth, so the &amp;quot;certain&amp;quot; hits come first.&amp;lt;br&amp;gt;&lt;br /&gt;
For those genes which seem to be present in the NGS sample, it could be a good idea to make a plot of the depth for each position of the gene. It reveals pretty clearly if the gene is there or not.&amp;lt;br&amp;gt;&lt;br /&gt;
[[File:present.jpg|150px|Gene is present]][[File:absent.jpg|150px|Gene is absent]]&amp;lt;br&amp;gt;&lt;br /&gt;
Optional: Since some genes are just a few SNPs apart, a &amp;quot;winner takes all&amp;quot; strategy could be implemented. There is no need to show a gene with less than 100% coverage or small depth if its homolog has full coverage and good depth, because that means the homolog is present in the sample and not the other gene.&lt;br /&gt;
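&lt;br /&gt;
The basic filtering and sorting rule for the output (at least 95% coverage, depth of at least 10, best hits first) might look like the sketch below. The record layout, a dict with the keys gene, coverage and depth, and the gene names are assumptions made for the example.&lt;br /&gt;

```python
def select_hits(results, min_coverage=0.95, min_depth=10):
    """Keep genes passing both thresholds, best coverage and depth first."""
    kept = [r for r in results
            if r["coverage"] >= min_coverage and r["depth"] >= min_depth]
    return sorted(kept, key=lambda r: (r["coverage"], r["depth"]), reverse=True)


# Made-up results; the gene names are placeholders
results = [
    {"gene": "geneA", "coverage": 1.00, "depth": 48.2},
    {"gene": "geneB", "coverage": 0.97, "depth": 8.0},
    {"gene": "geneC", "coverage": 0.96, "depth": 30.5},
    {"gene": "geneD", "coverage": 1.00, "depth": 55.0},
]
for hit in select_hits(results):
    print(hit["gene"], hit["coverage"], hit["depth"])
```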
&lt;br /&gt;
===References===&lt;br /&gt;
# [http://en.wikipedia.org/wiki/FASTQ_format FASTQ format]&lt;br /&gt;
# [http://en.wikipedia.org/wiki/Phred_quality_score Phred quality score]&lt;br /&gt;
These genes are in the data - it is a misunderstanding if you think the project is easier because you have the result:&lt;br /&gt;
 aac(6&#039;)Ib-cr_1_DQ303918&lt;br /&gt;
 strA_4_AF321551&lt;br /&gt;
 strB_1_M96392&lt;br /&gt;
 aac(3)-IIa_1_CP023555.1&lt;br /&gt;
 blaCTX-M-15_23_DQ302097&lt;br /&gt;
 blaOXA-1_1_J02967 &lt;br /&gt;
 blaSHV-28_1_HM751101&lt;br /&gt;
 blaTEM-1B_1_JF910132&lt;br /&gt;
 fosA_3_ACWO01000079&lt;br /&gt;
 catB4_1_EU935739 &lt;br /&gt;
 oqxA_1_EU370913&lt;br /&gt;
 oqxB_1_EU370913&lt;br /&gt;
 sul2_2_GQ421466&lt;br /&gt;
 tet(A)_4_AJ517790&lt;br /&gt;
 dfrA14_1_DQ388123&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=95</id>
		<title>Data analysis</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=95"/>
		<updated>2024-05-13T17:27:04Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
This project is about analyzing specific data and answering various questions about it. The data file is a flat file database constructed in the year 2000 with various information about people, and can be seen here as&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/people.db people.db].&lt;br /&gt;
The program must read this file ONCE - line by line - not storing the actual lines for future reference, but entering the data in an appropriate data structure of your own devising. Some questions ask whether a distribution is &amp;quot;normal&amp;quot;. &amp;quot;Normal&amp;quot; here does &#039;&#039;&#039;not&#039;&#039;&#039; mean&lt;br /&gt;
that it fits the bell curve (the normal distribution). It means reasonable or natural, what you would expect.&amp;lt;br&amp;gt;&lt;br /&gt;
The program must now answer the following questions:&lt;br /&gt;
&lt;br /&gt;
# Is the age and gender distribution normal/sensible in the database? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do the men become fathers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time fatherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do the women become mothers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time motherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# How many men and women do not have children (in percent)?&lt;br /&gt;
# What is the average age difference between the parents (with a child in common obviously)?&lt;br /&gt;
# How many people have at least one grandparent that is still alive? A person is living if he/she is in the database. State the number both as a percentage and as an absolute count.&lt;br /&gt;
# How many have at least one cousin in the data set? What is the average number of cousins based on those who have cousins?&amp;lt;br&amp;gt;Note: This number has historically been difficult to compute correctly, but here are some thoughts to help you verify your count.&amp;lt;br&amp;gt;You have to construct a method for finding cousin pairs. Any cousin pair you identify can be written as a tuple (cpr1, cpr2) in a list.&amp;lt;br&amp;gt; a) There should be no duplicate tuples in the list - you are not cousins with the same person more than once.&amp;lt;br&amp;gt; b) There should be no tuple with the same CPR in positions 1 and 2 - you are not cousins with yourself.&amp;lt;br&amp;gt; c) Because of symmetry, it is expected that for any (cpr1, cpr2) tuple there is a (cpr2, cpr1) tuple - when you are cousins with somebody, somebody is cousins with you. This has natural consequences: Set(cpr1) == Set(cpr2), Sorted_list(cpr1) == Sorted_list(cpr2).&amp;lt;br&amp;gt; d) These checks do NOT detect sibling pairs inserted as cousins; however, there should be no overlap between this list and a similar list covering sibling pairs.&amp;lt;br&amp;gt; e) The length of the list of cousin tuples is the number of cousin pairs, and the size of the set of CPRs is the number of people who have cousins.&lt;br /&gt;
# Is the firstborn likely to be male or female?&lt;br /&gt;
# How many men/women (percentage) have children with more than one woman/man?&lt;br /&gt;
# Do tall people marry (or at least have children together)? To answer that, calculate the percentages of tall/tall, tall/normal, tall/short, normal/normal, normal/short, and short/short couples. Decide your own limits for tall, normal and short, and whether they are the same for men and women.&lt;br /&gt;
# Do tall parents have tall children?&lt;br /&gt;
# Do fat people marry (or at least have children together)? To answer that, calculate the percentages of fat/fat, fat/normal, fat/slim, normal/normal, normal/slim, and slim/slim couples. Decide your own limits for fat, normal and slim. Calculate the BMI, and let that be the fatness indicator.&lt;br /&gt;
# Using the knowledge of [https://en.wikipedia.org/wiki/Blood_type#ABO_blood_group_system blood group type inheritance], are there any children in the database where you can safely say that at least one of the parents is not the real parent? If such children exist, make a list of them. In the report you must discuss how you determine that the parent(s) of the child are not the &amp;quot;true&amp;quot; parents.&lt;br /&gt;
# Make a list of fathers who can donate blood to their sons. The list must identify the father and the son(s) and their blood types. You must write the length of the list in the report, together with the number of fathers and the number of sons.&lt;br /&gt;
# Make a list of persons who can donate blood to at least one of their grandparents. The list must identify the person, the grandparent(s) and their blood types. You must write the length of the list in the report, together with the number of grandchildren and the number of grandparents.&lt;br /&gt;
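&lt;br /&gt;
The verification ideas a) to c) from the cousin question can be expressed as a few checks on the list of tuples. This is a sketch with a hypothetical function name, and it uses made-up identifiers instead of CPR numbers; finding the cousin pairs in the first place is the actual work.&lt;br /&gt;

```python
def check_cousin_pairs(pairs):
    """Run sanity checks a) to c) on a list of (cpr1, cpr2) tuples."""
    unique = set(pairs)
    no_duplicates = len(unique) == len(pairs)
    no_self_pairs = all(a != b for a, b in pairs)
    symmetric = all((b, a) in unique for a, b in pairs)
    return no_duplicates, no_self_pairs, symmetric


# Made-up identifiers standing in for CPR numbers
pairs = [("p1", "p2"), ("p2", "p1"), ("p1", "p3"), ("p3", "p1")]
print(check_cousin_pairs(pairs))
people_with_cousins = {a for a, _ in pairs}
print(len(people_with_cousins))
```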
&lt;br /&gt;
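For the blood donation questions, a compatibility test could be sketched as below. The sketch assumes blood types written as strings such as A+ or AB-, which may not match how people.db stores them, and it applies the usual red-cell rules for both the ABO group and the Rh factor; whether the Rh factor should be considered is for you to decide.&lt;br /&gt;

```python
# Red-cell ABO compatibility: donor group -> recipient groups it can give to
ABO_CAN_GIVE_TO = {
    "O": {"O", "A", "B", "AB"},
    "A": {"A", "AB"},
    "B": {"B", "AB"},
    "AB": {"AB"},
}


def can_donate(donor, recipient):
    """True if blood of type donor (e.g. "A+") can be given to recipient."""
    donor_abo, donor_rh = donor[:-1], donor[-1]
    recipient_abo, recipient_rh = recipient[:-1], recipient[-1]
    abo_ok = recipient_abo in ABO_CAN_GIVE_TO[donor_abo]
    # Rh-negative blood suits everyone; Rh-positive only Rh-positive recipients
    rh_ok = donor_rh == "-" or recipient_rh == "+"
    return abo_ok and rh_ok


print(can_donate("O-", "AB+"), can_donate("A+", "A-"))
```

&lt;br /&gt;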
All questions have to be answered in one run of the program, but not necessarily in that order. You are welcome to answer other interesting questions that can be posed from the data. Many questions are about distributions and whether the distributions are &amp;quot;normal&amp;quot;. The program can calculate the distributions, but the analysis of the result (evaluating normalcy) belongs in the report.&lt;br /&gt;
Here is an example: &amp;quot;Is the distribution of first-time fatherhood age normal? A yes/no answer is not good enough.&amp;quot; You must at least calculate and print something like:&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    5%&lt;br /&gt;
 21-25:    10%&lt;br /&gt;
 26-30:    40%&lt;br /&gt;
 31-35:    30%&lt;br /&gt;
 36-40:    15%&lt;br /&gt;
 41-45:    0%&lt;br /&gt;
From that you simply evaluate whether it is normal and put your evaluation in the report. I think the numbers above are rather normal, but those below are very strange. If you want to, you can support your opinions with references to [https://www.dst.dk/en/ Statistics Denmark].&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    0%&lt;br /&gt;
 21-25:    0%&lt;br /&gt;
 26-30:    10%&lt;br /&gt;
 31-35:    20%&lt;br /&gt;
 36-40:    50%&lt;br /&gt;
 41-45:    20%&lt;br /&gt;
&lt;br /&gt;
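A table like the two above can be produced by counting ages in fixed-width bins. The sketch below assumes a plain list of ages and 5-year bins starting at 16, matching the example tables; the function name and the ages are made up.&lt;br /&gt;

```python
from collections import Counter


def age_distribution(ages, start=16, width=5):
    """Return {(low, high): percentage} for fixed-width age bins."""
    bins = Counter()
    for age in ages:
        low = start + ((age - start) // width) * width
        bins[(low, low + width - 1)] += 1
    total = len(ages)
    return {b: 100 * n / total for b, n in sorted(bins.items())}


ages = [18, 22, 23, 27, 27, 28, 29, 33, 34, 38]  # made-up ages
for (low, high), pct in age_distribution(ages).items():
    print(f"{low}-{high}:    {pct:.0f}%")
```

Bins without any members are simply absent from the result, so add them yourself if you want rows with 0% printed.&lt;br /&gt;
&lt;br /&gt;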
The problem you should solve in the project is not &amp;quot;how to make good statistics&amp;quot;, but &amp;quot;how to collect the data from the database&amp;quot;. If you feel you can do better statistics than above, you are welcome.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: For the questions to make sense, you should assume/pretend you are doing this analysis at the start of 2000.&amp;lt;br&amp;gt; &lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The CPR consists of a date part (the first 6 digits as DDMMYY), which is the birthday, and a 4-digit number. There are rules about how a CPR should be constructed, but they are not followed here, since it is illegal to publish CPR numbers. What you need to know is that the date part is a date in 1900-1999, and that the last digit is significant: odd - male, even - female.&amp;lt;br&amp;gt;&lt;br /&gt;
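The CPR rules in the tip above translate into a small helper, sketched here under some assumptions: the CPR is given as a string of 10 digits with an optional hyphen (the exact format in people.db may differ), and the reference date for ages is 1 January 2000. Both function names and the example number are made up.&lt;br /&gt;

```python
from datetime import date


def parse_cpr(cpr):
    """Split a CPR string into birthday and gender, per the rules in the tip."""
    digits = cpr.replace("-", "")
    day, month, year = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    birthday = date(1900 + year, month, day)  # all dates are in 1900-1999
    gender = "male" if int(digits[-1]) % 2 == 1 else "female"
    return birthday, gender


def age_at(birthday, today=date(2000, 1, 1)):
    """Age in whole years on the reference date (start of 2000 by default)."""
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)


birthday, gender = parse_cpr("230672-1235")  # made-up number
print(birthday, gender, age_at(birthday))
```
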
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The data are somewhat randomly constructed, so you can find &#039;facts&#039; that seem very unlikely, like 6 year old kids with a height of 2 m. Just accept it.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: In the database, the children of a person are clear. This means you can follow a thread &#039;&#039;down&#039;&#039; the generations. As can be seen from some of the questions, it can be nice to find the parents directly from a child, i.e. follow a thread &#039;&#039;up&#039;&#039; the generations. Can you find a way to do this easily?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=94</id>
		<title>Data analysis</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=94"/>
		<updated>2024-05-13T17:26:26Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
This project is about analyzing specific data and answering various questions about it. The data file is a flat file database constructed in the year 2000 with various information about people, and can be seen here as&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/people.db people.db].&lt;br /&gt;
The program must read this file ONCE - line by line - not storing the actual lines for future reference, but entering the data in an appropriate data structure of your own devising. Some questions ask whether a distribution is &amp;quot;normal&amp;quot;. &amp;quot;Normal&amp;quot; here does &#039;&#039;&#039;not&#039;&#039;&#039; mean&lt;br /&gt;
that it fits the bell curve (the normal distribution). It means reasonable or natural, what you would expect.&amp;lt;br&amp;gt;&lt;br /&gt;
The program must now answer the following questions:&lt;br /&gt;
&lt;br /&gt;
# Is the age and gender distribution normal/sensible in the database? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do the men become fathers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time fatherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do the women become mothers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time motherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# How many men and women do not have children (in percent)?&lt;br /&gt;
# What is the average age difference between the parents (with a child in common obviously)?&lt;br /&gt;
# How many people have at least one grandparent that is still alive? A person is living if he/she is in the database. State the number both as a percentage and as an absolute count.&lt;br /&gt;
# How many have at least one cousin in the data set? What is the average number of cousins based on those who have cousins?&amp;lt;br&amp;gt;Note: This number has historically been difficult to compute correctly, but here are some thoughts to help you verify your count.&amp;lt;br&amp;gt;You have to construct a method for finding cousin pairs. Any cousin pair you identify can be written as a tuple (cpr1, cpr2) in a list.&amp;lt;br&amp;gt; a) There should be no duplicate tuples in the list - you are not cousins with the same person more than once.&amp;lt;br&amp;gt; b) There should be no tuple with the same CPR in positions 1 and 2 - you are not cousins with yourself.&amp;lt;br&amp;gt; c) Because of symmetry, it is expected that for any (cpr1, cpr2) tuple there is a (cpr2, cpr1) tuple - when you are cousins with somebody, somebody is cousins with you. This has natural consequences: Set(cpr1) == Set(cpr2), Sorted_list(cpr1) == Sorted_list(cpr2).&amp;lt;br&amp;gt; d) These checks do NOT detect brother/sister pairs inserted as cousins; however, there should be no overlap between this list and a similar list covering sibling pairs.&amp;lt;br&amp;gt; e) The length of the list of cousin tuples is the number of cousin pairs, and the size of the set of CPRs is the number of people who have cousins.&lt;br /&gt;
# Is the firstborn likely to be male or female?&lt;br /&gt;
# How many men/women (percentage) have children with more than one woman/man?&lt;br /&gt;
# Do tall people marry (or at least have children together)? To answer that, calculate the percentages of tall/tall, tall/normal, tall/short, normal/normal, normal/short, and short/short couples. Decide your own limits for tall, normal and short, and whether they are the same for men and women.&lt;br /&gt;
# Do tall parents have tall children?&lt;br /&gt;
# Do fat people marry (or at least have children together)? To answer that, calculate the percentages of fat/fat, fat/normal, fat/slim, normal/normal, normal/slim, and slim/slim couples. Decide your own limits for fat, normal and slim. Calculate the BMI, and let that be the fatness indicator.&lt;br /&gt;
# Using the knowledge of [https://en.wikipedia.org/wiki/Blood_type#ABO_blood_group_system blood group type inheritance], are there any children in the database where you can safely say that at least one of the parents is not the real parent? If such children exist, make a list of them. In the report you must discuss how you determine that the parent(s) of the child are not the &amp;quot;true&amp;quot; parents.&lt;br /&gt;
# Make a list of fathers who can donate blood to their sons. The list must identify the father and the son(s) and their blood types. You must write the length of the list in the report, together with the number of fathers and the number of sons.&lt;br /&gt;
# Make a list of persons who can donate blood to at least one of their grandparents. The list must identify the person, the grandparent(s) and their blood types. You must write the length of the list in the report, together with the number of grandchildren and the number of grandparents.&lt;br /&gt;
&lt;br /&gt;
All questions have to be answered in one run of the program, but not necessarily in that order. You are welcome to answer other interesting questions that can be posed from the data. Many questions are about distributions and whether the distributions are &amp;quot;normal&amp;quot;. The program can calculate the distributions, but the analysis of the result (evaluating normalcy) belongs in the report.&lt;br /&gt;
Here is an example: &amp;quot;Is the distribution of first-time fatherhood age normal? A yes/no answer is not good enough.&amp;quot; You must at least calculate and print something like:&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    5%&lt;br /&gt;
 21-25:    10%&lt;br /&gt;
 26-30:    40%&lt;br /&gt;
 31-35:    30%&lt;br /&gt;
 36-40:    15%&lt;br /&gt;
 41-45:    0%&lt;br /&gt;
From that you simply evaluate whether it is normal and put your evaluation in the report. I think the numbers above are rather normal, but those below are very strange. If you want to, you can support your opinions with references to [https://www.dst.dk/en/ Statistics Denmark].&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    0%&lt;br /&gt;
 21-25:    0%&lt;br /&gt;
 26-30:    10%&lt;br /&gt;
 31-35:    20%&lt;br /&gt;
 36-40:    50%&lt;br /&gt;
 41-45:    20%&lt;br /&gt;
&lt;br /&gt;
The problem you should solve in the project is not &amp;quot;how to make good statistics&amp;quot;, but &amp;quot;how to collect the data from the database&amp;quot;. If you feel you can do better statistics than above, you are welcome.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: For the questions to make sense, you should assume/pretend you are doing this analysis at the start of 2000.&amp;lt;br&amp;gt; &lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The CPR consists of a date part (the first 6 digits as DDMMYY), which is the birthday, and a 4-digit number. There are rules about how a CPR should be constructed, but they are not followed here, since it is illegal to publish CPR numbers. What you need to know is that the date part is a date in 1900-1999, and that the last digit is significant: odd - male, even - female.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The data are somewhat randomly constructed, so you can find &#039;facts&#039; that seem very unlikely, like 6 year old kids with a height of 2 m. Just accept it.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: In the database, the children of a person are clear. This means you can follow a thread &#039;&#039;down&#039;&#039; the generations. As can be seen from some of the questions, it can be nice to find the parents directly from a child, i.e. follow a thread &#039;&#039;up&#039;&#039; the generations. Can you find a way to do this easily?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=93</id>
		<title>Data analysis</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=93"/>
		<updated>2024-05-13T17:25:58Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
This project is about analyzing specific data and answering various questions about it. The data file is a flat file database constructed in the year 2000 with various information about people, and can be seen here as&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/people.db people.db].&lt;br /&gt;
The program must read this file ONCE - line by line - not storing the actual lines for future reference, but entering the data in an appropriate data structure of your own devising. Some questions ask whether a distribution is &amp;quot;normal&amp;quot;. &amp;quot;Normal&amp;quot; here does &#039;&#039;&#039;not&#039;&#039;&#039; mean&lt;br /&gt;
that it fits the bell curve (the normal distribution). It means reasonable or natural, what you would expect.&amp;lt;br&amp;gt;&lt;br /&gt;
The program must now answer the following questions:&lt;br /&gt;
&lt;br /&gt;
# Is the age and gender distribution normal/sensible in the database? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do the men become fathers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time fatherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do the women become mothers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time motherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# How many men and women do not have children (in percent)?&lt;br /&gt;
# What is the average age difference between the parents (with a child in common obviously)?&lt;br /&gt;
# How many people have at least one grandparent that is still alive? A person is living if he/she is in the database. State the number both as a percentage and as an absolute count.&lt;br /&gt;
# How many have at least one cousin in the data set? What is the average number of cousins based on those who have cousins?&amp;lt;br&amp;gt;Note: This number has historically been difficult to compute correctly, but here are some thoughts to help you verify your count.&amp;lt;br&amp;gt;You have to construct a method for finding cousin pairs. Any cousin pair you identify can be written as a tuple (cpr1, cpr2) in a list.&amp;lt;br&amp;gt; a) There should be no duplicate tuples in the list - you are not cousins with the same person more than once.&amp;lt;br&amp;gt; b) There should be no tuple with the same CPR in positions 1 and 2 - you are not cousins with yourself.&amp;lt;br&amp;gt; c) Because of symmetry, it is expected that for any (cpr1, cpr2) tuple there is a (cpr2, cpr1) tuple - when you are cousins with somebody, somebody is cousins with you. This has natural consequences: Set(cpr1) == Set(cpr2), Sorted_list(cpr1) == Sorted_list(cpr2).&amp;lt;br&amp;gt; d) These checks do NOT detect brother/sister pairs inserted as cousins; however, there should be no overlap between this list and a similar list covering sibling pairs.&amp;lt;br&amp;gt; e) The length of the list of cousin tuples is the number of cousin pairs, and the size of the set of CPRs is the number of people who have cousins.&lt;br /&gt;
# Is the firstborn likely to be male or female?&lt;br /&gt;
# How many men/women (percentage) have children with more than one woman/man?&lt;br /&gt;
# Do tall people marry (or at least have children together)? To answer that, calculate the percentages of tall/tall, tall/normal, tall/short, normal/normal, normal/short, and short/short couples. Decide your own limits for tall, normal and short, and whether they are the same for men and women.&lt;br /&gt;
# Do tall parents have tall children?&lt;br /&gt;
# Do fat people marry (or at least have children together)? To answer that, calculate the percentages of fat/fat, fat/normal, fat/slim, normal/normal, normal/slim, and slim/slim couples. Decide your own limits for fat, normal and slim. Calculate the BMI, and let that be the fatness indicator.&lt;br /&gt;
# Using the knowledge of [https://en.wikipedia.org/wiki/Blood_type#ABO_blood_group_system blood group type inheritance], are there any children in the database where you can safely say that at least one of the parents is not the real parent? If such children exist, make a list of them. In the report you must discuss how you determine that the parent(s) of the child are not the &amp;quot;true&amp;quot; parents.&lt;br /&gt;
# Make a list of fathers who can donate blood to their sons. The list must identify the father and the son(s) and their blood types. You must write the length of the list in the report, together with the number of fathers and the number of sons.&lt;br /&gt;
# Make a list of persons who can donate blood to at least one of their grandparents. The list must identify the person, the grandparent(s) and their blood types. You must write the length of the list in the report, together with the number of grandchildren and the number of grandparents.&lt;br /&gt;
&lt;br /&gt;
All questions have to be answered in one run of the program, but not necessarily in that order. You are welcome to answer other interesting questions that can be posed from the data. Many questions are about distributions and whether the distributions are &amp;quot;normal&amp;quot;. The program can calculate the distributions, but the analysis of the result (evaluating normalcy) belongs in the report.&lt;br /&gt;
Here is an example: &amp;quot;Is the distribution of first-time fatherhood age normal? A yes/no answer is not good enough.&amp;quot; You must at least calculate and print something like:&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    5%&lt;br /&gt;
 21-25:    10%&lt;br /&gt;
 26-30:    40%&lt;br /&gt;
 31-35:    30%&lt;br /&gt;
 36-40:    15%&lt;br /&gt;
 41-45:    0%&lt;br /&gt;
From that you simply evaluate whether it is normal and put your evaluation in the report. I think the numbers above are rather normal, but those below are very strange. If you want to, you can support your opinions with references to [https://www.dst.dk/en/ Statistics Denmark].&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    0%&lt;br /&gt;
 21-25:    0%&lt;br /&gt;
 26-30:    10%&lt;br /&gt;
 31-35:    20%&lt;br /&gt;
 36-40:    50%&lt;br /&gt;
 41-45:    20%&lt;br /&gt;
&lt;br /&gt;
The problem you should solve in the project is not &amp;quot;how to make good statistics&amp;quot;, but &amp;quot;how to collect the data from the database&amp;quot;. If you feel you can do better statistics than the above, you are welcome to.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: For the sanity of the questions you should assume/pretend you are doing this analysis at the beginning of 2000.&amp;lt;br&amp;gt; &lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The CPR consists of a date part (first 6 digits as DDMMYY) which is the birthday, and a 4-digit number. There are rules about how a CPR should be constructed, but they are not followed here, since it is illegal to publish CPR numbers. What you need to know is that the date part is a date in 1900-1999, and the last digit is significant; odd - male, even - female.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The data are somewhat randomly constructed, so you can find &#039;facts&#039; that seem very unlikely, like 6-year-old kids with a height of 2 m. Just accept it.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: In the database the children of a person are clear. This means you can follow a thread &#039;&#039;down&#039;&#039; the generations. As can be seen from some of the questions, it can be nice to find the parents directly from a child, i.e. follow a thread &#039;&#039;up&#039;&#039; the generations. Can you find a way to do this easily?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=92</id>
		<title>Data analysis</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=92"/>
		<updated>2024-05-13T16:02:26Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
This project is about analyzing specific data and answering various questions about it. The data file is a flat file database constructed in the year 2000 with various information about people, and can be seen here as&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/people.db people.db].&lt;br /&gt;
The program must read this file ONCE - line by line - not storing the actual lines for future reference, but entering the data in an appropriate data structure of your own devising. Some questions ask whether a distribution is &amp;quot;normal&amp;quot;. &amp;quot;Normal&amp;quot; here does &#039;&#039;&#039;not&#039;&#039;&#039; mean&lt;br /&gt;
that it fits the bell curve (the normal distribution in the statistical sense). It means reasonable or natural, what you would expect.&amp;lt;br&amp;gt;&lt;br /&gt;
The program must now answer the following questions:&lt;br /&gt;
&lt;br /&gt;
# Is the age and gender distribution normal/sensible in the database? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do men become fathers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time fatherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do women become mothers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time motherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# How many men and women do not have children (in percent)?&lt;br /&gt;
# What is the average age difference between the parents (with a child in common obviously)?&lt;br /&gt;
# How many people have at least one grandparent that is still alive? A person is living if he/she is in the database. State the number both in percent and as an absolute count.&lt;br /&gt;
# How many people have at least one cousin in the data set? What is the average number of cousins based on those who have cousins?&amp;lt;br&amp;gt;Note: This number has historically been difficult to compute correctly, but here are some thoughts to help you verify your count.&amp;lt;br&amp;gt;You have to construct a method for finding cousin pairs. Any cousin pair you identify can be written as a tuple (cpr1, cpr2) in a list.&amp;lt;br&amp;gt; a) There should be no duplicate tuples in the list - you are not cousins with the same person more than once.&amp;lt;br&amp;gt; b) There should be no tuple with the same cpr in position 1 and 2 - you are not cousins with yourself.&amp;lt;br&amp;gt; c) Because of symmetry, it is expected that for any (cpr1, cpr2) tuple there is a (cpr2, cpr1) tuple - when you are cousins with somebody, that somebody is cousins with you. This has natural consequences: Set(cpr1) == Set(cpr2), Sorted_list(cpr1) == Sorted_list(cpr2).&amp;lt;br&amp;gt; d) The length of the list of cousin tuples is the number of cousin pairs, and the size of the set of cpr&#039;s is the number of people who have cousins.&lt;br /&gt;
# Is the firstborn likely to be male or female?&lt;br /&gt;
# How many men/women (percentage) have children with more than one woman/man?&lt;br /&gt;
# Do tall people marry (or at least get children together)? To answer that, calculate the percentages of tall/tall, tall/normal, tall/short, normal/normal, normal/short, and short/short couples. Decide your own limits for tall, normal and short, and if they are the same for men and women.&lt;br /&gt;
# Do tall parents get tall children?&lt;br /&gt;
# Do fat people marry (or at least get children together)? To answer that, calculate the percentages of fat/fat, fat/normal, fat/slim, normal/normal, normal/slim, and slim/slim couples. Decide your own limits for fat, normal and slim. Calculate the BMI, and let that be the fatness indicator.&lt;br /&gt;
# Using the knowledge of [https://en.wikipedia.org/wiki/Blood_type#ABO_blood_group_system blood group type inheritance], are there any children in the database where you can safely say that at least one of the parents is not the real parent? If such children exist, make a list of them. In the report you must discuss how you determine that the parent(s) of the child are not the &amp;quot;true&amp;quot; parents.&lt;br /&gt;
# Make a list of fathers who can donate blood to their sons. The list must identify the father and the son(s) and their blood type. You must write the length of the list in the report, together with the number of fathers and the number of sons.&lt;br /&gt;
# Make a list of persons who can donate blood to at least one of their grandparents. The list must identify the person, the grandparent(s) and their blood type. You must write the length of the list in the report, together with the number of grandchildren and the number of grandparents.&lt;br /&gt;
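&lt;br /&gt;
The checks a) to d) from the cousin question can be sketched as a small helper. This is only an illustration, not part of the assignment: the function name check_cousin_pairs and the placeholder identifiers in the usage note are assumptions, and the helper expects the symmetric list of (cpr1, cpr2) tuples described in that question.&lt;br /&gt;

```python
def check_cousin_pairs(pairs):
    """Run checks a) to d) on a list of (cpr1, cpr2) tuples.

    Hypothetical helper: expects the symmetric list, where both
    (x, y) and (y, x) are present for every cousin pair.
    """
    pair_set = set(pairs)
    firsts = sorted(first for first, _ in pairs)
    seconds = sorted(second for _, second in pairs)
    people = set(firsts) | set(seconds)
    return {
        # a) no tuple occurs more than once
        "no_duplicates": len(pair_set) == len(pairs),
        # b) nobody is paired with themselves
        "no_self_pairs": all(first != second for first, second in pairs),
        # c) every tuple has its mirror image, so both columns match
        "symmetric": all((second, first) in pair_set for first, second in pairs),
        "same_columns": firsts == seconds,
        # d) list length and number of distinct people
        "pair_count": len(pairs),
        "people_with_cousins": len(people),
        # each person contributes one tuple per cousin in a symmetric list
        "average_cousins": len(pairs) / len(people) if people else 0.0,
    }
```

When the list is symmetric and free of duplicates, the average number of cousins among people who have cousins is the list length divided by the number of distinct people, e.g. check_cousin_pairs([("p1", "p2"), ("p2", "p1")])["average_cousins"] gives 1.0.&lt;br /&gt;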
&lt;br /&gt;
All questions have to be answered in one run of the program, but not necessarily in that order. You are welcome to answer other interesting questions that can be posed from the data. Many questions are about distributions and whether the distributions are &amp;quot;normal&amp;quot;. The program can calculate the distributions, but the analysis of the result (evaluating normalcy) belongs in the report.&lt;br /&gt;
Here is an example: &amp;quot;Is the distribution of first-time fatherhood age normal? A yes/no answer is not good enough.&amp;quot; You must at least calculate and print something like:&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    5%&lt;br /&gt;
 21-25:    10%&lt;br /&gt;
 26-30:    40%&lt;br /&gt;
 31-35:    30%&lt;br /&gt;
 36-40:    15%&lt;br /&gt;
 41-45:    0%&lt;br /&gt;
From that you simply evaluate if it is normal and put it in the report. I think the numbers above are rather normal, but the ones below are very strange. If you want to, you can support your opinions with references to [https://www.dst.dk/en/ Statistics Denmark].&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    0%&lt;br /&gt;
 21-25:    0%&lt;br /&gt;
 26-30:    10%&lt;br /&gt;
 31-35:    20%&lt;br /&gt;
 36-40:    50%&lt;br /&gt;
 41-45:    20%&lt;br /&gt;
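&lt;br /&gt;
A table like the ones above could be produced with a short binning routine. The sketch below is only an illustration under stated assumptions: the names age_distribution and print_distribution, the 5-year bin width and the starting age of 16 are chosen to mirror the example tables, and the input is assumed to be a non-empty list of whole-year ages.&lt;br /&gt;

```python
from collections import Counter


def age_distribution(ages, start=16, width=5):
    """Return (label, percent) rows for consecutive age bins.

    Assumes a non-empty list of whole-year ages; start and width are
    illustrative defaults that mirror the example tables.
    """
    # Floor division maps each age to the index of its bin.
    counts = Counter((age - start) // width for age in ages)
    rows = []
    # Walk every bin between the lowest and highest one, so empty bins show as 0.
    for index in range(min(counts), max(counts) + 1):
        low = start + index * width
        label = f"{low}-{low + width - 1}:"
        rows.append((label, 100 * counts[index] / len(ages)))
    return rows


def print_distribution(rows):
    """Print the rows in the same layout as the example tables."""
    print(" Age       Percentage")
    for label, percent in rows:
        print(f" {label:10}{percent:.0f}%")
```

Calling print_distribution(age_distribution(ages)) with the list of first-time fatherhood ages prints one row per bin; judging whether the result is sensible still belongs in the report.&lt;br /&gt;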
&lt;br /&gt;
The problem you should solve in the project is not &amp;quot;how to make good statistics&amp;quot;, but &amp;quot;how to collect the data from the database&amp;quot;. If you feel you can do better statistics than the above, you are welcome to.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: For the sanity of the questions you should assume/pretend you are doing this analysis at the beginning of 2000.&amp;lt;br&amp;gt; &lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The CPR consists of a date part (first 6 digits as DDMMYY) which is the birthday, and a 4-digit number. There are rules about how a CPR should be constructed, but they are not followed here, since it is illegal to publish CPR numbers. What you need to know is that the date part is a date in 1900-1999, and the last digit is significant; odd - male, even - female.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The data are somewhat randomly constructed, so you can find &#039;facts&#039; that seem very unlikely, like 6-year-old kids with a height of 2 m. Just accept it.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: In the database the children of a person are clear. This means you can follow a thread &#039;&#039;down&#039;&#039; the generations. As can be seen from some of the questions, it can be nice to find the parents directly from a child, i.e. follow a thread &#039;&#039;up&#039;&#039; the generations. Can you find a way to do this easily?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=91</id>
		<title>Data analysis</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=91"/>
		<updated>2024-05-13T15:56:07Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
This project is about analyzing specific data and answering various questions about it. The data file is a flat file database constructed in the year 2000 with various information about people, and can be seen here as&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/people.db people.db].&lt;br /&gt;
The program must read this file ONCE - line by line - not storing the actual lines for future reference, but entering the data in an appropriate data structure of your own devising. Some questions ask whether a distribution is &amp;quot;normal&amp;quot;. &amp;quot;Normal&amp;quot; here does &#039;&#039;&#039;not&#039;&#039;&#039; mean&lt;br /&gt;
that it fits the bell curve (the normal distribution in the statistical sense). It means reasonable or natural, what you would expect.&amp;lt;br&amp;gt;&lt;br /&gt;
The program must now answer the following questions:&lt;br /&gt;
&lt;br /&gt;
# Is the age and gender distribution normal/sensible in the database? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do men become fathers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time fatherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do women become mothers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time motherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# How many men and women do not have children (in percent)?&lt;br /&gt;
# What is the average age difference between the parents (with a child in common obviously)?&lt;br /&gt;
# How many people have at least one grandparent that is still alive? A person is living if he/she is in the database. State the number both in percent and as an absolute count.&lt;br /&gt;
# How many people have at least one cousin in the data set? What is the average number of cousins based on those who have cousins?&amp;lt;br&amp;gt;Note: This number has historically been difficult to compute correctly, but here are some thoughts to help you verify your count.&amp;lt;br&amp;gt;You have to construct a method for finding cousin pairs. Any cousin pair you identify can be written as a tuple (cpr1, cpr2) in a list.&amp;lt;br&amp;gt; a) There should be no duplicate tuples in the list.&amp;lt;br&amp;gt; b) There should be no tuple with the same cpr in position 1 and 2.&amp;lt;br&amp;gt; c) Because of symmetry, it is expected that for any (cpr1, cpr2) tuple there is a (cpr2, cpr1) tuple - which has natural consequences: Set(cpr1) == Set(cpr2), Sorted_list(cpr1) == Sorted_list(cpr2).&amp;lt;br&amp;gt; d) The length of the list of cousin tuples is the number of cousin pairs, and the size of the set of cpr&#039;s is the number of people who have cousins.&lt;br /&gt;
# Is the firstborn likely to be male or female?&lt;br /&gt;
# How many men/women (percentage) have children with more than one woman/man?&lt;br /&gt;
# Do tall people marry (or at least get children together)? To answer that, calculate the percentages of tall/tall, tall/normal, tall/short, normal/normal, normal/short, and short/short couples. Decide your own limits for tall, normal and short, and if they are the same for men and women.&lt;br /&gt;
# Do tall parents get tall children?&lt;br /&gt;
# Do fat people marry (or at least get children together)? To answer that, calculate the percentages of fat/fat, fat/normal, fat/slim, normal/normal, normal/slim, and slim/slim couples. Decide your own limits for fat, normal and slim. Calculate the BMI, and let that be the fatness indicator.&lt;br /&gt;
# Using the knowledge of [https://en.wikipedia.org/wiki/Blood_type#ABO_blood_group_system blood group type inheritance], are there any children in the database where you can safely say that at least one of the parents is not the real parent? If such children exist, make a list of them. In the report you must discuss how you determine that the parent(s) of the child are not the &amp;quot;true&amp;quot; parents.&lt;br /&gt;
# Make a list of fathers who can donate blood to their sons. The list must identify the father and the son(s) and their blood type. You must write the length of the list in the report, together with the number of fathers and the number of sons.&lt;br /&gt;
# Make a list of persons who can donate blood to at least one of their grandparents. The list must identify the person, the grandparent(s) and their blood type. You must write the length of the list in the report, together with the number of grandchildren and the number of grandparents.&lt;br /&gt;
&lt;br /&gt;
All questions have to be answered in one run of the program, but not necessarily in that order. You are welcome to answer other interesting questions that can be posed from the data. Many questions are about distributions and whether the distributions are &amp;quot;normal&amp;quot;. The program can calculate the distributions, but the analysis of the result (evaluating normalcy) belongs in the report.&lt;br /&gt;
Here is an example: &amp;quot;Is the distribution of first-time fatherhood age normal? A yes/no answer is not good enough.&amp;quot; You must at least calculate and print something like:&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    5%&lt;br /&gt;
 21-25:    10%&lt;br /&gt;
 26-30:    40%&lt;br /&gt;
 31-35:    30%&lt;br /&gt;
 36-40:    15%&lt;br /&gt;
 41-45:    0%&lt;br /&gt;
From that you simply evaluate if it is normal and put it in the report. I think the numbers above are rather normal, but the ones below are very strange. If you want to, you can support your opinions with references to [https://www.dst.dk/en/ Statistics Denmark].&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    0%&lt;br /&gt;
 21-25:    0%&lt;br /&gt;
 26-30:    10%&lt;br /&gt;
 31-35:    20%&lt;br /&gt;
 36-40:    50%&lt;br /&gt;
 41-45:    20%&lt;br /&gt;
&lt;br /&gt;
The problem you should solve in the project is not &amp;quot;how to make good statistics&amp;quot;, but &amp;quot;how to collect the data from the database&amp;quot;. If you feel you can do better statistics than the above, you are welcome to.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: For the sanity of the questions you should assume/pretend you are doing this analysis at the beginning of 2000.&amp;lt;br&amp;gt; &lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The CPR consists of a date part (first 6 digits as DDMMYY) which is the birthday, and a 4-digit number. There are rules about how a CPR should be constructed, but they are not followed here, since it is illegal to publish CPR numbers. What you need to know is that the date part is a date in 1900-1999, and the last digit is significant; odd - male, even - female.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The data are somewhat randomly constructed, so you can find &#039;facts&#039; that seem very unlikely, like 6-year-old kids with a height of 2 m. Just accept it.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: In the database the children of a person are clear. This means you can follow a thread &#039;&#039;down&#039;&#039; the generations. As can be seen from some of the questions, it can be nice to find the parents directly from a child, i.e. follow a thread &#039;&#039;up&#039;&#039; the generations. Can you find a way to do this easily?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=90</id>
		<title>Data analysis</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=90"/>
		<updated>2024-05-13T15:50:37Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
This project is about analyzing specific data and answering various questions about it. The data file is a flat file database constructed in the year 2000 with various information about people, and can be seen here as&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/people.db people.db].&lt;br /&gt;
The program must read this file ONCE - line by line - not storing the actual lines for future reference, but entering the data in an appropriate data structure of your own devising. Some questions ask whether a distribution is &amp;quot;normal&amp;quot;. &amp;quot;Normal&amp;quot; here does &#039;&#039;&#039;not&#039;&#039;&#039; mean&lt;br /&gt;
that it fits the bell curve (the normal distribution in the statistical sense). It means reasonable or natural, what you would expect.&amp;lt;br&amp;gt;&lt;br /&gt;
The program must now answer the following questions:&lt;br /&gt;
&lt;br /&gt;
# Is the age and gender distribution normal/sensible in the database? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do men become fathers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time fatherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do women become mothers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time motherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# How many men and women do not have children (in percent)?&lt;br /&gt;
# What is the average age difference between the parents (with a child in common obviously)?&lt;br /&gt;
# How many people have at least one grandparent that is still alive? A person is living if he/she is in the database. State the number both in percent and as an absolute count.&lt;br /&gt;
# How many people have at least one cousin in the data set? What is the average number of cousins based on those who have cousins?&amp;lt;br&amp;gt;Note: This number has historically been difficult to compute correctly, but here are some thoughts to help you verify your count.&amp;lt;br&amp;gt;You have to construct a method for finding cousin pairs. Any cousin pair you identify can be written as a tuple (cpr1, cpr2) in a list.&amp;lt;br&amp;gt; a) There should be no duplicate tuples in the list.&amp;lt;br&amp;gt; b) There should be no tuple with the same cpr in position 1 and 2.&amp;lt;br&amp;gt; c) Because of symmetry, it is expected that for any (cpr1, cpr2) tuple there is a (cpr2, cpr1) tuple - which also implies that the set of cpr1&#039;s is equal to the set of cpr2&#039;s.&amp;lt;br&amp;gt; d) The length of the list of cousin tuples is the number of cousin pairs, and the size of the set of cpr&#039;s is the number of people who have cousins.&lt;br /&gt;
# Is the firstborn likely to be male or female?&lt;br /&gt;
# How many men/women (percentage) have children with more than one woman/man?&lt;br /&gt;
# Do tall people marry (or at least get children together)? To answer that, calculate the percentages of tall/tall, tall/normal, tall/short, normal/normal, normal/short, and short/short couples. Decide your own limits for tall, normal and short, and if they are the same for men and women.&lt;br /&gt;
# Do tall parents get tall children?&lt;br /&gt;
# Do fat people marry (or at least get children together)? To answer that, calculate the percentages of fat/fat, fat/normal, fat/slim, normal/normal, normal/slim, and slim/slim couples. Decide your own limits for fat, normal and slim. Calculate the BMI, and let that be the fatness indicator.&lt;br /&gt;
# Using the knowledge of [https://en.wikipedia.org/wiki/Blood_type#ABO_blood_group_system blood group type inheritance], are there any children in the database where you can safely say that at least one of the parents is not the real parent? If such children exist, make a list of them. In the report you must discuss how you determine that the parent(s) of the child are not the &amp;quot;true&amp;quot; parents.&lt;br /&gt;
# Make a list of fathers who can donate blood to their sons. The list must identify the father and the son(s) and their blood type. You must write the length of the list in the report, together with the number of fathers and the number of sons.&lt;br /&gt;
# Make a list of persons who can donate blood to at least one of their grandparents. The list must identify the person, the grandparent(s) and their blood type. You must write the length of the list in the report, together with the number of grandchildren and the number of grandparents.&lt;br /&gt;
&lt;br /&gt;
All questions have to be answered in one run of the program, but not necessarily in that order. You are welcome to answer other interesting questions that can be posed from the data. Many questions are about distributions and whether the distributions are &amp;quot;normal&amp;quot;. The program can calculate the distributions, but the analysis of the result (evaluating normalcy) belongs in the report.&lt;br /&gt;
Here is an example: &amp;quot;Is the distribution of first-time fatherhood age normal? A yes/no answer is not good enough.&amp;quot; You must at least calculate and print something like:&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    5%&lt;br /&gt;
 21-25:    10%&lt;br /&gt;
 26-30:    40%&lt;br /&gt;
 31-35:    30%&lt;br /&gt;
 36-40:    15%&lt;br /&gt;
 41-45:    0%&lt;br /&gt;
From that you simply evaluate if it is normal and put it in the report. I think the numbers above are rather normal, but the ones below are very strange. If you want to, you can support your opinions with references to [https://www.dst.dk/en/ Statistics Denmark].&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    0%&lt;br /&gt;
 21-25:    0%&lt;br /&gt;
 26-30:    10%&lt;br /&gt;
 31-35:    20%&lt;br /&gt;
 36-40:    50%&lt;br /&gt;
 41-45:    20%&lt;br /&gt;
&lt;br /&gt;
The problem you should solve in the project is not &amp;quot;how to make good statistics&amp;quot;, but &amp;quot;how to collect the data from the database&amp;quot;. If you feel you can do better statistics than the above, you are welcome to.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: For the sanity of the questions you should assume/pretend you are doing this analysis at the beginning of 2000.&amp;lt;br&amp;gt; &lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The CPR consists of a date part (first 6 digits as DDMMYY) which is the birthday, and a 4-digit number. There are rules about how a CPR should be constructed, but they are not followed here, since it is illegal to publish CPR numbers. What you need to know is that the date part is a date in 1900-1999, and the last digit is significant; odd - male, even - female.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The data are somewhat randomly constructed, so you can find &#039;facts&#039; that seem very unlikely, like 6-year-old kids with a height of 2 m. Just accept it.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: In the database the children of a person are clear. This means you can follow a thread &#039;&#039;down&#039;&#039; the generations. As can be seen from some of the questions, it can be nice to find the parents directly from a child, i.e. follow a thread &#039;&#039;up&#039;&#039; the generations. Can you find a way to do this easily?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=89</id>
		<title>Data analysis</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=89"/>
		<updated>2024-05-13T09:04:55Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
This project is about analyzing specific data and answering various questions about it. The data file is a flat file database constructed in the year 2000 with various information about people, and can be seen here as&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/people.db people.db].&lt;br /&gt;
The program must read this file ONCE - line by line - not storing the actual lines for future reference, but entering the data in an appropriate data structure of your own devising. Some questions ask whether a distribution is &amp;quot;normal&amp;quot;. &amp;quot;Normal&amp;quot; here does &#039;&#039;&#039;not&#039;&#039;&#039; mean&lt;br /&gt;
that it fits the bell curve (the normal distribution in the statistical sense). It means reasonable or natural, what you would expect.&amp;lt;br&amp;gt;&lt;br /&gt;
The program must now answer the following questions:&lt;br /&gt;
&lt;br /&gt;
# Is the age and gender distribution normal/sensible in the database? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do men become fathers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time fatherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do women become mothers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time motherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# How many men and women do not have children (in percent)?&lt;br /&gt;
# What is the average age difference between the parents (with a child in common obviously)?&lt;br /&gt;
# How many people have at least one grandparent that is still alive? A person is living if he/she is in the database. State the number both in percent and as an absolute count.&lt;br /&gt;
# How many people have at least one cousin in the data set? What is the average number of cousins based on those who have cousins?&amp;lt;br&amp;gt;Note: This number has historically been difficult to compute correctly, but here are some thoughts to help you verify your count.&amp;lt;br&amp;gt;You have to construct a method for finding cousin pairs. Any cousin pair you identify can be written as a tuple (cpr1, cpr2) in a list.&amp;lt;br&amp;gt; a) There should be no duplicate tuples in the list.&amp;lt;br&amp;gt; b) There should be no tuple with the same cpr in position 1 and 2.&amp;lt;br&amp;gt; c) Because of symmetry, the set of cpr1&#039;s is equal to the set of cpr2&#039;s.&amp;lt;br&amp;gt; d) The length of the list of cousin tuples is the number of cousin pairs, and the size of the set of cpr&#039;s is the number of people who have cousins.&lt;br /&gt;
# Is the firstborn likely to be male or female?&lt;br /&gt;
# How many men/women (percentage) have children with more than one woman/man?&lt;br /&gt;
# Do tall people marry (or at least get children together)? To answer that, calculate the percentages of tall/tall, tall/normal, tall/short, normal/normal, normal/short, and short/short couples. Decide your own limits for tall, normal and short, and if they are the same for men and women.&lt;br /&gt;
# Do tall parents get tall children?&lt;br /&gt;
# Do fat people marry (or at least get children together)? To answer that, calculate the percentages of fat/fat, fat/normal, fat/slim, normal/normal, normal/slim, and slim/slim couples. Decide your own limits for fat, normal and slim. Calculate the BMI, and let that be the fatness indicator.&lt;br /&gt;
# Using the knowledge of [https://en.wikipedia.org/wiki/Blood_type#ABO_blood_group_system blood group type inheritance], are there any children in the database where you can safely say that at least one of the parents is not the real parent? If such children exist, make a list of them. In the report you must discuss how you determine that the parent(s) of the child are not the &amp;quot;true&amp;quot; parents.&lt;br /&gt;
# Make a list of fathers who can donate blood to their sons. The list must identify the father and the son(s) and their blood type. You must write the length of the list in the report, together with the number of fathers and the number of sons.&lt;br /&gt;
# Make a list of persons who can donate blood to at least one of their grandparents. The list must identify the person, the grandparent(s) and their blood type. You must write the length of the list in the report, together with the number of grandchildren and the number of grandparents.&lt;br /&gt;
&lt;br /&gt;
All questions have to be answered in one run of the program, but not necessarily in that order. You are welcome to answer other interesting questions that can be posed from the data. Many questions are about distributions and whether the distributions are &amp;quot;normal&amp;quot;. The program can calculate the distributions, but the analysis of the result (evaluating normalcy) belongs in the report.&lt;br /&gt;
Here is an example: &amp;quot;Is the distribution of first-time fatherhood age normal? A yes/no answer is not good enough.&amp;quot; You must at least calculate and print something like:&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    5%&lt;br /&gt;
 21-25:    10%&lt;br /&gt;
 26-30:    40%&lt;br /&gt;
 31-35:    30%&lt;br /&gt;
 36-40:    15%&lt;br /&gt;
 41-45:    0%&lt;br /&gt;
From that you simply evaluate if it is normal and put it in the report. I think the numbers above are rather normal, but the ones below are very strange. If you want to, you can support your opinions with references to [https://www.dst.dk/en/ Statistics Denmark].&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    0%&lt;br /&gt;
 21-25:    0%&lt;br /&gt;
 26-30:    10%&lt;br /&gt;
 31-35:    20%&lt;br /&gt;
 36-40:    50%&lt;br /&gt;
 41-45:    20%&lt;br /&gt;
&lt;br /&gt;
The problem you should solve in the project is not &amp;quot;how to make good statistics&amp;quot;, but &amp;quot;how to collect the data from the database&amp;quot;. If you feel you can do better statistics than above, you are welcome.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: For the sanity of the questions you should assume/pretend you are doing this analysis primo 2000.&amp;lt;br&amp;gt; &lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The CPR consists of a date part (first 6 numbers as DDMMYY) which is the birthday, and a 4 digit number. There are rules about how CPR should be constructed, and they are not followed since it is illegal to publish CPR numbers. What you need to know is that the date part is a date in 1900-1999, and the last digit is significant; odd - male, even - female.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The data are somewhat randomly constructed, so you can find &#039;facts&#039; that seem very unlikely, like 6-year-old kids with a height of 2 m. Just accept it.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: In the database the children of a person are clear. This means you can follow a thread &#039;&#039;down&#039;&#039; the generations. As can be seen from some of the questions, it can be nice to find the parents directly from a child, i.e. follow a thread &#039;&#039;up&#039;&#039; the generations. Can you find a way to do this easily?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=88</id>
		<title>Data analysis</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=88"/>
		<updated>2024-05-12T22:18:40Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
This project is about analyzing specific data and answering various questions about it. The data file is a flat file database constructed in the year 2000 with various information about people, and can be seen here as&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/people.db people.db].&lt;br /&gt;
The program must read this file ONCE - line by line - not storing the actual lines for future reference, but entering the data in an appropriate data structure of your own devising. Some questions ask whether a distribution is &amp;quot;normal&amp;quot;. &amp;quot;Normal&amp;quot; here does &#039;&#039;&#039;not&#039;&#039;&#039; mean&lt;br /&gt;
fitting the bell curve (standard normal distribution). It means reasonable or natural, what you would expect.&amp;lt;br&amp;gt;&lt;br /&gt;
The program must now answer the following questions:&lt;br /&gt;
&lt;br /&gt;
# Is the age and gender distribution normal/sensible in the database? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do men become fathers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time fatherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do women become mothers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time motherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# How many men and women do not have children (in percent)?&lt;br /&gt;
# What is the average age difference between the parents (with a child in common obviously)?&lt;br /&gt;
# How many people have at least one grandparent that is still alive? A person is living if he/she is in the database. State the number both as a percentage and as an absolute count.&lt;br /&gt;
# How many people have at least one cousin in the data set? What is the average number of cousins based on those who have cousins?&amp;lt;br&amp;gt;Note: This number is historically difficult to get right, but here are some thoughts to help you verify your count.&amp;lt;br&amp;gt;If you can identify a cousin pair, you can write it as a tuple (cpr1, cpr2) in a list.&amp;lt;br&amp;gt; a) There should be no duplicate tuples in the list.&amp;lt;br&amp;gt; b) There should be no tuple with the same cpr in position 1 and 2.&amp;lt;br&amp;gt; c) Because of symmetry the set of cpr1&#039;s is equal to the set of cpr2&#039;s.&amp;lt;br&amp;gt; d) The length of the list of cousin tuples is the number of cousin pairs, and the size of the set of cpr&#039;s is the number of people who have cousins.&lt;br /&gt;
# Is the firstborn likely to be male or female?&lt;br /&gt;
# How many men/women (percentage) have children with more than one woman/man?&lt;br /&gt;
# Do tall people marry (or at least have children together)? To answer that, calculate the percentages of tall/tall, tall/normal, tall/short, normal/normal, normal/short, and short/short couples. Decide your own limits for tall, normal and short, and whether they are the same for men and women.&lt;br /&gt;
# Do tall parents have tall children?&lt;br /&gt;
# Do fat people marry (or at least have children together)? To answer that, calculate the percentages of fat/fat, fat/normal, fat/slim, normal/normal, normal/slim, and slim/slim couples. Decide your own limits for fat, normal and slim. Calculate the BMI, and let that be the fatness indicator.&lt;br /&gt;
# Using the knowledge of [https://en.wikipedia.org/wiki/Blood_type#ABO_blood_group_system blood group type inheritance], are there any children in the database where you can safely say that at least one of the parents is not the real parent? If such children exist, make a list of them. In the report you must discuss how you determine that the parent(s) of the child are not the &amp;quot;true&amp;quot; parents.&lt;br /&gt;
# Make a list of fathers who can donate blood to their sons. The list must identify the father and the son(s) and their blood type. You must write the length of the list in the report, together with the number of fathers and the number of sons.&lt;br /&gt;
# Make a list of persons who can donate blood to at least one of their grandparents. The list must identify the person, the grandparent(s) and their blood type. You must write the length of the list in the report, together with the number of grandchildren and the number of grandparents.&lt;br /&gt;
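&lt;br /&gt;
The verification hints a) to d) for the cousin count can be sketched in Python as below. This is only a sketch: the function name check_cousin_pairs is hypothetical, and it assumes that every cousin relation is stored in both directions, which is what makes hint c) hold. The identifiers in the example are made up and are not CPR numbers.&lt;br /&gt;

```python
def check_cousin_pairs(pairs):
    """Check hints a) to d) on a list of (cpr1, cpr2) cousin tuples.

    Assumption: each relation is stored in both directions, so both
    (x, y) and (y, x) are present in the list.
    """
    # a) There should be no duplicate tuples.
    assert len(pairs) == len(set(pairs)), "duplicate tuple found"
    # b) Nobody should be listed as their own cousin.
    assert all(cpr1 != cpr2 for cpr1, cpr2 in pairs), "self-pair found"
    # c) The same people must occur in position 1 and in position 2.
    firsts = {cpr1 for cpr1, _ in pairs}
    seconds = {cpr2 for _, cpr2 in pairs}
    assert firsts == seconds, "the relation is not symmetric"
    # d) Number of tuples and number of people who have cousins.
    return len(pairs), len(firsts)


# Made-up identifiers: a is a cousin of both b and c.
pairs = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a")]
tuple_count, people_count = check_cousin_pairs(pairs)
print(tuple_count, people_count, tuple_count / people_count)
```

Under the both-directions assumption, the number of tuples divided by the number of people with cousins gives the average number of cousins among those who have any.&lt;br /&gt;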
&lt;br /&gt;
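The two blood donation questions need a compatibility check. The sketch below uses the standard red blood cell compatibility rules for the ABO and RhD systems. The function name can_donate and the string format of the blood types (such as A+ or AB-) are assumptions, since the layout of people.db is not described here; adapt the parsing to the actual file.&lt;br /&gt;

```python
# ABO groups that a donor of each ABO group can give red blood cells to.
ABO_RECIPIENTS = {
    "O": {"O", "A", "B", "AB"},
    "A": {"A", "AB"},
    "B": {"B", "AB"},
    "AB": {"AB"},
}


def can_donate(donor, recipient):
    """Return True if blood of type donor can be given to recipient.

    Assumption: blood types are strings such as "A+" or "AB-", where
    the last character is the RhD sign.
    """
    donor_abo, donor_rh = donor[:-1], donor[-1]
    recipient_abo, recipient_rh = recipient[:-1], recipient[-1]
    abo_ok = recipient_abo in ABO_RECIPIENTS[donor_abo]
    # Rh-negative blood can go to anyone, Rh-positive only to Rh-positive.
    rh_ok = donor_rh == "-" or recipient_rh == "+"
    return abo_ok and rh_ok


print(can_donate("O-", "AB+"))   # universal donor
print(can_donate("A+", "A-"))    # blocked by the RhD rule
```

With such a function, the father/son and grandchild/grandparent lists reduce to walking the family relations and keeping the pairs where the check is true.&lt;br /&gt;
&lt;br /&gt;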
All questions have to be answered in one run of the program, but not necessarily in that order. You are welcome to answer other interesting questions that can be posed from the data. Many questions are about distributions and whether the distributions are &amp;quot;normal&amp;quot;. The program can calculate the distributions, but the analysis of the result (evaluating normalcy) belongs in the report.&lt;br /&gt;
Here is an example: &amp;quot;Is the distribution of first-time fatherhood age normal? A yes/no answer is not good enough.&amp;quot; You must at least calculate and print something like:&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    5%&lt;br /&gt;
 21-25:    10%&lt;br /&gt;
 26-30:    40%&lt;br /&gt;
 31-35:    30%&lt;br /&gt;
 36-40:    15%&lt;br /&gt;
 41-45:    0%&lt;br /&gt;
From that you simply evaluate whether it is normal and put your evaluation in the report. I think the numbers above are rather normal, but the ones below are very strange. If you want to, you can support your opinions with references to [https://www.dst.dk/en/ Statistics Denmark].&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    0%&lt;br /&gt;
 21-25:    0%&lt;br /&gt;
 26-30:    10%&lt;br /&gt;
 31-35:    20%&lt;br /&gt;
 36-40:    50%&lt;br /&gt;
 41-45:    20%&lt;br /&gt;
&lt;br /&gt;
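A table like the two above could be produced along these lines. This is a minimal sketch, not a required design: the function names, the bin width of 5 years and the starting age of 16 are assumptions taken from the example tables, and bins without any people are simply left out.&lt;br /&gt;

```python
from collections import Counter


def age_distribution(ages, bin_size=5, start=16):
    """Return {(low, high): percentage} for ages in fixed-width bins."""
    # Integer division maps each age to the index of its bin.
    counts = Counter((age - start) // bin_size for age in ages)
    total = len(ages)
    distribution = {}
    for index in sorted(counts):
        low = start + index * bin_size
        high = low + bin_size - 1
        distribution[(low, high)] = 100 * counts[index] / total
    return distribution


def print_distribution(distribution):
    """Print the distribution in the layout of the example tables."""
    print(" Age       Percentage")
    for (low, high), percent in distribution.items():
        print(f" {low}-{high}:    {percent:.0f}%")


# Made-up ages, only to show the call.
print_distribution(age_distribution([16, 20, 21, 30]))
```

Keeping the calculation and the printing in separate functions makes it possible to reuse the same binning for the fatherhood and the motherhood questions.&lt;br /&gt;
&lt;br /&gt;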
The problem you should solve in the project is not &amp;quot;how to make good statistics&amp;quot;, but &amp;quot;how to collect the data from the database&amp;quot;. If you feel you can do better statistics than the above, you are welcome to do so.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: For the questions to make sense, you should assume/pretend you are doing this analysis at the start of the year 2000.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The CPR number consists of a date part (the first 6 digits as DDMMYY), which is the birthday, and a 4-digit number. There are rules about how a CPR number should be constructed, but they are not followed here since it is illegal to publish CPR numbers. What you need to know is that the date part is a date in 1900-1999, and that the last digit is significant: odd means male, even means female.&amp;lt;br&amp;gt;&lt;br /&gt;
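The CPR rules in the tip above could be decoded as in the following sketch. The function names parse_cpr and age_at are hypothetical, and the exact way the CPR number is written in people.db (with or without a hyphen) is an assumption. Since the data are randomly constructed, an impossible date would make date() raise a ValueError, which the sketch does not handle.&lt;br /&gt;

```python
from datetime import date


def parse_cpr(cpr):
    """Return (birthday, sex) for a CPR string like DDMMYY-XXXX."""
    digits = cpr.replace("-", "")
    day, month, year = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
    # The date part is always a date in 1900-1999.
    birthday = date(1900 + year, month, day)
    # Last digit: odd means male, even means female.
    sex = "male" if int(digits[-1]) % 2 == 1 else "female"
    return birthday, sex


def age_at(birthday, when=date(2000, 1, 1)):
    """Age in whole years on a given date (default: start of 2000)."""
    had_birthday = (when.month, when.day) >= (birthday.month, birthday.day)
    return when.year - birthday.year - (0 if had_birthday else 1)


# Made-up number, only to show the call.
birthday, sex = parse_cpr("020370-0001")
print(birthday, sex, age_at(birthday))
```
&lt;br /&gt;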
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The data are somewhat randomly constructed, so you can find &#039;facts&#039; that seem very unlikely, like 6-year-old kids with a height of 2 m. Just accept it.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: In the database the children of a person are clear. This means you can follow a thread &#039;&#039;down&#039;&#039; the generations. As can be seen from some of the questions, it can be nice to find the parents directly from a child, i.e. follow a thread &#039;&#039;up&#039;&#039; the generations. Can you find a way to do this easily?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=87</id>
		<title>Data analysis</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Data_analysis&amp;diff=87"/>
		<updated>2024-05-12T22:17:38Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
This project is about analyzing specific data and answering various questions about it. The data file is a flat file database constructed in the year 2000 with various information about people, and can be seen here as&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/people.db people.db].&lt;br /&gt;
The program must read this file ONCE - line by line - not storing the actual lines for future reference, but entering the data in an appropriate data structure of your own devising. Some questions ask whether a distribution is &amp;quot;normal&amp;quot;. &amp;quot;Normal&amp;quot; here does &#039;&#039;&#039;not&#039;&#039;&#039; mean&lt;br /&gt;
fitting the bell curve (standard normal distribution). It means reasonable or natural, what you would expect.&amp;lt;br&amp;gt;&lt;br /&gt;
The program must now answer the following questions:&lt;br /&gt;
&lt;br /&gt;
# Is the age and gender distribution normal/sensible in the database? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do men become fathers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time fatherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# At what age do women become mothers for the first time (max age, min age, average age)?&lt;br /&gt;
# Is the distribution of first-time motherhood age normal/sensible? A yes/no answer is not good enough.&lt;br /&gt;
# How many men and women do not have children (in percent)?&lt;br /&gt;
# What is the average age difference between the parents (with a child in common obviously)?&lt;br /&gt;
# How many people have at least one grandparent that is still alive? A person is living if he/she is in the database. State the number both as a percentage and as an absolute count.&lt;br /&gt;
# How many people have at least one cousin in the data set? What is the average number of cousins based on those who have cousins?&amp;lt;br&amp;gt;Note: This number is historically difficult to get right, but here are some thoughts to help you verify your count.&amp;lt;br&amp;gt;If you can identify a cousin pair, you can write it as a tuple (cpr1, cpr2) in a list.&amp;lt;br&amp;gt;a) There should be no duplicate tuples in the list.&amp;lt;br&amp;gt;b) There should be no tuple with the same cpr in position 1 and 2.&amp;lt;br&amp;gt;c) Because of symmetry the set of cpr1&#039;s is equal to the set of cpr2&#039;s. d) The length of the list of cousin tuples is the number of cousin pairs, and the size of the set of cpr&#039;s is the number of people who have cousins.&lt;br /&gt;
# Is the firstborn likely to be male or female?&lt;br /&gt;
# How many men/women (percentage) have children with more than one woman/man?&lt;br /&gt;
# Do tall people marry (or at least have children together)? To answer that, calculate the percentages of tall/tall, tall/normal, tall/short, normal/normal, normal/short, and short/short couples. Decide your own limits for tall, normal and short, and whether they are the same for men and women.&lt;br /&gt;
# Do tall parents have tall children?&lt;br /&gt;
# Do fat people marry (or at least have children together)? To answer that, calculate the percentages of fat/fat, fat/normal, fat/slim, normal/normal, normal/slim, and slim/slim couples. Decide your own limits for fat, normal and slim. Calculate the BMI, and let that be the fatness indicator.&lt;br /&gt;
# Using the knowledge of [https://en.wikipedia.org/wiki/Blood_type#ABO_blood_group_system blood group type inheritance], are there any children in the database where you can safely say that at least one of the parents is not the real parent? If such children exist, make a list of them. In the report you must discuss how you determine that the parent(s) of the child are not the &amp;quot;true&amp;quot; parents.&lt;br /&gt;
# Make a list of fathers who can donate blood to their sons. The list must identify the father and the son(s) and their blood type. You must write the length of the list in the report, together with the number of fathers and the number of sons.&lt;br /&gt;
# Make a list of persons who can donate blood to at least one of their grandparents. The list must identify the person, the grandparent(s) and their blood type. You must write the length of the list in the report, together with the number of grandchildren and the number of grandparents.&lt;br /&gt;
&lt;br /&gt;
All questions have to be answered in one run of the program, but not necessarily in that order. You are welcome to answer other interesting questions that can be posed from the data. Many questions are about distributions and whether the distributions are &amp;quot;normal&amp;quot;. The program can calculate the distributions, but the analysis of the result (evaluating normalcy) belongs in the report.&lt;br /&gt;
Here is an example: &amp;quot;Is the distribution of first-time fatherhood age normal? A yes/no answer is not good enough.&amp;quot; You must at least calculate and print something like:&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    5%&lt;br /&gt;
 21-25:    10%&lt;br /&gt;
 26-30:    40%&lt;br /&gt;
 31-35:    30%&lt;br /&gt;
 36-40:    15%&lt;br /&gt;
 41-45:    0%&lt;br /&gt;
From that you simply evaluate whether it is normal and put your evaluation in the report. I think the numbers above are rather normal, but the ones below are very strange. If you want to, you can support your opinions with references to [https://www.dst.dk/en/ Statistics Denmark].&lt;br /&gt;
 Age       Percentage&lt;br /&gt;
 16-20:    0%&lt;br /&gt;
 21-25:    0%&lt;br /&gt;
 26-30:    10%&lt;br /&gt;
 31-35:    20%&lt;br /&gt;
 36-40:    50%&lt;br /&gt;
 41-45:    20%&lt;br /&gt;
&lt;br /&gt;
The problem you should solve in the project is not &amp;quot;how to make good statistics&amp;quot;, but &amp;quot;how to collect the data from the database&amp;quot;. If you feel you can do better statistics than the above, you are welcome to do so.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: For the questions to make sense, you should assume/pretend you are doing this analysis at the start of the year 2000.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The CPR number consists of a date part (the first 6 digits as DDMMYY), which is the birthday, and a 4-digit number. There are rules about how a CPR number should be constructed, but they are not followed here since it is illegal to publish CPR numbers. What you need to know is that the date part is a date in 1900-1999, and that the last digit is significant: odd means male, even means female.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: The data are somewhat randomly constructed, so you can find &#039;facts&#039; that seem very unlikely, like 6-year-old kids with a height of 2 m. Just accept it.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tip&#039;&#039;&#039;: In the database the children of a person are clear. This means you can follow a thread &#039;&#039;down&#039;&#039; the generations. As can be seen from some of the questions, it can be nice to find the parents directly from a child, i.e. follow a thread &#039;&#039;up&#039;&#039; the generations. Can you find a way to do this easily?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Aligning_expectations&amp;diff=86</id>
		<title>Aligning expectations</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Aligning_expectations&amp;diff=86"/>
		<updated>2024-04-22T17:05:44Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* The report itself */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What is expected from you ==&lt;br /&gt;
=== Fulfilling prerequisites ===&lt;br /&gt;
A course like [https://kurser.dtu.dk/course/22101 22101/22161 Introduction to programming in Life Science using Python] should enable you for this course.&amp;lt;br&amp;gt;&lt;br /&gt;
Generally speaking, you must know simple Python well, which means you know the basic syntactical structure of Python (assignment, expressions, if, for, while, some functions/methods), some data types (integer, float, string, lists, sets, dicts), and trivial file reading and writing, so that you can relatively easily solve minor programming tasks without any use of libraries. You can check if your abilities are up to par by solving some of the exercises in 22101/22161 above.&amp;lt;br&amp;gt;&lt;br /&gt;
You must have your own computer (Windows, Mac, Linux) and you must understand its file system structure - the folder hierarchy, file types, and file organization.&lt;br /&gt;
&lt;br /&gt;
=== Special expectations ===&lt;br /&gt;
In the first week you learn Unix. You must work in this environment for the rest of the course. Unix is used in a number of (bioinformatic) courses, and being able to navigate in Unix is not only a survival skill, but also a skill sought in industry. All major bioinformatic efforts take place on big Unix servers/clusters.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In hand-ins you apply the skills taught in 22101, specifically writing comments, using spacing to modularize the code, proper variable (object) naming, proper use of variables, error handling and code clarity; see [[Code construction]].&lt;br /&gt;
&lt;br /&gt;
=== Standard expectations ===&lt;br /&gt;
* You follow the course every week and hand in the required weekly exercises on DTU Learn.&lt;br /&gt;
* You peer-evaluate every week on DTU Learn. A hand-in is required for evaluation to be allowed.&lt;br /&gt;
* You do a project with a peer (i.e. a two person group project) at the end of the course.&lt;br /&gt;
* When getting help from the TAs and the teacher, understand that many students need help. You cannot expect us to sit with you for an hour. If time actually allows for it, we do not mind doing it.&lt;br /&gt;
&lt;br /&gt;
=== Knowing the roles at the university ===&lt;br /&gt;
By knowing the roles of the various entities, you know where to direct your question.&lt;br /&gt;
* &#039;&#039;&#039;The teacher&#039;&#039;&#039; is responsible for the content of the course, the curriculum, the teaching, the project evaluation, the making and evaluation of the exam, and the reporting of cheating. In short - any content.&lt;br /&gt;
* &#039;&#039;&#039;The study office&#039;&#039;&#039; - and by extension - the exam office, the study guidance, and the legal office (cheating) deal with everything that is not course content. A lot of information is available on [https://www.inside.dtu.dk/en/undervisning DTU Inside], including exam dates and what to do when having problems with the studies. They also process and publish the grades handed in by the teacher.&lt;br /&gt;
&lt;br /&gt;
== Course structure ==&lt;br /&gt;
The course is week-based, meaning new subjects are introduced Monday, and we work with them during the week. Sometimes smaller subjects are introduced Thursdays.&lt;br /&gt;
In the last month of the course a programming project will be made in 2-person groups.&lt;br /&gt;
&lt;br /&gt;
=== Exercises ===&lt;br /&gt;
* Weekly exercises are given every Monday. This constitutes an exercise set. There will be 12 of those. Exercises are mandatory.&lt;br /&gt;
* Exercises have to be uploaded to [https://learn.dtu.dk DTU Learn] by Sunday at the latest in the same week as the exercises were given.&lt;br /&gt;
* Peer evaluation of exercises is done in the following week on [https://learn.dtu.dk DTU Learn] and must be handed in by Friday. The evaluations are mandatory.&lt;br /&gt;
* At least 10 of 12 evaluations must be handed in on DTU Learn for you to be allowed to take the exam. You can only evaluate if you have handed in exercises.&lt;br /&gt;
* Word or pdf documents are NOT accepted as hand-in - use only simple &#039;&#039;&#039;.txt&#039;&#039;&#039; or &#039;&#039;&#039;.py&#039;&#039;&#039; files.&amp;lt;br&amp;gt;&lt;br /&gt;
* Solutions to each week&#039;s exercises are published on DTU Learn (under Discussions) before the next week&#039;s lesson. They can be used as a reference for peer evaluation.&lt;br /&gt;
* Exercises that are handed in after the solutions are published are voided and will not count, no matter the reason for being late.&lt;br /&gt;
* Do not read ahead and start using functions/methods/libraries which will be covered later in the course.&lt;br /&gt;
* &amp;lt;font color=&amp;quot;AA00FF&amp;quot;&amp;gt;&#039;&#039;&#039;Purple exercises&#039;&#039;&#039; have to be done in pseudocode before you start implementing them in Python. The pseudocode is part of the hand-in for these exercises.&amp;lt;br&amp;gt;So - make the pseudocode FIRST, then the real Python programs AFTERWARDS for the purple exercises.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Why &amp;amp; How to peer-evaluate exercises ===&lt;br /&gt;
The peer evaluation is a central part of the learning process, using formative feedback. You can get input from your peers on how you can solve an exercise. &amp;lt;!--The evaluation scheme has a lot of targeted questions about various aspects of the code you will have to evaluate, which will help you understand how you should develop your programs. Initially, it can seem overwhelming, but since the questions repeat week after week, and you mostly have to check boxes, it will get easier during the course.--&amp;gt; You learn both by doing the evaluation and by receiving (reading) it.&lt;br /&gt;
&lt;br /&gt;
You must use the criteria in [[Code construction]] in your own programs and the evaluation.&lt;br /&gt;
&lt;br /&gt;
=== Projects, general information ===&lt;br /&gt;
* A project is done with two students in each group, no exceptions unless there is an odd number of students, in which case a one (wo)man group is formed. The teacher will form the groups, but will accept already formed groups, if informed in time, see the google sheet found via the announcement at DTU Learn.&lt;br /&gt;
* Each group will choose a project from the [[Project list]] in the last part of the course. If a group has an idea for a different project, this must be discussed with the teacher to see if it is feasible. Such a &amp;quot;home made&amp;quot; project must be of sufficient complexity, but not too complex either, and have clear goals, so it can be measured whether it succeeded or not.&lt;br /&gt;
* A project is estimated to be doable in 50 hours of work - people often use more time.&lt;br /&gt;
* The project work consists of two phases: 1) Doing a project - making code and writing the report. 2) Individual (not in the group) peer evaluation of another group&#039;s project. Thus every project is peer evaluated twice. Both phases are mandatory.&lt;br /&gt;
* The teacher will evaluate all projects and peer evaluations. Understand that YOUR evaluation of another group&#039;s project is part of YOUR project work, and as such it will be part of YOUR grade.&lt;br /&gt;
* The project work will count for 50% of the final grade. The project grade is thus combined from the group effort of doing a project and the individual effort of peer grading a project.&lt;br /&gt;
* The project will be handed in through DTU Learn at the time written in the [[Programme]].&lt;br /&gt;
* The teacher will distribute the projects by mail for peer evaluation, with the intention that you do NOT evaluate the same type of project you made.&lt;br /&gt;
* The evaluation of another group&#039;s project will be handed in 1 week later on DTU Learn.&lt;br /&gt;
&lt;br /&gt;
=== Getting help with the project ===&lt;br /&gt;
* The groups can consult the teacher and the TAs on problems with the projects. The teacher has the best overview, while TAs often can only help with actual problems in the code.&lt;br /&gt;
* Groups can talk to each other on their project, but actual cooperation between groups with the same project is not allowed. Here is a simple and clear rule:&amp;lt;br&amp;gt;&#039;&#039;&#039;Groups must not show any written material (typically code or report) to any other group&#039;&#039;&#039;.&lt;br /&gt;
* Consulting Google on programming issues is fine, but understand what is being said/written and why it works.&lt;br /&gt;
* Nothing but Python libraries which have been taught in the course can be used in projects. Consult the teacher if in doubt.&lt;br /&gt;
* The teacher should be informed if a group is dysfunctional. We will work something out.&lt;br /&gt;
&lt;br /&gt;
=== Content of the project ===&lt;br /&gt;
Each project consists of:&lt;br /&gt;
* A report, preferably in PDF. &lt;br /&gt;
* The program code.&lt;br /&gt;
* Any data files of relevance &#039;&#039;&#039;not&#039;&#039;&#039; supplied through the course.&lt;br /&gt;
* A signed version of this [https://teaching.healthtech.dtu.dk/material/22113/ProjectStatement22113.pdf statement]. &amp;lt;div style=&amp;quot;color:red;display:inline;&amp;quot;&amp;gt;DO NOT FORGET&amp;lt;/div&amp;gt; It is unfortunate when students fail because they forget a mandatory element.&lt;br /&gt;
&lt;br /&gt;
==== The report itself ====&lt;br /&gt;
The report should be written in such a way that it can be understood by your peer (fellow student), who has no knowledge of the specifics of your project. I have provided some sources of help for writing reports, etc. at university level.&amp;lt;br&amp;gt;&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/Vejledning_i_opbygning_og_skrivning_af_rapporter_v.2013.3.pdf Vejledning i opbygning og skrivning af rapporter]&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/how_to_write_reports.pdf Guide on how to write a report]&lt;br /&gt;
* Online: [https://www.elsevier.com/connect/11-steps-to-structuring-a-science-paper-editors-will-take-seriously How to structure a science paper]&lt;br /&gt;
* PDF: [https://www.imm.dtu.dk/~janba/MastersThesisAdvice.pdf Master Thesis Advice]&lt;br /&gt;
* Online: [https://thereader.mitpress.mit.edu/umberto-eco-how-to-write-a-thesis/ How to write a thesis]&lt;br /&gt;
&lt;br /&gt;
It is considered unlikely that the report is less than 4 pages, but there is no set limitation.&lt;br /&gt;
The report is evaluated by quality, not by length. Some projects are naturally heavy in theory,&lt;br /&gt;
others have a more practical approach, and the report is expected to reflect that to some extent.&lt;br /&gt;
&lt;br /&gt;
The following sections can/should be in the report in approximately this order:&amp;lt;br&amp;gt;&lt;br /&gt;
* Introduction - mandatory&lt;br /&gt;
* Contribution - mandatory&lt;br /&gt;
* Theory - mandatory&lt;br /&gt;
* Algorithm design - mandatory&lt;br /&gt;
* Program design - mandatory&lt;br /&gt;
* Program manual - mandatory&lt;br /&gt;
* Results - optional, depends a bit on the project if it is natural to include&lt;br /&gt;
* Runtime analysis - mandatory&lt;br /&gt;
* Unit Testing - mandatory&lt;br /&gt;
* Conclusion - mandatory&lt;br /&gt;
* References - mandatory if any references&lt;br /&gt;
&lt;br /&gt;
The sections are explained in detail below. It is important that the report reads naturally with easy flow from one subject to the next. It should also have a coherent logical structure. As an example you must explain a concept/system/method before you use it - not use it first and later tell what it means. Nor should you write the same thing twice. The report must cover all sections in some form, but as the projects are different, different emphasis will be placed on the different sections. An example is that the Theory section is important for projects 9-11, but much less important and &amp;quot;theoretical&amp;quot; for projects 1, 3, 4 and 5. Results are really important in projects 5 and 6, but much less in the rest. The sections on &amp;quot;Theory&amp;quot;, &amp;quot;Algorithm design&amp;quot; and &amp;quot;Program design&amp;quot; can sometimes content-wise overlap each other depending on the nature of the project - it can be a question of simply drawing a line.&lt;br /&gt;
&lt;br /&gt;
==== The code ====&lt;br /&gt;
* The program code as &#039;&#039;&#039;.py&#039;&#039;&#039; or &#039;&#039;&#039;.txt&#039;&#039;&#039; files. It is a separate file from the report.&lt;br /&gt;
* The code should be clearly structured and well commented so it is possible to follow your thinking.&lt;br /&gt;
* The major data structures should also be explained with structure and purpose.&lt;br /&gt;
* The code should obviously follow the guidelines that have been given during the course in exercises.&lt;br /&gt;
* Unit testing of at least part of the code should be included. This can be done as a separate folder with test and test data.&lt;br /&gt;
&lt;br /&gt;
==== Report sections ====&lt;br /&gt;
* &#039;&#039;&#039;Introduction&#039;&#039;&#039; - A short section explaining why your project and program is important and useful and in which context it should be used. Can also be used to introduce some background.&lt;br /&gt;
* &#039;&#039;&#039;Contribution&#039;&#039;&#039; - It should be made clear who contributed to which parts of the project, if the contributions are uneven or clearly split up. For a group that worked closely together (the best), it is completely fine to write &amp;quot;equal contribution&amp;quot; from all members, if such is the case and it is not clear who has the main contribution. &amp;quot;Equal contribution&amp;quot; is also fine for situations where the main author of pieces of code and sections of reports changes in the group such that both members have roughly contributed evenly on both programming and writing. Caveat: If you write &amp;quot;equal contribution&amp;quot; you will pass or fail together.&lt;br /&gt;
* &#039;&#039;&#039;Theory&#039;&#039;&#039; - If you have math, equations, systems, or ideas that lie behind the code you write, then they should be described and explained here. Any &amp;quot;facts&amp;quot; that you are using in the code or programming against should be described in this section. High-level decisions you make that affect how the code will work, such as using a specific library, could also be described.&lt;br /&gt;
* &#039;&#039;&#039;Algorithm design&#039;&#039;&#039; - Every project has at least one &amp;quot;core&amp;quot; algorithm or method that the project revolves around. In this section you explain &#039;&#039;how&#039;&#039; the core algorithm(s) work. Pseudo code is great for making a structure you can explain it from. Some people use diagrams.&lt;br /&gt;
* &#039;&#039;&#039;Program design&#039;&#039;&#039; - This is where you give an overview of your program: &#039;&#039;where&#039;&#039; are your functions, &#039;&#039;where&#039;&#039; is your input and output, &#039;&#039;where&#039;&#039; is the main core. To show the structural overview of your program, pseudo code is again a great method.&lt;br /&gt;
* &#039;&#039;&#039;Program manual&#039;&#039;&#039; - Describe how to use the program(s), input format, various program options, expected output, example runs. Some screenshots work great here.&lt;br /&gt;
* &#039;&#039;&#039;Results&#039;&#039;&#039; - Show/describe/list the results of your program runs in this section.&lt;br /&gt;
* &#039;&#039;&#039;Runtime analysis&#039;&#039;&#039; - In this section you analyze the performance of the program in Big O terms, see [[Runtime evaluation of algorithms]]. You must present calculations supported by arguments, not just results.&lt;br /&gt;
* &#039;&#039;&#039;Unit Testing&#039;&#039;&#039; - Write a small section about what unit test you made on what sections of the code. Disclose if you found errors by writing the tests.&lt;br /&gt;
* &#039;&#039;&#039;Conclusion&#039;&#039;&#039; - Talk about the results you achieved, if you have not done so already. Discuss strengths and weaknesses of the program/algorithm. Mention future goals or improvements.&lt;br /&gt;
* &#039;&#039;&#039;References&#039;&#039;&#039; - If you used articles, web resources, or other information sources during the project, these need to be stated here.&lt;br /&gt;
Some people think the algorithm design overlaps a lot with program design. While there is some truth to that, algorithm design goes into detail about the central algorithm/part, while program design gives an overview of the code, so a reader can easily find the section of interest.&lt;br /&gt;
&lt;br /&gt;
=== How to evaluate the project ===&lt;br /&gt;
The project consists of two very different parts: The report and the program/code. They must be evaluated in different ways, but both evaluations must result in some written text.&amp;lt;br&amp;gt;&lt;br /&gt;
Here is a [https://teaching.healthtech.dtu.dk/material/22113/projectevaltemplate.docx template evaluation document], which you can use if you wish.&lt;br /&gt;
==== The report ====&lt;br /&gt;
The report should be evaluated on form and content. It should be understood that the report must be well structured and argued, so somebody not familiar with the subject can understand it.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Form:&#039;&#039;&#039; This consists of spelling, meaningful sentences, good language. Importantly, the sections as described above must be present. The sections: &amp;quot;Theory&amp;quot;, &amp;quot;Algorithm design&amp;quot; and &amp;quot;Program design&amp;quot; should be appropriate for the project type and some flexibility is allowed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Content:&#039;&#039;&#039; The content should obviously be correct and be evaluated in the light of the descriptions of the different report sections above.&lt;br /&gt;
&lt;br /&gt;
==== The program(s) ====&lt;br /&gt;
The code is evaluated on our usual criteria for exercises as can be seen on [[Code construction]] and in peer evaluation for the exercises. I will briefly mention correctness, structure and readability as the main focus.&lt;br /&gt;
&lt;br /&gt;
== Passing the course ==&lt;br /&gt;
The grade in the course is pass/fail. There are two parts which (according to DTU rules) both have to be passed in order to pass the course.&lt;br /&gt;
# The 4 hour written exam with all aids allowed. Weight 50%.&lt;br /&gt;
# The project work (project + peer evaluation). Weight 50%.&lt;br /&gt;
Furthermore, the weekly exercises + peer evaluations are mandatory. At least 10 out of 12 exercise sets + peer evaluations must be handed in for you to be allowed to take the exam.&amp;lt;br&amp;gt;&lt;br /&gt;
If you fail the exam, but pass the project, you only have to do a re-exam and vice versa. Failing both is a do-over.&lt;br /&gt;
&lt;br /&gt;
=== Conduction of the exam ===&lt;br /&gt;
* The course is using [https://eksamen.dtu.dk/ Digital Exam].&lt;br /&gt;
* You must use your own computer for the exam. According to DTU rules, any technical problems with your computer during the exam are your responsibility and you will get no extra time or leniency if it breaks. You will be evaluated on what you (not) hand in.&lt;br /&gt;
* Internal censoring is used, meaning the teacher and a qualified colleague will grade the exam as pass/non-pass.&lt;br /&gt;
* The exam is with all aids allowed, but there will NOT be open internet. This follows the standard DTU exam template. All aids cover written material like course powerpoints, books and exercises.&lt;br /&gt;
==== Exam purpose ====&lt;br /&gt;
The exam tests your ability in practical Python and programmatic problem solving.&lt;br /&gt;
&lt;br /&gt;
To say it plainly: Can you create Python code that can solve a minor computational problem? That is a major goal of the course.&lt;br /&gt;
&lt;br /&gt;
In particular, the exam addresses the following &#039;&#039;&#039;learning objectives&#039;&#039;&#039;, some only partly:&amp;lt;br&amp;gt;&lt;br /&gt;
* Demonstrate and explain the Python syntax, object model, data structures, classes and 65-70 Python methods/functions.&lt;br /&gt;
* Exercise pattern recognition in (bioinformatic) data files in order to extract information.&lt;br /&gt;
* Apply methods/programming techniques demonstrated in the course to similar problems.&lt;br /&gt;
* Analyze a (programming) problem and ascertain its components, and create an efficient solution by applying the right components in the right order.&lt;br /&gt;
* Analyze a program and based on its behavior, locate and eradicate errors.&lt;br /&gt;
* Write clear, precise and well documented code, which is suitable for greater collaborative efforts.&lt;br /&gt;
* Utilize code libraries, both scientific and other, for fast and good solution of programming tasks.&lt;br /&gt;
&lt;br /&gt;
==== What actually happens at exam ====&lt;br /&gt;
You will get a pdf with 2 Python programming assignments. Each assignment (which tells you what to do) has a number of input files and a number of output files, which you download. You are supposed to create 2 programs that can generate the appropriate output file(s) from the relevant input file(s) according to the assignments. There should be no confusion about what you have to generate as you have the correct output available. When done or time is up, you hand in (upload) your programs.&lt;br /&gt;
&lt;br /&gt;
If you manage to reproduce output files which are binary identical to the output files from the assignment, then you have automatically passed, because you have demonstrably achieved a major course goal.&amp;lt;br&amp;gt;&lt;br /&gt;
If you did not reproduce the output files, then your code will be inspected and evaluated by the teacher. If it is meaningful and sensible (see learning objectives) you will pass. Reproducing one of the output files, i.e. solving one of the assignments is not enough for a pass, but it is a good start.&lt;br /&gt;
&lt;br /&gt;
==== How to see if files are binary identical? ====&lt;br /&gt;
In Unix (you are using Unix, right? The course is conducted with that in mind) you can do a (md5) checksum.&lt;br /&gt;
 md5sum file1 file2&lt;br /&gt;
If the checksums are the same, the files are identical.&amp;lt;br&amp;gt;&lt;br /&gt;
You can also do a diff.&lt;br /&gt;
 diff file1 file2&lt;br /&gt;
If they are identical, nothing happens. If not, you will be informed about the differences - in a confusing way until you learn to decipher it.&lt;br /&gt;
&lt;br /&gt;
If you are uncertain about this, maybe you should practice a few times &#039;&#039;before&#039;&#039; the exam.&lt;br /&gt;
You can also google for it, because there are many ways of doing this.&lt;br /&gt;
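&lt;br /&gt;
The same two checks can also be made from within Python. The following is a minimal sketch using only the standard library; the function names &#039;&#039;md5_of_file&#039;&#039; and &#039;&#039;files_identical&#039;&#039; are chosen for illustration and are not part of the course material.&lt;br /&gt;

```python
import filecmp
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 checksum of a file as a hexadecimal string."""
    checksum = hashlib.md5()
    with open(path, "rb") as infile:
        # Read in binary mode and in chunks, so large files are handled too
        while True:
            chunk = infile.read(chunk_size)
            if not chunk:
                break
            checksum.update(chunk)
    return checksum.hexdigest()


def files_identical(path1, path2):
    """Return True if the two files contain exactly the same bytes."""
    # shallow=False forces a comparison of the content, not just file metadata
    return filecmp.cmp(path1, path2, shallow=False)
```

Calling files_identical with the path of your own output file and the path of the expected output file returns True only when the two files are byte-for-byte the same. A single extra space or a missing final newline is enough to make it return False.&lt;br /&gt;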
&lt;br /&gt;
=== The project work ===&lt;br /&gt;
In the project all aids are allowed, like books, powerpoints, google, teachers, or knowledge sources. Make sure you understand what you are told/read. Uncritical copy/paste is not going to help you or the project. Make references when relevant.&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;&#039;&#039;&#039;All aids&#039;&#039;&#039;&amp;quot; does not cover what normally constitutes cheating, like copying other students&#039; work or copying from solutions found on the internet or using libraries that solve the entire problem for you.&lt;br /&gt;
&lt;br /&gt;
The teacher will grade the project work. It consists of 3 parts. The report, the code and the peer evaluation.&amp;lt;br&amp;gt;&lt;br /&gt;
The project work addresses the following &#039;&#039;&#039;learning objectives&#039;&#039;&#039;, some only partly:&lt;br /&gt;
* Use the command line of Unix with 10-15 common Unix commands, including file system navigation, pipelines, process and file system control.&lt;br /&gt;
* Demonstrate and explain the Python syntax, object model, data structures, classes and 65-70 Python methods/functions.&lt;br /&gt;
* Exercise pattern recognition in (bioinformatic) data files in order to extract information.&lt;br /&gt;
* Apply methods/programming techniques demonstrated in the course to similar problems.&lt;br /&gt;
* Analyze a (programming) problem and ascertain its components, and create an efficient solution by applying the right components in the right order.&lt;br /&gt;
* Analyze a program and based on its behavior, locate and eradicate errors.&lt;br /&gt;
* Evaluate the quality of the code, based on criteria shown in the course, and ensuring the code meets quality standards by employing the unit test technique.&lt;br /&gt;
* Write clear, precise and well documented code, which is suitable for greater collaborative efforts.&lt;br /&gt;
* Evaluate the performance and efficiency of code with respect to speed and memory consumption using Big O notation.&lt;br /&gt;
* Utilize code libraries, both scientific and other, for fast and good solution of programming tasks.&lt;br /&gt;
The project will be evaluated on an overall view of how well it answers the problems stated in the project description. The various elements are specified below.&lt;br /&gt;
&lt;br /&gt;
==== General engineering competences relevant for the project work ====&lt;br /&gt;
Ability to write coherent text, to form proper sentences in English, to finish a consistent quality product (head lines, TOC, no missing words or figures, using same words for concepts, using the right scientific word/concept, etc.). Ability to collaborate both in writing text and programming code with other people.&lt;br /&gt;
&lt;br /&gt;
==== Skills and competences relevant for the report, which will form the foundation of the evaluation ====&lt;br /&gt;
* Analyze the project formulation and develop a method/algorithm for solving the problem.&lt;br /&gt;
* Clearly describe theoretical aspects of the project and/or algorithm; this could be mathematical foundation, input file data formats, categories used, prerequisites and assumptions, data structure elements or clever ideas.&lt;br /&gt;
* Clearly and coherently describe the algorithm and relevant data structures.&lt;br /&gt;
* When designing an algorithm, it can suffer from performance problems, which can be related to both speed and excessive use of memory. Can you recognize the shortcomings in your own algorithm? &lt;br /&gt;
* Show the proper use of pseudo code.&lt;br /&gt;
* Appropriate use of figures, graphs, illustrations and screenshots.&lt;br /&gt;
* Demonstrate the actual structure of the finished program, so the reader can understand what is going on where when seeing the code.&lt;br /&gt;
* Evaluate what is trivial and what is not, and focus on explaining the non-trivial things in the program.&lt;br /&gt;
* Understand and clearly describe a runtime evaluation of your code in Big O notation.&lt;br /&gt;
* Present results from the project in a relevant and clear fashion.&lt;br /&gt;
* See the perspective of your project. Where can it be used? What can be improved? What did you learn?&lt;br /&gt;
* Building a project is a creative process. Let that creativity shine through.&lt;br /&gt;
&lt;br /&gt;
==== Skills and competences relevant for the code, which will form the foundation of the evaluation ====&lt;br /&gt;
During the course you have been presented with many skills and competences in the peer evaluation and through that gained a good grasp of the easier coding skills.&lt;br /&gt;
* Correct commenting in code, placement and content&lt;br /&gt;
* Spacing and modularization&lt;br /&gt;
* Object naming&lt;br /&gt;
* Using appropriate amount and type of variables for the task.&lt;br /&gt;
* Error handling &lt;br /&gt;
The project demands more of the in-depth coding skills.&lt;br /&gt;
* Avoiding simple structural flaws&lt;br /&gt;
* Ability to write clear and precise code.&lt;br /&gt;
* Writing concise code.&lt;br /&gt;
* Writing sensible and meaningful code. This is not covered by the points above.&lt;br /&gt;
* Avoiding significant performance problems that are not related to an inherent problem with the algorithm (which is covered in the report):&lt;br /&gt;
# Recognizing and avoiding unnecessary loops or methods which slow the speed of the program. &lt;br /&gt;
# Recognizing and avoiding unnecessary use of computer memory.&lt;br /&gt;
* Reach for beauty and elegance. That happens when it all comes together in a clear and obvious chain of events that leads to the result.&lt;br /&gt;
&lt;br /&gt;
==== Skills and competences relevant for the peer evaluation ====&lt;br /&gt;
* You will use many of the competences used in your own report and code.&lt;br /&gt;
* Ability to critically read and evaluate text and foreign code (see through bullshit).&lt;br /&gt;
&lt;br /&gt;
== Failing the course ==&lt;br /&gt;
The course has two parts, which must both be passed in order to pass the course: Project and exam.&amp;lt;br&amp;gt;&lt;br /&gt;
There are also the weekly exercises and peer evaluations, which are mandatory to hand in for you to be allowed to take the exam.&lt;br /&gt;
They do not count in passing the course, but consistently doing poorly in exercises is not going to aid&lt;br /&gt;
the project or the exam.&lt;br /&gt;
=== Failing the exam ===&lt;br /&gt;
This is the most frequent way of failing the course. Said in simple words, the exam tests your ability - when faced with a problem - to produce meaningful code that solves the problem. As can be seen by the text above, it is not strictly required that the code works. That is just an easy way to measure ability. What &#039;&#039;&#039;is&#039;&#039;&#039; required, is that the code is meaningful given the problem. Half done, or explaining what you intend is not sufficient.&lt;br /&gt;
&lt;br /&gt;
The 4 challenges for failing students are:&lt;br /&gt;
* difficulty in analyzing the problem - understanding what has to be done.&lt;br /&gt;
* difficulty in formulating a strategy for solving the problem - designing the method/algorithm.&lt;br /&gt;
* difficulty in executing the strategy - transforming the method/idea into working code.&lt;br /&gt;
* difficulty in making the code coherent - having the grand overview of the method/code.&lt;br /&gt;
&lt;br /&gt;
There are lots of mistakes a student can make at an exam: insufficient knowledge of Python (syntax and functions), bad performance, repetitive code, missing variables, wrongly iterating loops, missing the right position of sequences etc. They are not disastrous on the individual level, although they reflect your skill. The problem is when many mistakes/omissions come together and form a mix of the 4 challenges.&lt;br /&gt;
&lt;br /&gt;
If you have not improved on these 4 points when you are taking a re-exam, the result is quite predictable.&amp;lt;br&amp;gt;&lt;br /&gt;
How to improve: Practice, practice, and then some practice. You cannot read it in a book or powerpoint, or see the videos again. That helps in knowing the syntax and functions of Python, but that is not the issue. The issue is training your brain in the analyzing, formulating and executing phases of coherent programming.&lt;br /&gt;
&lt;br /&gt;
=== Failing the project ===&lt;br /&gt;
The reasons for failing the project are usually:&lt;br /&gt;
* more than one severe, game changing mistake has been made in the code.&lt;br /&gt;
* severe misunderstandings about the content/purpose of the project.&lt;br /&gt;
* too low quality in code and report.&lt;br /&gt;
The first two could somewhat easily be avoided by consulting the teacher. The last... you need to up your game; Take the time to write a decent report - there are resources available on how to do that. The code can/must be improved by learning more during the course - study the exercises and the solutions. A lot of the same reasons for failing the exam also apply here.&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Aligning_expectations&amp;diff=85</id>
		<title>Aligning expectations</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Aligning_expectations&amp;diff=85"/>
		<updated>2024-04-22T17:05:15Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Report sections */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What is expected from you ==&lt;br /&gt;
=== Fulfilling prerequisites ===&lt;br /&gt;
A course like [https://kurser.dtu.dk/course/22101 22101/22161 Introduction to programming in Life Science using Python] should enable you for this course.&amp;lt;br&amp;gt;&lt;br /&gt;
Generally speaking, you must know simple Python well, which means you know the basic syntactical structure of Python (assignment, expressions, if, for, while, some functions/methods), some data types (integer, float, string, lists, sets, dicts), and trivial file reading and writing, so that you can relatively easily solve minor programming tasks without any use of libraries. You can check if your abilities are up to par by solving some of the exercises in 22101/22161 above.&amp;lt;br&amp;gt;&lt;br /&gt;
You must have your own computer (Windows, Mac, Linux) and you must understand its file system structure - the folder hierarchy, file types, and file organization.&lt;br /&gt;
&lt;br /&gt;
=== Special expectations ===&lt;br /&gt;
In the first week you learn Unix. You must work in this environment for the rest of the course. Unix is used in a number of (bioinformatic) courses, and being able to navigate in  Unix is not only a survival skill, but also a skill sought in industry. All major bioinformatic efforts take place on big Unix servers/clusters.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In hand-ins you follow the skills taught in 22101, specifically how to write comments, using spacing to modularize the code, proper variable (object) naming, proper use of variables, error handling and code clarity, see [[Code construction]].&lt;br /&gt;
&lt;br /&gt;
=== Standard expectations ===&lt;br /&gt;
* You follow the course every week and hand in the required weekly exercises on DTU Learn.&lt;br /&gt;
* You peer-evaluate every week on DTU Learn. A hand-in is required for evaluation to be allowed.&lt;br /&gt;
* You do a project with a peer (i.e. a two person group project) at the end of the course.&lt;br /&gt;
* When getting help from TA&#039;s and teacher, understand that many students need help. You can not expect us to sit an hour with you. If time actually allows for it, we do not mind doing it.&lt;br /&gt;
&lt;br /&gt;
=== Knowing the roles at the university ===&lt;br /&gt;
By knowing the roles of the various entities, you know where to direct your question.&lt;br /&gt;
* &#039;&#039;&#039;The teacher&#039;&#039;&#039; is responsible for the content of the course, the curriculum, the teaching, the project evaluation, the making and evaluation of the exam, and the reporting of cheating. In short - any content.&lt;br /&gt;
* &#039;&#039;&#039;The study office&#039;&#039;&#039; - and by extension - the exam office, the study guidance, and the legal office (cheating) deal with everything that is not course content. A lot of information is available on [https://www.inside.dtu.dk/en/undervisning DTU Inside], including exam dates and what to do when you have problems with your studies. They also process and publish the grades handed in by the teacher.&lt;br /&gt;
&lt;br /&gt;
== Course structure ==&lt;br /&gt;
The course is week-based, meaning new subjects are introduced Monday, and we work with them during the week. Sometimes smaller subjects are introduced Thursdays.&lt;br /&gt;
In the last month of the course a programming project will be made in 2-person groups.&lt;br /&gt;
&lt;br /&gt;
=== Exercises ===&lt;br /&gt;
* Weekly exercises are given every Monday. This constitutes an exercise set. There will be 12 of those. Exercises are mandatory.&lt;br /&gt;
* Exercises have to be uploaded to [https://learn.dtu.dk DTU Learn] no later than Sunday in the same week as the exercises were given.&lt;br /&gt;
* Peer evaluation of exercises is done in the following week on [https://learn.dtu.dk DTU Learn], to be handed in by Friday. The evaluations are mandatory.&lt;br /&gt;
* At least 10 of 12 evaluations must be handed in on DTU Learn for you to be allowed to take the exam. You can only evaluate if you have handed in exercises.&lt;br /&gt;
* Word or pdf documents are NOT accepted as hand-in - use only simple &#039;&#039;&#039;.txt&#039;&#039;&#039; or &#039;&#039;&#039;.py&#039;&#039;&#039; files.&amp;lt;br&amp;gt;&lt;br /&gt;
* Solutions to each week&#039;s exercises are published before the next week&#039;s lesson on DTU Learn (under Discussions). Can be used as reference for peer evaluation.&lt;br /&gt;
* Exercises which are handed in after the solutions are published, are voided and will not count, no matter the reason for being late.&lt;br /&gt;
* Do not read ahead and start using functions/methods/libraries which will be covered later in the course.&lt;br /&gt;
* &amp;lt;font color=&amp;quot;AA00FF&amp;quot;&amp;gt;&#039;&#039;&#039;Purple exercises&#039;&#039;&#039; have to be done in pseudo code before you start implementing them in Python. The pseudo code is part of the hand-in for these exercises.&amp;lt;br&amp;gt;So - make the pseudo code FIRST, then the real Python programs AFTERWARDS for the purple exercises.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Why &amp;amp; How to peer-evaluate exercises ===&lt;br /&gt;
The peer evaluation is a central part of the learning process, using formative feedback. You can get input from your peers on how you can solve an exercise. &amp;lt;!--The evaluation scheme has a lot of targeted questions about various aspects of the code you will have to evaluate, which will help you understand how you should develop your programs. Initially, it can seem overwhelming, but since the questions repeat week after week, and you mostly have to check boxes, it will get easier during the course.--&amp;gt; You learn both by doing the evaluation and by receiving (reading) it.&lt;br /&gt;
&lt;br /&gt;
You must use the criteria in [[Code construction]] in your own programs and the evaluation.&lt;br /&gt;
&lt;br /&gt;
=== Projects, general information ===&lt;br /&gt;
* A project is done with two students in each group, no exceptions unless there is an odd number of students, in which case a one (wo)man group is formed. The teacher will form the groups, but will accept already formed groups, if informed in time, see the google sheet found via the announcement at DTU Learn.&lt;br /&gt;
* Each group will choose a project from the [[Project list]] in the last part of the course. If a group has an idea for a different project, this must be discussed with the teacher to see if it is feasible. Such a &amp;quot;home made&amp;quot; project must be of sufficient complexity, but not too complex either, and have clear goals, so it can be measured whether the project succeeded or not.&lt;br /&gt;
* A project is estimated to be doable in 50 hours of work - people often use more time.&lt;br /&gt;
* The project work consists of two phases: 1) Doing a project - making code and writing report. 2) Individual (not in the group) peer evaluation of another group&#039;s project. Thus every project is peer evaluated twice. Both phases are mandatory.&lt;br /&gt;
* The teacher will evaluate all projects and peer evaluations. Understand that YOUR evaluation of another group&#039;s project is part of YOUR project work, and as such it will be part of YOUR grade.&lt;br /&gt;
* The project work will count for 50% of the final grade. The project grade is thus combined from the group effort of doing a project and the individual effort of peer grading a project.&lt;br /&gt;
* The project will be handed in through DTU Learn at the time written in the [[Programme]].&lt;br /&gt;
* The teacher will distribute the projects per mail for peer evaluation with the intention that you do NOT evaluate the same type of project you made.&lt;br /&gt;
* The evaluation of another group&#039;s project will be handed in 1 week later on DTU Learn.&lt;br /&gt;
&lt;br /&gt;
=== Getting help with the project ===&lt;br /&gt;
* The groups can consult the teacher and the TA&#039;s on problems with the projects. The teacher has the best overview, while TA&#039;s often can only help with actual problems in the code.&lt;br /&gt;
* Groups can talk to each other on their project, but actual cooperation between groups with the same project is not allowed. Here is a simple and clear rule:&amp;lt;br&amp;gt;&#039;&#039;&#039;Groups must not show any written material (typically code or report) to any other group&#039;&#039;&#039;.&lt;br /&gt;
* Consulting Google on programming issues is fine, but understand what is being said/written and why it works.&lt;br /&gt;
* Nothing but Python libraries which have been taught in the course can be used in projects. Consult the teacher if in doubt.&lt;br /&gt;
* The teacher should be informed if a group is dysfunctional. We will work something out.&lt;br /&gt;
&lt;br /&gt;
=== Content of the project ===&lt;br /&gt;
Each project consists of:&lt;br /&gt;
* A report, preferably in PDF. &lt;br /&gt;
* The program code.&lt;br /&gt;
* Any data files of relevance &#039;&#039;&#039;not&#039;&#039;&#039; supplied through the course.&lt;br /&gt;
* A signed version of this [https://teaching.healthtech.dtu.dk/material/22113/ProjectStatement22113.pdf statement]. &amp;lt;div style=&amp;quot;color:red;display:inline;&amp;quot;&amp;gt;DO NOT FORGET&amp;lt;/div&amp;gt; It is unfortunate when students fail because they forget a mandatory element.&lt;br /&gt;
&lt;br /&gt;
==== The report itself ====&lt;br /&gt;
The report should be written in such a way that it can be understood by your peer (fellow student), who has no knowledge of the specifics of your project. I have provided some sources of help for writing reports etc. at university level.&amp;lt;br&amp;gt;&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/Vejledning_i_opbygning_og_skrivning_af_rapporter_v.2013.3.pdf Vejledning i opbygning og skrivning af rapporter]&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/how_to_write_reports.pdf Guide on how to write a report]&lt;br /&gt;
* Online: [https://www.elsevier.com/connect/11-steps-to-structuring-a-science-paper-editors-will-take-seriously How to structure a science paper]&lt;br /&gt;
* PDF: [https://www.imm.dtu.dk/~janba/MastersThesisAdvice.pdf Master Thesis Advice]&lt;br /&gt;
* Online: [https://thereader.mitpress.mit.edu/umberto-eco-how-to-write-a-thesis/ How to write a thesis]&lt;br /&gt;
&lt;br /&gt;
It is considered unlikely that the report is less than 4 pages, but there is no set limitation.&lt;br /&gt;
The report is evaluated by quality, not by length. Some projects are naturally heavy in theory,&lt;br /&gt;
others have a more practical approach, and the report is expected to reflect that to some extent.&lt;br /&gt;
&lt;br /&gt;
The following sections can/should be in the report in approximately this order:&amp;lt;br&amp;gt;&lt;br /&gt;
* Introduction - mandatory&lt;br /&gt;
* Contribution - mandatory&lt;br /&gt;
* Theory - mandatory&lt;br /&gt;
* Algorithm design - mandatory&lt;br /&gt;
* Program design - mandatory&lt;br /&gt;
* Program manual - mandatory&lt;br /&gt;
* Results - optional, depends a bit on the project if it is natural to include&lt;br /&gt;
* Runtime analysis - mandatory&lt;br /&gt;
* (Unit) Testing - mandatory&lt;br /&gt;
* Conclusion - mandatory&lt;br /&gt;
* References - mandatory if any references&lt;br /&gt;
&lt;br /&gt;
The sections are explained in detail below. It is important that the report reads naturally with easy flow from one subject to the next. It should also have a coherent logical structure. As an example you must explain a concept/system/method before you use it - not use it first and later tell what it means. Nor should you write the same thing twice. The report must cover all sections in some form, but as the projects are different, different emphasis will be placed on the different sections. An example is that the Theory section is important for projects 9-11, but much less important and &amp;quot;theoretical&amp;quot; for projects 1, 3, 4 and 5. Results are really important in projects 5 and 6, but much less in the rest. The sections on &amp;quot;Theory&amp;quot;, &amp;quot;Algorithm design&amp;quot; and &amp;quot;Program design&amp;quot; can sometimes content-wise overlap each other depending on the nature of the project - it can be a question of simply drawing a line.&lt;br /&gt;
&lt;br /&gt;
==== The code ====&lt;br /&gt;
* The program code as &#039;&#039;&#039;.py&#039;&#039;&#039; or &#039;&#039;&#039;.txt&#039;&#039;&#039; files. It is a separate file from the report.&lt;br /&gt;
* The code should be clearly structured and well commented so it is possible to follow your thinking.&lt;br /&gt;
* The major data structures should also be explained with structure and purpose.&lt;br /&gt;
* The code should obviously follow the guidelines that have been given during the course in exercises.&lt;br /&gt;
* Unit testing of at least part of the code should be included. This can be done as a separate folder with test and test data.&lt;br /&gt;
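&lt;br /&gt;
As an illustration of the unit testing mentioned in the last point, here is a minimal sketch using the &#039;&#039;unittest&#039;&#039; module from the Python standard library. The function &#039;&#039;gc_content&#039;&#039; and the test names are hypothetical and only serve as an example; the test framework and layout used in the course may differ.&lt;br /&gt;

```python
import unittest


def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence (hypothetical example)."""
    if not sequence:
        raise ValueError("Empty sequence")
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


class TestGCContent(unittest.TestCase):
    """Each test method checks one well-defined behavior of gc_content."""

    def test_mixed_sequence(self):
        # Two of the four bases are G or C
        self.assertAlmostEqual(gc_content("ATGC"), 0.5)

    def test_lower_case_input(self):
        # Lower case input must give the same result as upper case
        self.assertAlmostEqual(gc_content("ggcc"), 1.0)

    def test_empty_sequence_raises(self):
        # The error case is tested as well as the normal cases
        with self.assertRaises(ValueError):
            gc_content("")
```

In a real project the function would normally live in the program file and the tests in a separate test folder, as described above. Tests written like this can be run with python3 -m unittest.&lt;br /&gt;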
&lt;br /&gt;
==== Report sections ====&lt;br /&gt;
* &#039;&#039;&#039;Introduction&#039;&#039;&#039; - A short section explaining why your project and program is important and useful and in which context it should be used. Can also be used to introduce some background.&lt;br /&gt;
* &#039;&#039;&#039;Contribution&#039;&#039;&#039; - It should be made clear who contributed to which parts of the project, if the contributions are uneven or clearly split up. For a group that worked closely together (the best), it is completely fine to write &amp;quot;equal contribution&amp;quot; from all members, if such is the case and it is not clear who has the main contribution. &amp;quot;Equal contribution&amp;quot; is also fine for situations where the main author of pieces of code and sections of reports changes in the group such that both members have roughly contributed evenly on both programming and writing. Caveat: If you write &amp;quot;equal contribution&amp;quot; you will pass or fail together.&lt;br /&gt;
* &#039;&#039;&#039;Theory&#039;&#039;&#039; - If there is math, equations, systems or ideas that lie behind your code, they should be described and explained here. Any &amp;quot;facts&amp;quot; that you are using in the code or programming against should be described in this section. High-level decisions that affect how the code will work can also be described, for example the choice of a specific library.&lt;br /&gt;
* &#039;&#039;&#039;Algorithm design&#039;&#039;&#039; - Every project has at least one &amp;quot;core&amp;quot; algorithm or method that the project revolves around. In this section you explain &#039;&#039;how&#039;&#039; the core algorithm(s) work. Pseudo code is great for making a structure you can explain it from. Some people use diagrams.&lt;br /&gt;
* &#039;&#039;&#039;Program design&#039;&#039;&#039; - This is where you give an overview of your program: &#039;&#039;where&#039;&#039; are your functions, &#039;&#039;where&#039;&#039; is your input and output, &#039;&#039;where&#039;&#039; is the main core. To show the structural overview of your program, pseudo code is again a great method.&lt;br /&gt;
* &#039;&#039;&#039;Program manual&#039;&#039;&#039; - Describe how to use the program(s), input format, various program options, expected output, example runs. Some screenshots work great here.&lt;br /&gt;
* &#039;&#039;&#039;Results&#039;&#039;&#039; -  Show/describe/list your results of the program runs in this section.&lt;br /&gt;
* &#039;&#039;&#039;Runtime analysis&#039;&#039;&#039; -  In this section you analyze the performance of the program in Big O terms, see [[Runtime evaluation of algorithms]]. You must present calculations supported by arguments, not just results.&lt;br /&gt;
* &#039;&#039;&#039;Unit Testing&#039;&#039;&#039; - Write a small section about what unit test you made on what sections of the code. Disclose if you found errors by writing the tests.&lt;br /&gt;
* &#039;&#039;&#039;Conclusion&#039;&#039;&#039; - Talk about the results you achieved, if you have not done so already. Discuss strengths and weaknesses of the program/algorithm, and mention future goals or improvements.&lt;br /&gt;
* &#039;&#039;&#039;References&#039;&#039;&#039; - If you used articles, web resources, or other information sources during the project, these need to be stated here.&lt;br /&gt;
Some people think the algorithm design overlaps a lot with program design. While there is some truth to that, algorithm design goes into detail about the central algorithm/part, while program design gives an overview of the code, so a reader can easily find the section of interest.&lt;br /&gt;
&lt;br /&gt;
=== How to evaluate the project ===&lt;br /&gt;
The project consists of two very different parts: the report and the program/code. They must be evaluated in different ways, but both evaluations must result in some written text.&amp;lt;br&amp;gt;&lt;br /&gt;
Here is a [https://teaching.healthtech.dtu.dk/material/22113/projectevaltemplate.docx template evaluation document], which you can use if you wish.&lt;br /&gt;
==== The report ====&lt;br /&gt;
The report should be evaluated on form and content. It should be understood that the report must be well structured and argued, so somebody not familiar with the subject can understand it.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Form:&#039;&#039;&#039; This consists of spelling, meaningful sentences, good language. Importantly, the sections as described above must be present. The sections: &amp;quot;Theory&amp;quot;, &amp;quot;Algorithm design&amp;quot; and &amp;quot;Program design&amp;quot; should be appropriate for the project type and some flexibility is allowed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Content:&#039;&#039;&#039; The content should obviously be correct and be evaluated in the light of the descriptions of the different report sections above.&lt;br /&gt;
&lt;br /&gt;
==== The program(s) ====&lt;br /&gt;
The code is evaluated on our usual criteria for exercises, as can be seen on [[Code construction]] and in peer evaluation for the exercises. In short, the main focus is on correctness, structure and readability.&lt;br /&gt;
&lt;br /&gt;
== Passing the course ==&lt;br /&gt;
The grade in the course is pass/fail. There are two parts which (according to DTU rules) both have to be passed in order to pass the course.&lt;br /&gt;
# The 4-hour written exam with all aids allowed. Weight 50%.&lt;br /&gt;
# The project work (project + peer evaluation). Weight 50%.&lt;br /&gt;
Furthermore, the weekly exercises + peer evaluations are mandatory. At least 10 out of 12 exercise sets + peer evaluations must be handed in for you to be allowed to take the exam.&amp;lt;br&amp;gt;&lt;br /&gt;
If you fail the exam, but pass the project, you only have to do a re-exam and vice versa. Failing both is a do-over.&lt;br /&gt;
&lt;br /&gt;
=== Conduction of the exam ===&lt;br /&gt;
* The course is using [https://eksamen.dtu.dk/ Digital Exam].&lt;br /&gt;
* You must use your own computer for the exam. According to DTU rules, any technical problems with your computer during the exam are your responsibility, and you will get no extra time or leniency if it breaks. You will be evaluated on what you (not) hand in.&lt;br /&gt;
* Internal censoring is used, meaning the teacher and a qualified colleague will grade the exam as pass/non-pass.&lt;br /&gt;
* The exam is with all aids allowed, but there will NOT be open internet. This follows the standard DTU exam template. All aids cover written material like course powerpoints, books and exercises.&lt;br /&gt;
==== Exam purpose ====&lt;br /&gt;
The exam tests your ability in practical Python and programmatic problem solving.&lt;br /&gt;
&lt;br /&gt;
To say it plainly: Can you create Python code that can solve a minor computational problem? That is a major goal of the course.&lt;br /&gt;
&lt;br /&gt;
In particular, the exam addresses the following &#039;&#039;&#039;learning objectives&#039;&#039;&#039;, some only partly:&amp;lt;br&amp;gt;&lt;br /&gt;
* Demonstrate and explain the Python syntax, object model, data structures, classes and 65-70 Python methods/functions.&lt;br /&gt;
* Exercise pattern recognition in (bioinformatic) data files in order to extract information.&lt;br /&gt;
* Apply methods/programming techniques demonstrated in the course to similar problems.&lt;br /&gt;
* Analyze a (programming) problem and ascertain its components, and create an efficient solution by applying the right components in the right order.&lt;br /&gt;
* Analyze a program and based on its behavior, locate and eradicate errors.&lt;br /&gt;
* Write clear, precise and well documented code, which is suitable for greater collaborative efforts.&lt;br /&gt;
* Utilize code libraries, both scientific and other, for fast and good solution of programming tasks.&lt;br /&gt;
&lt;br /&gt;
==== What actually happens at exam ====&lt;br /&gt;
You will get a pdf with 2 Python programming assignments. Each assignment (which tells you what to do) has a number of input files and a number of output files, which you download. You are supposed to create 2 programs that can generate the appropriate output file(s) from the relevant input file(s) according to the assignments. There should be no confusion about what you have to generate as you have the correct output available. When done or time is up, you hand in (upload) your programs.&lt;br /&gt;
&lt;br /&gt;
If you manage to reproduce output files which are binary identical to the output files from the assignment, then you have automatically passed, because you have demonstrably achieved a major course goal.&amp;lt;br&amp;gt;&lt;br /&gt;
If you did not reproduce the output files, then your code will be inspected and evaluated by the teacher. If it is meaningful and sensible (see learning objectives) you will pass. Reproducing one of the output files, i.e. solving one of the assignments, is not enough for a pass, but it is a good start.&lt;br /&gt;
&lt;br /&gt;
==== How to see if files are binary identical? ====&lt;br /&gt;
In Unix (you are using Unix, right? The course is conducted with that in mind) you can do a (md5) checksum.&lt;br /&gt;
 md5sum file1 file2&lt;br /&gt;
If the checksums are the same, the files are identical.&amp;lt;br&amp;gt;&lt;br /&gt;
You can also do a diff.&lt;br /&gt;
 diff file1 file2&lt;br /&gt;
If they are identical, nothing happens. If not, you will be informed about the differences - in a confusing way until you learn to decipher it.&lt;br /&gt;
&lt;br /&gt;
If you are uncertain about this, maybe you should practice a few times &#039;&#039;before&#039;&#039; the exam.&lt;br /&gt;
You can also google for it, because there are many ways of doing this.&lt;br /&gt;
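One of those ways is to do the comparison in Python itself. The following is a minimal sketch, not an official course tool: the function names &#039;&#039;md5_of_file&#039;&#039; and &#039;&#039;files_identical&#039;&#039; are made up for illustration, while &#039;&#039;hashlib&#039;&#039; and &#039;&#039;filecmp&#039;&#039; are modules from the Python standard library.&lt;br /&gt;

```python
import filecmp
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file (hypothetical helper name)."""
    digest = hashlib.md5()
    # Open in binary mode so that line endings are not translated
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """Return True when both files contain exactly the same bytes."""
    # shallow=False compares the file contents rather than
    # trusting the os.stat() signatures of the two files
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Calling files_identical with the path of your own output file and the path of the supplied output file returns True only when every byte matches, which is the same verdict as an empty result from diff. The digest from md5_of_file should match the checksum that md5sum prints for the same file.&lt;br /&gt;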
&lt;br /&gt;
=== The project work ===&lt;br /&gt;
In the project all aids are allowed, like books, powerpoints, Google, teachers, or other knowledge sources. Make sure you understand what you are told/read. Uncritical copy/paste is not going to help you or the project. Make references when relevant.&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;&#039;&#039;&#039;All aids&#039;&#039;&#039;&amp;quot; does not cover what normally constitutes cheating, like copying other students&#039; work, copying from solutions found on the internet, or using libraries that solve the entire problem for you.&lt;br /&gt;
&lt;br /&gt;
The teacher will grade the project work. It consists of 3 parts: the report, the code and the peer evaluation.&amp;lt;br&amp;gt;&lt;br /&gt;
The project work addresses the following &#039;&#039;&#039;learning objectives&#039;&#039;&#039;, some only partly:&lt;br /&gt;
* Use the command line of Unix with 10-15 common Unix commands, including file system navigation, pipelines, process and file system control.&lt;br /&gt;
* Demonstrate and explain the Python syntax, object model, data structures, classes and 65-70 Python methods/functions.&lt;br /&gt;
* Exercise pattern recognition in (bioinformatic) data files in order to extract information.&lt;br /&gt;
* Apply methods/programming techniques demonstrated in the course to similar problems.&lt;br /&gt;
* Analyze a (programming) problem and ascertain its components, and create an efficient solution by applying the right components in the right order.&lt;br /&gt;
* Analyze a program and based on its behavior, locate and eradicate errors.&lt;br /&gt;
* Evaluate the quality of the code, based on criteria shown in the course, and ensuring the code meets quality standards by employing the unit test technique.&lt;br /&gt;
* Write clear, precise and well documented code, which is suitable for greater collaborative efforts.&lt;br /&gt;
* Evaluate the performance and efficiency of code with respect to speed and memory consumption using Big O notation.&lt;br /&gt;
* Utilize code libraries, both scientific and other, for fast and good solution of programming tasks.&lt;br /&gt;
The project will be evaluated on an overall view of how well it answers the problems stated in the project description. Various elements are specified below.&lt;br /&gt;
&lt;br /&gt;
==== General engineering competences relevant for the project work ====&lt;br /&gt;
Ability to write coherent text, to form proper sentences in English, to finish a consistent quality product (headlines, TOC, no missing words or figures, using the same words for concepts, using the right scientific word/concept, etc.). Ability to collaborate with other people both in writing text and programming code.&lt;br /&gt;
&lt;br /&gt;
==== Skills and competences relevant for the report, which will form the foundation of the evaluation ====&lt;br /&gt;
* Analyze the project formulation and develop a method/algorithm for solving the problem.&lt;br /&gt;
* Clearly describe theoretical aspects of the project and/or algorithm; this could be mathematical foundation, input file data formats, categories used, prerequisites and assumptions, data structure elements or clever ideas.&lt;br /&gt;
* Clearly and coherently describe the algorithm and relevant data structures.&lt;br /&gt;
* When designing an algorithm, it can suffer from performance problems, which can be related to both speed and excessive use of memory. Can you recognize the shortcomings in your own algorithm? &lt;br /&gt;
* Show the proper use of pseudo code.&lt;br /&gt;
* Appropriate use of figures, graphs, illustrations and screenshots.&lt;br /&gt;
* Demonstrate the actual structure of the finished program, so the reader can understand what is going on where when seeing the code.&lt;br /&gt;
* Evaluate what is trivial and what is not, and focus on explaining the non-trivial things in the program.&lt;br /&gt;
* Understand and clearly describe a runtime evaluation of your code in Big O notation.&lt;br /&gt;
* Present results from the project in a relevant and clear fashion.&lt;br /&gt;
* See the perspective of your project. Where can it be used? What can be improved? What did you learn?&lt;br /&gt;
* Building a project is a creative process. Let that creativity shine through.&lt;br /&gt;
&lt;br /&gt;
==== Skills and competences relevant for the code, which will form the foundation of the evaluation ====&lt;br /&gt;
During the course you have been introduced to many skills and competences in the peer evaluation, and through that gained a good grasp of the easier coding skills.&lt;br /&gt;
* Correct commenting in code, placement and content&lt;br /&gt;
* Spacing and modularization&lt;br /&gt;
* Object naming&lt;br /&gt;
* Using an appropriate number and type of variables for the task.&lt;br /&gt;
* Error handling &lt;br /&gt;
The project demands more of the in-depth coding skills.&lt;br /&gt;
* Avoiding simple structural flaws&lt;br /&gt;
* Ability to write clear and precise code.&lt;br /&gt;
* Writing concise code.&lt;br /&gt;
* Writing sensible and meaningful code. This is not covered by the points above.&lt;br /&gt;
* Avoiding significant performance problems that are not related to an inherent problem with the algorithm (which is covered in the report).&lt;br /&gt;
# Recognizing and avoiding unnecessary loops or methods which slow the speed of the program. &lt;br /&gt;
# Recognizing and avoiding unnecessary use of computer memory.&lt;br /&gt;
* Reach for beauty and elegance. That happens when it all comes together in a clear and obvious chain of events that leads to the result.&lt;br /&gt;
&lt;br /&gt;
==== Skills and competences relevant for the peer evaluation ====&lt;br /&gt;
* You will use many of the competences used in your own report and code.&lt;br /&gt;
* Ability to critically read and evaluate text and foreign code (see through bullshit).&lt;br /&gt;
&lt;br /&gt;
== Failing the course ==&lt;br /&gt;
The course has two parts, which must both be passed in order to pass the course: Project and exam.&amp;lt;br&amp;gt;&lt;br /&gt;
There are also the weekly exercises and peer evaluations, which are mandatory to hand in for you to be allowed to take the exam.&lt;br /&gt;
They do not count in passing the course, but consistently doing poorly in exercises is not going to aid&lt;br /&gt;
the project or the exam.&lt;br /&gt;
=== Failing the exam ===&lt;br /&gt;
This is the most frequent way of failing the course. Put simply, the exam tests your ability - when faced with a problem - to produce meaningful code that solves the problem. As can be seen from the text above, it is not strictly required that the code works. That is just an easy way to measure ability. What &#039;&#039;&#039;is&#039;&#039;&#039; required is that the code is meaningful given the problem. Half-done code, or an explanation of what you intended, is not sufficient.&lt;br /&gt;
&lt;br /&gt;
The 4 challenges for failing students are:&lt;br /&gt;
* difficulty in analyzing the problem - understanding what has to be done.&lt;br /&gt;
* difficulty in formulating a strategy for solving the problem - designing the method/algorithm.&lt;br /&gt;
* difficulty in executing the strategy - transforming the method/idea into working code.&lt;br /&gt;
* difficulty in making the code coherent - having the grand overview of the method/code.&lt;br /&gt;
&lt;br /&gt;
There are lots of mistakes a student can make at an exam: insufficient knowledge of Python (syntax and functions), bad performance, repetitive code, missing variables, wrongly iterating loops, missing the right position of sequences etc. Individually they are not disastrous, although they reflect your skill. The problem is when many mistakes/omissions come together and form a mix of the 4 challenges.&lt;br /&gt;
&lt;br /&gt;
If you have not improved on these 4 points when you are taking a re-exam, the result is quite predictable.&amp;lt;br&amp;gt;&lt;br /&gt;
How to improve: Practice, practice, and then some practice. You cannot get it from reading a book or powerpoint, or from seeing the videos again. That helps in knowing the syntax and functions of Python, but that is not the issue. The issue is training your brain in the analyzing, formulating and executing phases of coherent programming.&lt;br /&gt;
&lt;br /&gt;
=== Failing the project ===&lt;br /&gt;
The reasons for failing the project are usually:&lt;br /&gt;
* more than one severe, game changing mistake has been made in the code.&lt;br /&gt;
* severe misunderstandings about the content/purpose of the project.&lt;br /&gt;
* too low quality in code and report.&lt;br /&gt;
The first two could fairly easily be avoided by consulting the teacher. The last... you need to up your game: take the time to write a decent report - there are resources available on how to do that. The code can/must be improved by learning more during the course - study the exercises and the solutions. A lot of the same reasons for failing the exam also apply here.&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Aligning_expectations&amp;diff=84</id>
		<title>Aligning expectations</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Aligning_expectations&amp;diff=84"/>
		<updated>2024-04-22T16:58:30Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* The code */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What is expected from you ==&lt;br /&gt;
=== Fulfilling prerequisites ===&lt;br /&gt;
A course like [https://kurser.dtu.dk/course/22101 22101/22161 Introduction to programming in Life Science using Python] should prepare you for this course.&amp;lt;br&amp;gt;&lt;br /&gt;
Generally speaking, you must know simple Python well, which means you know the basic syntactical structure of Python (assignment, expressions, if, for, while, some functions/methods), some data types (integer, float, string, lists, sets, dicts), and trivial file reading and writing, so that you can solve minor programming tasks relatively easily without any use of libraries. You can check if your abilities are up to par by solving some of the exercises in 22101/22161 above.&amp;lt;br&amp;gt;&lt;br /&gt;
You must have your own computer (Windows, Mac, Linux) and you must understand its file system structure - the folder hierarchy, file types, and file organization.&lt;br /&gt;
&lt;br /&gt;
=== Special expectations ===&lt;br /&gt;
In the first week you learn Unix. You must work in this environment for the rest of the course. Unix is used in a number of (bioinformatic) courses, and being able to navigate in Unix is not only a survival skill, but also a skill sought in industry. All major bioinformatic efforts take place on big Unix servers/clusters.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In hand-ins you follow the skills taught in 22101, specifically how to write comments, using spacing to modularize the code, proper variable (object) naming, proper use of variables, error handling and code clarity, see [[Code construction]].&lt;br /&gt;
&lt;br /&gt;
=== Standard expectations ===&lt;br /&gt;
* You follow the course every week and hand in the required weekly exercises on DTU Learn.&lt;br /&gt;
* You peer-evaluate every week on DTU Learn. A hand-in is required for evaluation to be allowed.&lt;br /&gt;
* You do a project with a peer (i.e. a two person group project) at the end of the course.&lt;br /&gt;
* When getting help from TAs and the teacher, understand that many students need help. You cannot expect us to sit with you for an hour. If time actually allows for it, we do not mind doing it.&lt;br /&gt;
&lt;br /&gt;
=== Knowing the roles at the university ===&lt;br /&gt;
By knowing the roles of the various entities, you know where to direct your question.&lt;br /&gt;
* &#039;&#039;&#039;The teacher&#039;&#039;&#039; is responsible for the content of the course, the curriculum, the teaching, the project evaluation, the making and evaluation of the exam, and the reporting of cheating. In short - any content.&lt;br /&gt;
* &#039;&#039;&#039;The study office&#039;&#039;&#039; - and by extension - the exam office, the study guidance, and the legal office (cheating) deal with everything that is not course content. A lot of information is available on [https://www.inside.dtu.dk/en/undervisning DTU Inside], including exam dates and what to do when having problems with the studies. They also process and publish the grades handed in by the teacher.&lt;br /&gt;
&lt;br /&gt;
== Course structure ==&lt;br /&gt;
The course is week-based, meaning new subjects are introduced Monday, and we work with them during the week. Sometimes smaller subjects are introduced Thursdays.&lt;br /&gt;
In the last month of the course a programming project will be made in 2-person groups.&lt;br /&gt;
&lt;br /&gt;
=== Exercises ===&lt;br /&gt;
* Weekly exercises are given every Monday. This constitutes an exercise set. There will be 12 of those. Exercises are mandatory.&lt;br /&gt;
* Exercises have to be uploaded to [https://learn.dtu.dk DTU Learn] no later than Sunday in the same week as the exercises were given.&lt;br /&gt;
* Peer evaluation of exercises is done in the following week on [https://learn.dtu.dk DTU Learn] and must be handed in by Friday. The evaluations are mandatory.&lt;br /&gt;
* At least 10 of 12 evaluations must be handed in on DTU Learn for you to be allowed to take the exam. You can only evaluate if you have handed in exercises.&lt;br /&gt;
* Word or pdf documents are NOT accepted as hand-in - use only simple &#039;&#039;&#039;.txt&#039;&#039;&#039; or &#039;&#039;&#039;.py&#039;&#039;&#039; files.&amp;lt;br&amp;gt;&lt;br /&gt;
* Solutions to each week&#039;s exercises are published before the next week&#039;s lesson on DTU Learn (under Discussions). They can be used as a reference for peer evaluation.&lt;br /&gt;
* Exercises which are handed in after the solutions are published are voided and will not count, no matter the reason for being late.&lt;br /&gt;
* Do not read ahead and start using functions/methods/libraries which will be covered later in the course.&lt;br /&gt;
* &amp;lt;font color=&amp;quot;AA00FF&amp;quot;&amp;gt;&#039;&#039;&#039;Purple exercises&#039;&#039;&#039; have to be done in pseudo code before you start implementing them in Python. The pseudo code is part of the hand-in for these exercises.&amp;lt;br&amp;gt;So - make the pseudo code FIRST, then the real Python programs AFTERWARDS for the purple exercises.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Why &amp;amp; How to peer-evaluate exercises ===&lt;br /&gt;
The peer evaluation is a central part of the learning process, using formative feedback. You can get input from your peers on how you can solve an exercise. &amp;lt;!--The evaluation scheme has a lot of targeted questions about various aspects of the code you will have to evaluate, which will help you understand how you should develop your programs. Initially, it can seem overwhelming, but since the questions repeat week after week, and you mostly have to check boxes, it will get easier during the course.--&amp;gt; You learn both by doing the evaluation and by receiving (reading) it.&lt;br /&gt;
&lt;br /&gt;
You must use the criteria in [[Code construction]] in your own programs and the evaluation.&lt;br /&gt;
&lt;br /&gt;
=== Projects, general information ===&lt;br /&gt;
* A project is done with two students in each group, no exceptions unless there is an odd number of students, in which case a one (wo)man group is formed. The teacher will form the groups, but will accept already formed groups, if informed in time, see the google sheet found via the announcement at DTU Learn.&lt;br /&gt;
* Each group will choose a project from the [[Project list]] in the last part of the course. If a group has an idea for a different project, this must be discussed with the teacher to see if it is feasible. Such a &amp;quot;home made&amp;quot; project must be of sufficient complexity, but not too complex either, and have clear goals, so it can be measured whether the project succeeded or not.&lt;br /&gt;
* A project is estimated to be doable in 50 hours of work - people often use more time.&lt;br /&gt;
* The project work consists of two phases: 1) Doing a project - making code and writing the report. 2) Individual (not in the group) peer evaluation of another group&#039;s project. Thus every project is peer evaluated twice. Both phases are mandatory.&lt;br /&gt;
* The teacher will evaluate all projects and peer evaluations. Understand that YOUR evaluation of another group&#039;s project is part of YOUR project work, and as such it will be part of YOUR grade.&lt;br /&gt;
* The project work will count for 50% of the final grade. The project grade is thus combined from the group effort of doing a project and the individual effort of peer grading a project.&lt;br /&gt;
* The project will be handed in through DTU Learn at the time written in the [[Programme]].&lt;br /&gt;
* The teacher will distribute the projects per mail for peer evaluation with the intention that you do NOT evaluate the same type of project as you made yourself.&lt;br /&gt;
* The evaluation of another group&#039;s project will be handed in 1 week later on DTU Learn.&lt;br /&gt;
&lt;br /&gt;
=== Getting help with the project ===&lt;br /&gt;
* The groups can consult the teacher and the TAs on problems with the projects. The teacher has the best overview, while TAs often can only help with actual problems in the code.&lt;br /&gt;
* Groups can talk to each other about their projects, but actual cooperation between groups with the same project is not allowed. Here is a simple and clear rule:&amp;lt;br&amp;gt;&#039;&#039;&#039;Groups must not show any written material (typically code or report) to any other group&#039;&#039;&#039;.&lt;br /&gt;
* Consulting Google on programming issues is fine, but understand what is being said/written and why it works.&lt;br /&gt;
* Only Python libraries which have been taught in the course can be used in projects. Consult the teacher if in doubt.&lt;br /&gt;
* The teacher should be informed if a group is dysfunctional. We will work something out.&lt;br /&gt;
&lt;br /&gt;
=== Content of the project ===&lt;br /&gt;
Each project consists of:&lt;br /&gt;
* A report, preferably in PDF. &lt;br /&gt;
* The program code.&lt;br /&gt;
* Any data files of relevance &#039;&#039;&#039;not&#039;&#039;&#039; supplied through the course.&lt;br /&gt;
* A signed version of this [https://teaching.healthtech.dtu.dk/material/22113/ProjectStatement22113.pdf statement]. &amp;lt;div style=&amp;quot;color:red;display:inline;&amp;quot;&amp;gt;DO NOT FORGET&amp;lt;/div&amp;gt; It is unfortunate when students fail because they forget a mandatory element.&lt;br /&gt;
&lt;br /&gt;
==== The report itself ====&lt;br /&gt;
The report should be written in such a way that it can be understood by your peer (fellow student), who has no knowledge of the specifics of your project. I have provided some sources of help for writing reports etc. at university level.&amp;lt;br&amp;gt;&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/Vejledning_i_opbygning_og_skrivning_af_rapporter_v.2013.3.pdf Vejledning i opbygning og skrivning af rapporter]&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/how_to_write_reports.pdf Guide on how to write a report]&lt;br /&gt;
* Online: [https://www.elsevier.com/connect/11-steps-to-structuring-a-science-paper-editors-will-take-seriously How to structure a science paper]&lt;br /&gt;
* PDF: [https://www.imm.dtu.dk/~janba/MastersThesisAdvice.pdf Master Thesis Advice]&lt;br /&gt;
* Online: [https://thereader.mitpress.mit.edu/umberto-eco-how-to-write-a-thesis/ How to write a thesis]&lt;br /&gt;
&lt;br /&gt;
It is considered unlikely that the report is less than 4 pages, but there is no set limitation.&lt;br /&gt;
The report is evaluated by quality, not by length. Some projects are naturally heavy in theory,&lt;br /&gt;
others have a more practical approach, and the report is expected to reflect that to some extent.&lt;br /&gt;
&lt;br /&gt;
The following sections can/should be in the report in approximately this order:&amp;lt;br&amp;gt;&lt;br /&gt;
* Introduction - mandatory&lt;br /&gt;
* Contribution - mandatory&lt;br /&gt;
* Theory - mandatory&lt;br /&gt;
* Algorithm design - mandatory&lt;br /&gt;
* Program design - mandatory&lt;br /&gt;
* Program manual - mandatory&lt;br /&gt;
* Results - optional, depends a bit on the project if it is natural to include&lt;br /&gt;
* Runtime analysis - mandatory&lt;br /&gt;
* (Unit) Testing - mandatory&lt;br /&gt;
* Conclusion - mandatory&lt;br /&gt;
* References - mandatory if any references&lt;br /&gt;
&lt;br /&gt;
The sections are explained in detail below. It is important that the report reads naturally with easy flow from one subject to the next. It should also have a coherent logical structure. As an example, you must explain a concept/system/method before you use it - not use it first and later tell what it means. Nor should you write the same thing twice. The report must cover all sections in some form, but as the projects are different, different emphasis will be placed on the different sections. An example is that the Theory section is important for projects 9-11, but much less important and &amp;quot;theoretical&amp;quot; for projects 1, 3, 4 and 5. Results are really important in projects 5 and 6, but much less in the rest. The sections on &amp;quot;Theory&amp;quot;, &amp;quot;Algorithm design&amp;quot; and &amp;quot;Program design&amp;quot; can sometimes overlap each other content-wise depending on the nature of the project - it can be a question of simply drawing a line.&lt;br /&gt;
&lt;br /&gt;
==== The code ====&lt;br /&gt;
* The program code as &#039;&#039;&#039;.py&#039;&#039;&#039; or &#039;&#039;&#039;.txt&#039;&#039;&#039; files. It is a separate file from the report.&lt;br /&gt;
* The code should be clearly structured and well commented so it is possible to follow your thinking.&lt;br /&gt;
* The major data structures should also be explained with structure and purpose.&lt;br /&gt;
* The code should obviously follow the guidelines that have been given during the course in exercises.&lt;br /&gt;
* Unit testing of at least part of the code should be included. This can be done as a separate folder with tests and test data.&lt;br /&gt;
&lt;br /&gt;
==== Report sections ====&lt;br /&gt;
* &#039;&#039;&#039;Introduction&#039;&#039;&#039; - A short section explaining why your project and program is important and useful and in which context it should be used. Can also be used to introduce some background.&lt;br /&gt;
* &#039;&#039;&#039;Contribution&#039;&#039;&#039; - It should be made clear who contributed to which parts of the project, if the contributions are uneven or clearly split up. For a group that worked closely together (the best), it is completely fine to write &amp;quot;equal contribution&amp;quot; from all members, if such is the case and it is not clear who has the main contribution. &amp;quot;Equal contribution&amp;quot; is also fine for situations where the main author of pieces of code and sections of reports changes in the group such that both members have roughly contributed evenly on both programming and writing. Caveat: If you write &amp;quot;equal contribution&amp;quot; you will pass or fail together.&lt;br /&gt;
* &#039;&#039;&#039;Theory&#039;&#039;&#039; - If there is math, equations, systems or ideas that lie behind the code you write, then it should be described and explained here. Any &amp;quot;facts&amp;quot; that you are using in the code or programming against should be described in this section. High-level decisions you make that affect how the code will work can also be described, for example the use of a specific library.&lt;br /&gt;
* &#039;&#039;&#039;Algorithm design&#039;&#039;&#039; - Every project has at least one &amp;quot;core&amp;quot; algorithm or method that the project revolves around. In this section you explain &#039;&#039;how&#039;&#039; the core algorithm(s) work. Pseudo code is great for creating a structure to explain the algorithm from. Some people use diagrams.&lt;br /&gt;
* &#039;&#039;&#039;Program design&#039;&#039;&#039; - This is where you give an overview of your program: &#039;&#039;where&#039;&#039; are your functions, &#039;&#039;where&#039;&#039; is your input and output, &#039;&#039;where&#039;&#039; is the main core. To show the structural overview of your program, pseudo code is again a great method.&lt;br /&gt;
* &#039;&#039;&#039;Program manual&#039;&#039;&#039; - Describe how to use the program(s), input format, various program options, expected output, example runs. Some screenshots work great here.&lt;br /&gt;
* &#039;&#039;&#039;Results&#039;&#039;&#039; -  Show/describe/list your results of the program runs in this section.&lt;br /&gt;
* &#039;&#039;&#039;Runtime analysis&#039;&#039;&#039; -  In this section you analyze the performance of the program in Big O terms, see [[Runtime evaluation of algorithms]]. You must present calculations supported by arguments, not just results.&lt;br /&gt;
* &#039;&#039;&#039;Conclusion&#039;&#039;&#039; -  Talk about the results you achieved, if you have not done so already. Discuss strengths and weaknesses of the program/algorithm. Mention future goals or improvements.&lt;br /&gt;
* &#039;&#039;&#039;References&#039;&#039;&#039; - If you used articles, web resources, or other information sources during the project, these need to be stated here.&lt;br /&gt;
Some people think the algorithm design overlaps a lot with program design, and there is some truth to that. However, algorithm design goes into detail about the central algorithm/part, while program design gives an overview of the code, so a reader can easily find the section of interest.&lt;br /&gt;
&lt;br /&gt;
=== How to evaluate the project ===&lt;br /&gt;
The project consists of two very different parts: the report and the program/code. They must be evaluated in different ways, but both evaluations must result in some written text.&amp;lt;br&amp;gt;&lt;br /&gt;
Here is a [https://teaching.healthtech.dtu.dk/material/22113/projectevaltemplate.docx template evaluation document], which you can use if you wish.&lt;br /&gt;
==== The report ====&lt;br /&gt;
The report should be evaluated on form and content. It should be understood that the report must be well structured and argued, so somebody not familiar with the subject can understand it.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Form:&#039;&#039;&#039; This consists of spelling, meaningful sentences, good language. Importantly, the sections as described above must be present. The sections: &amp;quot;Theory&amp;quot;, &amp;quot;Algorithm design&amp;quot; and &amp;quot;Program design&amp;quot; should be appropriate for the project type and some flexibility is allowed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Content:&#039;&#039;&#039; The content should obviously be correct and be evaluated in the light of the descriptions of the different report sections above.&lt;br /&gt;
&lt;br /&gt;
==== The program(s) ====&lt;br /&gt;
The code is evaluated on our usual criteria for exercises, as can be seen on [[Code construction]] and in the peer evaluation for the exercises. I will briefly mention correctness, structure, and readability as the main focus.&lt;br /&gt;
&lt;br /&gt;
== Passing the course ==&lt;br /&gt;
The grade in the course is pass/fail. There are two parts which (according to DTU rules) both have to be passed in order to pass the course.&lt;br /&gt;
# The 4 hour written exam with all aids allowed. Weight 50%.&lt;br /&gt;
# The project work (project + peer evaluation). Weight 50%.&lt;br /&gt;
Furthermore, the weekly exercises + peer evaluations are mandatory. At least 10 out of 12 exercise sets + peer evaluations must be handed in for you to be allowed to take the exam.&amp;lt;br&amp;gt;&lt;br /&gt;
If you fail the exam, but pass the project, you only have to do a re-exam and vice versa. Failing both is a do-over.&lt;br /&gt;
&lt;br /&gt;
=== Conduction of the exam ===&lt;br /&gt;
* The course is using [https://eksamen.dtu.dk/ Digital Exam].&lt;br /&gt;
* You must use your own computer for the exam. According to DTU rules, any technical problems with your computer during the exam are your responsibility and you will get no extra time or leniency if it breaks. You will be evaluated on what you (not) hand in.&lt;br /&gt;
* Internal censoring is used, meaning the teacher and a qualified colleague will grade the exam as pass/non-pass.&lt;br /&gt;
* The exam is with all aids allowed, but there will NOT be open internet. This follows the standard DTU exam template. All aids cover written material like course powerpoints, books and exercises.&lt;br /&gt;
==== Exam purpose ====&lt;br /&gt;
The exam tests your ability in practical Python and programmatic problem solving.&lt;br /&gt;
&lt;br /&gt;
To say it plainly: Can you create Python code that can solve a minor computational problem? That is a major goal of the course.&lt;br /&gt;
&lt;br /&gt;
In particular, the exam addresses the following &#039;&#039;&#039;learning objectives&#039;&#039;&#039;, some only partly:&amp;lt;br&amp;gt;&lt;br /&gt;
* Demonstrate and explain the Python syntax, object model, data structures, classes and 65-70 Python methods/functions.&lt;br /&gt;
* Exercise pattern recognition in (bioinformatic) data files in order to extract information.&lt;br /&gt;
* Apply methods/programming techniques demonstrated in the course to similar problems.&lt;br /&gt;
* Analyze a (programming) problem and ascertain its components, and create an efficient solution by applying the right components in the right order.&lt;br /&gt;
* Analyze a program and based on its behavior, locate and eradicate errors.&lt;br /&gt;
* Write clear, precise and well documented code, which is suitable for greater collaborative efforts.&lt;br /&gt;
* Utilize code libraries, both scientific and other, for fast and good solution of programming tasks.&lt;br /&gt;
&lt;br /&gt;
==== What actually happens at exam ====&lt;br /&gt;
You will get a pdf with 2 Python programming assignments. Each assignment (which tells you what to do) has a number of input files and a number of output files, which you download. You are supposed to create 2 programs that can generate the appropriate output file(s) from the relevant input file(s) according to the assignments. There should be no confusion about what you have to generate as you have the correct output available. When done or time is up, you hand in (upload) your programs.&lt;br /&gt;
&lt;br /&gt;
If you manage to reproduce output files which are binary identical to the output files from the assignment, then you have automatically passed, because you have demonstrably achieved a major course goal.&amp;lt;br&amp;gt;&lt;br /&gt;
If you did not reproduce the output files, then your code will be inspected and evaluated by the teacher. If it is meaningful and sensible (see learning objectives) you will pass. Reproducing one of the output files, i.e. solving one of the assignments is not enough for a pass, but it is a good start.&lt;br /&gt;
&lt;br /&gt;
==== How to see if files are binary identical? ====&lt;br /&gt;
In Unix (you are using Unix, right? The course is conducted with that in mind) you can do a (md5) checksum.&lt;br /&gt;
 md5sum file1 file2&lt;br /&gt;
If the checksums are the same, the files are identical.&amp;lt;br&amp;gt;&lt;br /&gt;
You can also do a diff.&lt;br /&gt;
 diff file1 file2&lt;br /&gt;
If they are identical, nothing happens. If not, you will be informed about the differences - in a confusing way until you learn to decipher it.&lt;br /&gt;
&lt;br /&gt;
If you are uncertain about this, maybe you should practice a few times &#039;&#039;before&#039;&#039; the exam.&lt;br /&gt;
You can also google for it, because there are many ways of doing this.&lt;br /&gt;
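&lt;br /&gt;
If you would rather check from within Python, the following is a minimal sketch using only the standard library modules hashlib and filecmp. The function names md5_of_file and files_identical are made up for this illustration and are not part of the course material.&lt;br /&gt;

```python
import filecmp
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    # Binary mode, so line endings and encoding are compared byte for byte
    with open(path, "rb") as infile:
        for chunk in iter(lambda: infile.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """Return True if the two files contain exactly the same bytes."""
    # shallow=False forces a comparison of the content,
    # not just of file size and modification time
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Calling files_identical with the path of your own output file and the path of the output file from the assignment returns True only when the two are binary identical. The digest from md5_of_file should match what md5sum prints for the same file.&lt;br /&gt;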
&lt;br /&gt;
=== The project work ===&lt;br /&gt;
In the project all aids are allowed, like books, powerpoints, google, teachers, or knowledge sources. Make sure you understand what you are told/read. Uncritical copy/paste is not going to help you or the project. Make references when relevant.&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;&#039;&#039;&#039;All aids&#039;&#039;&#039;&amp;quot; does not cover what normally constitutes cheating, like copying other students&#039; work, copying from solutions found on the internet, or using libraries that solve the entire problem for you.&lt;br /&gt;
&lt;br /&gt;
The teacher will grade the project work. It consists of 3 parts: the report, the code and the peer evaluation.&amp;lt;br&amp;gt;&lt;br /&gt;
The project work addresses the following &#039;&#039;&#039;learning objectives&#039;&#039;&#039;, some only partly:&lt;br /&gt;
* Use the command line of Unix with 10-15 common Unix commands, including file system navigation, pipelines, process and file system control.&lt;br /&gt;
* Demonstrate and explain the Python syntax, object model, data structures, classes and 65-70 Python methods/functions.&lt;br /&gt;
* Exercise pattern recognition in (bioinformatic) data files in order to extract information.&lt;br /&gt;
* Apply methods/programming techniques demonstrated in the course to similar problems.&lt;br /&gt;
* Analyze a (programming) problem and ascertain its components, and create an efficient solution by applying the right components in the right order.&lt;br /&gt;
* Analyze a program and based on its behavior, locate and eradicate errors.&lt;br /&gt;
* Evaluate the quality of the code, based on criteria shown in the course, and ensuring the code meets quality standards by employing the unit test technique.&lt;br /&gt;
* Write clear, precise and well documented code, which is suitable for greater collaborative efforts.&lt;br /&gt;
* Evaluate the performance and efficiency of code with respect to speed and memory consumption using Big O notation.&lt;br /&gt;
* Utilize code libraries, both scientific and other, for fast and good solution of programming tasks.&lt;br /&gt;
The project will be evaluated on an overall view of how well it answers the problems stated in the project description. Various elements are specified below.&lt;br /&gt;
&lt;br /&gt;
==== General engineering competences relevant for the project work ====&lt;br /&gt;
Ability to write coherent text, to form proper sentences in English, to finish a consistent quality product (head lines, TOC, no missing words or figures, using same words for concepts, using the right scientific word/concept, etc.). Ability to collaborate both in writing text and programming code with other people.&lt;br /&gt;
&lt;br /&gt;
==== Skills and competences relevant for the report, which will form the foundation of the evaluation ====&lt;br /&gt;
* Analyze the project formulation and develop a method/algorithm for solving the problem.&lt;br /&gt;
* Clearly describe theoretical aspects of the project and/or algorithm; this could be mathematical foundation, input file data formats, categories used, prerequisites and assumptions, data structure elements or clever ideas.&lt;br /&gt;
* Clearly and coherently describe the algorithm and relevant data structures.&lt;br /&gt;
* When designing an algorithm, it can suffer from performance problems, which can be related to both speed and excessive use of memory. Can you recognize the shortcomings in your own algorithm? &lt;br /&gt;
* Show the proper use of pseudo code.&lt;br /&gt;
* Appropriate use of figures, graphs, illustrations and screenshots.&lt;br /&gt;
* Demonstrate the actual structure of the finished program, so the reader can understand what is going on where when seeing the code.&lt;br /&gt;
* Evaluate what is trivial and what is not, and focus on explaining the non-trivial things in the program.&lt;br /&gt;
* Understand and clearly describe a runtime evaluation of your code in Big O notation.&lt;br /&gt;
* Present results from the project in a relevant and clear fashion.&lt;br /&gt;
* See the perspective of your project. Where can it be used? What can be improved? What did you learn?&lt;br /&gt;
* Building a project is a creative process. Let that creativity shine through.&lt;br /&gt;
&lt;br /&gt;
==== Skills and competences relevant for the code, which will form the foundation of the evaluation ====&lt;br /&gt;
During the course you have been presented with many skills and competences in the peer evaluation and through that gained a good grasp of the easier coding skills.&lt;br /&gt;
* Correct commenting in code, placement and content&lt;br /&gt;
* Spacing and modularization&lt;br /&gt;
* Object naming&lt;br /&gt;
* Using appropriate amount and type of variables for the task.&lt;br /&gt;
* Error handling &lt;br /&gt;
The project demands more of the in-depth coding skills.&lt;br /&gt;
* Avoiding simple structural flaws&lt;br /&gt;
* Ability to write clear and precise code.&lt;br /&gt;
* Writing concise code.&lt;br /&gt;
* Writing sensible and meaningful code. This is not covered by the points above.&lt;br /&gt;
* Avoiding significant performance problems not related to an inherent problem with the algorithm (which is covered in the report).&lt;br /&gt;
*# Recognizing and avoiding unnecessary loops or methods which slow down the program.&lt;br /&gt;
*# Recognizing and avoiding unnecessary use of computer memory.&lt;br /&gt;
* Reach for beauty and elegance. That happens when it all comes together in a clear and obvious chain of events that leads to the result.&lt;br /&gt;
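&lt;br /&gt;
To illustrate the two performance points, here is a small hedged sketch. The functions and their names are invented for this example and are not taken from any course exercise. The first pair shows a hidden inner loop (membership test on a list) replaced by a set lookup, and the last function shows how to process a file line by line instead of storing all lines in memory.&lt;br /&gt;

```python
def shared_ids_slow(ids_a, ids_b):
    """Return the items of ids_a that also occur in ids_b.

    Each 'in' test scans the list ids_b, so this is O(n*m).
    """
    return [item for item in ids_a if item in ids_b]


def shared_ids_fast(ids_a, ids_b):
    """Same result as shared_ids_slow, but roughly O(n+m).

    Membership in a set is a constant-time lookup on average.
    """
    lookup = set(ids_b)
    return [item for item in ids_a if item in lookup]


def count_lines(filename):
    """Count lines by iterating over the file object.

    Only one line is held in memory at a time, in contrast to
    reading the whole file into a list with readlines().
    """
    count = 0
    with open(filename) as infile:
        for _line in infile:
            count += 1
    return count
```

Both versions of the first function give the same list, but the difference in speed becomes noticeable as soon as the lists hold thousands of items.&lt;br /&gt;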
&lt;br /&gt;
==== Skills and competences relevant for the peer evaluation ====&lt;br /&gt;
* You will use many of the competences used in your own report and code.&lt;br /&gt;
* Ability to critically read and evaluate text and foreign code (see through bullshit).&lt;br /&gt;
&lt;br /&gt;
== Failing the course ==&lt;br /&gt;
The course has two parts, which must both be passed in order to pass the course: Project and exam.&amp;lt;br&amp;gt;&lt;br /&gt;
There are also the weekly exercises and peer evaluations, which are mandatory to hand in for you to be allowed to take the exam.&lt;br /&gt;
They do not count in passing the course, but consistently doing poorly in exercises is not going to aid&lt;br /&gt;
the project or the exam.&lt;br /&gt;
=== Failing the exam ===&lt;br /&gt;
This is the most frequent way of failing the course. Put simply, the exam tests your ability - when faced with a problem - to produce meaningful code that solves the problem. As can be seen from the text above, it is not strictly required that the code works. That is just an easy way to measure ability. What &#039;&#039;&#039;is&#039;&#039;&#039; required is that the code is meaningful given the problem. Half-done code, or an explanation of what you intend to do, is not sufficient.&lt;br /&gt;
&lt;br /&gt;
The 4 challenges for failing students are:&lt;br /&gt;
* difficulty in analyzing the problem - understanding what has to be done.&lt;br /&gt;
* difficulty in formulating a strategy for solving the problem - designing the method/algorithm.&lt;br /&gt;
* difficulty in executing the strategy - transforming the method/idea into working code.&lt;br /&gt;
* difficulty in making the code coherent - having the grand overview of the method/code.&lt;br /&gt;
&lt;br /&gt;
There are lots of mistakes a student can make at an exam: insufficient knowledge of Python (syntax and functions), bad performance, repetitive code, missing variables, wrongly iterating loops, missing the right position of sequences etc. Individually they are not disastrous, although they reflect your skill. The problem is when many mistakes/omissions come together and form a mix of the 4 challenges.&lt;br /&gt;
&lt;br /&gt;
If you have not improved on these 4 points when you are taking a re-exam, the result is quite predictable.&amp;lt;br&amp;gt;&lt;br /&gt;
How to improve: Practice, practice, and then some more practice. You cannot get it from reading a book or powerpoint, or from watching the videos again. That helps in knowing the syntax and functions of Python, but that is not the issue. The issue is training your brain in the analyzing, formulating and executing phases of coherent programming.&lt;br /&gt;
&lt;br /&gt;
=== Failing the project ===&lt;br /&gt;
The reasons for failing the project are usually:&lt;br /&gt;
* more than one severe, game changing mistake has been made in the code.&lt;br /&gt;
* severe misunderstandings about the content/purpose of the project.&lt;br /&gt;
* too low quality in code and report.&lt;br /&gt;
The first two could fairly easily be avoided by consulting the teacher. The last... you need to up your game: take the time to write a decent report - there are resources available on how to do that. The code can/must be improved by learning more during the course - study the exercises and the solutions. Many of the reasons for failing the exam also apply here.&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Aligning_expectations&amp;diff=83</id>
		<title>Aligning expectations</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Aligning_expectations&amp;diff=83"/>
		<updated>2024-04-22T16:56:07Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* The report itself */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== What is expected from you ==&lt;br /&gt;
=== Fulfilling prerequisites ===&lt;br /&gt;
A course like [https://kurser.dtu.dk/course/22101 22101/22161 Introduction to programming in Life Science using Python] should prepare you for this course.&amp;lt;br&amp;gt;&lt;br /&gt;
Generally speaking, you must know simple Python well, which means you know the basic syntactical structure of Python (assignment, expressions, if, for, while, some functions/methods), some data types (integer, float, string, lists, sets, dicts), and trivial file reading and writing, such that you can relatively easily solve minor programming tasks without any use of libraries. You can check if your abilities are up to par by solving some of the exercises in 22101/22161 above.&amp;lt;br&amp;gt;&lt;br /&gt;
You must have your own computer (Windows, Mac, Linux) and you must understand its file system structure - the folder hierarchy, file types, and file organization.&lt;br /&gt;
&lt;br /&gt;
=== Special expectations ===&lt;br /&gt;
In the first week you learn Unix. You must work in this environment for the rest of the course. Unix is used in a number of (bioinformatic) courses, and being able to navigate in Unix is not only a survival skill, but also a skill sought in industry. All major bioinformatic efforts take place on big Unix servers/clusters.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In hand-ins you follow the skills taught in 22101, specifically how to write comments, using spacing to modularize the code, proper variable (object) naming, proper use of variables, error handling and code clarity, see [[Code construction]].&lt;br /&gt;
&lt;br /&gt;
=== Standard expectations ===&lt;br /&gt;
* You follow the course every week and hand in the required weekly exercises on DTU Learn.&lt;br /&gt;
* You peer-evaluate every week on DTU Learn. A hand-in is required for evaluation to be allowed.&lt;br /&gt;
* You do a project with a peer (i.e. a two person group project) at the end of the course.&lt;br /&gt;
* When getting help from TAs and the teacher, understand that many students need help. You cannot expect us to sit an hour with you. If time actually allows for it, we do not mind doing it.&lt;br /&gt;
&lt;br /&gt;
=== Knowing the roles at the university ===&lt;br /&gt;
By knowing the roles of the various entities, you know where to direct your question.&lt;br /&gt;
* &#039;&#039;&#039;The teacher&#039;&#039;&#039; is responsible for the content of the course, the curriculum, the teaching, the project evaluation, the making and evaluation of the exam, and the reporting of cheating. In short - any content.&lt;br /&gt;
* &#039;&#039;&#039;The study office&#039;&#039;&#039; - and by extension - the exam office, the study guidance, and the legal office (cheating) deal with everything that is not course content. A lot of information is available on [https://www.inside.dtu.dk/en/undervisning DTU Inside], including exam dates and what to do when having problems with the studies. They also process and publish the grades handed in by the teacher.&lt;br /&gt;
&lt;br /&gt;
== Course structure ==&lt;br /&gt;
The course is week-based, meaning new subjects are introduced Monday, and we work with them during the week. Sometimes smaller subjects are introduced Thursdays.&lt;br /&gt;
In the last month of the course a programming project will be made in 2-person groups.&lt;br /&gt;
&lt;br /&gt;
=== Exercises ===&lt;br /&gt;
* Weekly exercises are given every Monday. This constitutes an exercise set. There will be 12 of those. Exercises are mandatory.&lt;br /&gt;
* Exercises have to be uploaded to [https://learn.dtu.dk DTU Learn] no later than Sunday in the same week as the exercises were given.&lt;br /&gt;
* Peer evaluation of exercises is done in the following week on [https://learn.dtu.dk DTU Learn], to be handed in Friday. The evaluations are mandatory.&lt;br /&gt;
* At least 10 of 12 evaluations must be handed in on DTU Learn for you to be allowed to take the exam. You can only evaluate if you have handed in exercises.&lt;br /&gt;
* Word or pdf documents are NOT accepted as hand-in - use only simple &#039;&#039;&#039;.txt&#039;&#039;&#039; or &#039;&#039;&#039;.py&#039;&#039;&#039; files.&amp;lt;br&amp;gt;&lt;br /&gt;
* Solutions to each week&#039;s exercises are published before the next week&#039;s lesson on DTU Learn (under Discussions). Can be used as reference for peer evaluation.&lt;br /&gt;
* Exercises which are handed in after the solutions are published, are voided and will not count, no matter the reason for being late.&lt;br /&gt;
* Do not read ahead and start using functions/methods/libraries which will be covered later in the course.&lt;br /&gt;
* &amp;lt;font color=&amp;quot;AA00FF&amp;quot;&amp;gt;&#039;&#039;&#039;Purple exercises&#039;&#039;&#039; have to be done in pseudo code before you start implementing them in Python. The pseudo code is part of the hand-in for these exercises.&amp;lt;br&amp;gt;So - make the pseudo code FIRST, then the real Python programs AFTERWARDS for the purple exercises.&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Why &amp;amp; How to peer-evaluate exercises ===&lt;br /&gt;
The peer evaluation is a central part of the learning process, using formative feedback. You can get input from your peers on how you can solve an exercise. &amp;lt;!--The evaluation scheme has a lot of targeted questions about various aspects of the code you will have to evaluate, which will help you understand how you should develop your programs. Initially, it can seem overwhelming, but since the questions repeat week after week, and you mostly have to check boxes, it will get easier during the course.--&amp;gt; You learn both by doing the evaluation and by receiving (reading) it.&lt;br /&gt;
&lt;br /&gt;
You must use the criteria in [[Code construction]] in your own programs and the evaluation.&lt;br /&gt;
&lt;br /&gt;
=== Projects, general information ===&lt;br /&gt;
* A project is done with two students in each group, no exceptions unless there is an odd number of students, in which case a one (wo)man group is formed. The teacher will form the groups, but will accept already formed groups, if informed in time, see the google sheet found via the announcement at DTU Learn.&lt;br /&gt;
* Each group will choose a project from the [[Project list]] in the last part of the course. If a group has an idea for a different project, this must be discussed with the teacher to see if it is feasible. Such a &amp;quot;home made&amp;quot; project must be of sufficient complexity, but not too complex either, and have clear goals, so it can be measured whether it succeeded or not.&lt;br /&gt;
* A project is estimated to be doable in 50 hours of work - people often use more time.&lt;br /&gt;
* The project work consists of two phases: 1) Doing a project - making code and writing the report. 2) Individual (not in the group) peer evaluation of another group&#039;s project. Thus every project is peer evaluated twice. Both phases are mandatory.&lt;br /&gt;
* The teacher will evaluate all projects and peer evaluations. Understand that YOUR evaluation of another group&#039;s project is part of YOUR project work, and as such it will be part of YOUR grade.&lt;br /&gt;
* The project work will count for 50% of the final grade. The project grade is thus combined from the group effort of doing a project and the individual effort of peer grading a project.&lt;br /&gt;
* The project will be handed in through DTU Learn at the time written in the [[Programme]].&lt;br /&gt;
* The teacher will distribute the projects by mail for peer evaluation with the intention that you do NOT evaluate the same type of project you made.&lt;br /&gt;
* The evaluation of another group&#039;s project will be handed in 1 week later on DTU Learn.&lt;br /&gt;
&lt;br /&gt;
=== Getting help with the project ===&lt;br /&gt;
* The groups can consult the teacher and the TAs on problems with the projects. The teacher has the best overview, while TAs often can only help with actual problems in the code.&lt;br /&gt;
* Groups can talk to each other on their project, but actual cooperation between groups with the same project is not allowed. Here is a simple and clear rule:&amp;lt;br&amp;gt;&#039;&#039;&#039;Groups must not show any written material (typically code or report) to any other group&#039;&#039;&#039;.&lt;br /&gt;
* Consulting Google on programming issues is fine, but understand what is being said/written and why it works.&lt;br /&gt;
* Nothing but Python libraries which have been taught in the course can be used in projects. Consult the teacher if in doubt.&lt;br /&gt;
* The teacher should be informed if a group is dysfunctional. We will work something out.&lt;br /&gt;
&lt;br /&gt;
=== Content of the project ===&lt;br /&gt;
Each project consists of:&lt;br /&gt;
* A report, preferably in PDF. &lt;br /&gt;
* The program code.&lt;br /&gt;
* Any data files of relevance &#039;&#039;&#039;not&#039;&#039;&#039; supplied through the course.&lt;br /&gt;
* A signed version of this [https://teaching.healthtech.dtu.dk/material/22113/ProjectStatement22113.pdf statement]. &amp;lt;div style=&amp;quot;color:red;display:inline;&amp;quot;&amp;gt;DO NOT FORGET&amp;lt;/div&amp;gt; It is unfortunate when students fail because they forget a mandatory element.&lt;br /&gt;
&lt;br /&gt;
==== The report itself ====&lt;br /&gt;
The report should be written in such a way that it can be understood by your peer (fellow student), who has no knowledge of the specifics of your project. I have provided some sources of help for writing reports etc. at university level.&amp;lt;br&amp;gt;&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/Vejledning_i_opbygning_og_skrivning_af_rapporter_v.2013.3.pdf Vejledning i opbygning og skrivning af rapporter]&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/how_to_write_reports.pdf Guide on how to write a report]&lt;br /&gt;
* Online: [https://www.elsevier.com/connect/11-steps-to-structuring-a-science-paper-editors-will-take-seriously How to structure a science paper]&lt;br /&gt;
* PDF: [https://www.imm.dtu.dk/~janba/MastersThesisAdvice.pdf Master Thesis Advice]&lt;br /&gt;
* Online: [https://thereader.mitpress.mit.edu/umberto-eco-how-to-write-a-thesis/ How to write a thesis]&lt;br /&gt;
&lt;br /&gt;
It is considered unlikely that the report is less than 4 pages, but there is no set limitation.&lt;br /&gt;
The report is evaluated by quality, not by length. Some projects are naturally heavy in theory,&lt;br /&gt;
others have a more practical approach, and the report is expected to reflect that to some extent.&lt;br /&gt;
&lt;br /&gt;
The following sections can/should be in the report in approximately this order:&amp;lt;br&amp;gt;&lt;br /&gt;
* Introduction - mandatory&lt;br /&gt;
* Contribution - mandatory&lt;br /&gt;
* Theory - mandatory&lt;br /&gt;
* Algorithm design - mandatory&lt;br /&gt;
* Program design - mandatory&lt;br /&gt;
* Program manual - mandatory&lt;br /&gt;
* Results - optional, depends a bit on the project if it is natural to include&lt;br /&gt;
* Runtime analysis - mandatory&lt;br /&gt;
* (Unit) Testing - mandatory&lt;br /&gt;
* Conclusion - mandatory&lt;br /&gt;
* References - mandatory if any references&lt;br /&gt;
&lt;br /&gt;
The sections are explained in detail below. It is important that the report reads naturally with easy flow from one subject to the next. It should also have a coherent logical structure. As an example, you must explain a concept/system/method before you use it - not use it first and later tell what it means. Nor should you write the same thing twice. The report must cover all sections in some form, but as the projects are different, different emphasis will be placed on the different sections. An example is that the Theory section is important for projects 9-11, but much less important and &amp;quot;theoretical&amp;quot; for projects 1, 3, 4 and 5. Results are really important in projects 5 and 6, but much less so in the rest. The sections on &amp;quot;Theory&amp;quot;, &amp;quot;Algorithm design&amp;quot; and &amp;quot;Program design&amp;quot; can sometimes overlap each other content-wise depending on the nature of the project - it can be a question of simply drawing a line.&lt;br /&gt;
&lt;br /&gt;
==== The code ====&lt;br /&gt;
* The program code as &#039;&#039;&#039;.py&#039;&#039;&#039; or &#039;&#039;&#039;.txt&#039;&#039;&#039; files. It is a separate file from the report.&lt;br /&gt;
* The code should be clearly structured and well commented so it is possible to follow your thinking.&lt;br /&gt;
* The major data structures should also be explained with structure and purpose.&lt;br /&gt;
* The code should obviously follow the guidelines that have been given during the course in exercises.&lt;br /&gt;
&lt;br /&gt;
==== Report sections ====&lt;br /&gt;
* &#039;&#039;&#039;Introduction&#039;&#039;&#039; - A short section explaining why your project and program is important and useful and in which context it should be used. Can also be used to introduce some background.&lt;br /&gt;
* &#039;&#039;&#039;Contribution&#039;&#039;&#039; - It should be made clear who contributed to which parts of the project, if the contributions are uneven or clearly split up. For a group that worked closely together (the best), it is completely fine to write &amp;quot;equal contribution&amp;quot; from all members, if such is the case and it is not clear who has the main contribution. &amp;quot;Equal contribution&amp;quot; is also fine for situations where the main author of pieces of code and sections of the report changes in the group such that both members have contributed roughly evenly to both programming and writing. Caveat: If you write &amp;quot;equal contribution&amp;quot; you will pass or fail together.&lt;br /&gt;
* &#039;&#039;&#039;Theory&#039;&#039;&#039; - If there is math, equations, systems, or ideas that lie behind the code you write, then it should be described and explained here. Any &amp;quot;facts&amp;quot; that you are using in the code or programming against should be described in this section. High-level decisions you make that affect how the code will work, such as using a specific library, could also be described.&lt;br /&gt;
* &#039;&#039;&#039;Algorithm design&#039;&#039;&#039; -  Every project has at least one &amp;quot;core&amp;quot; algorithm or method that the project revolves around. In this section you explain &#039;&#039;how&#039;&#039; the core algorithm(s) work. Pseudo code is great for making a structure you can explain it from. Some people use diagrams.&lt;br /&gt;
* &#039;&#039;&#039;Program design&#039;&#039;&#039; - This is where you give an overview of your program: &#039;&#039;where&#039;&#039; are your functions, &#039;&#039;where&#039;&#039; is your input and output, &#039;&#039;where&#039;&#039; is the main core. To show the structural overview of your program, pseudo code is again a great method.&lt;br /&gt;
* &#039;&#039;&#039;Program manual&#039;&#039;&#039; - Describe how to use the program(s), input format, various program options, expected output, example runs. Some screenshots work great here.&lt;br /&gt;
* &#039;&#039;&#039;Results&#039;&#039;&#039; -  Show/describe/list your results of the program runs in this section.&lt;br /&gt;
* &#039;&#039;&#039;Runtime analysis&#039;&#039;&#039; -  In this section you analyze the performance of the program in Big O terms, see [[Runtime evaluation of algorithms]]. You must present calculations supported by arguments, not just results.&lt;br /&gt;
* &#039;&#039;&#039;Conclusion&#039;&#039;&#039; -  Talk about the results you achieved, if you have not done so already. Discuss strengths and weaknesses of the program/algorithm. Mention future goals or improvements.&lt;br /&gt;
* &#039;&#039;&#039;References&#039;&#039;&#039; - If you used articles, web resources, or other information sources during the project, these need to be stated here.&lt;br /&gt;
Some people think that algorithm design overlaps a lot with program design. There is some truth to that, but algorithm design goes into detail about the central algorithm/part, while program design gives an overview of the code, so a reader can easily find the section of interest.&lt;br /&gt;
&lt;br /&gt;
=== How to evaluate the project ===&lt;br /&gt;
The project consists of two very different parts: the report and the program/code. They must be evaluated in different ways, but both evaluations must result in some written text.&amp;lt;br&amp;gt;&lt;br /&gt;
Here is a [https://teaching.healthtech.dtu.dk/material/22113/projectevaltemplate.docx template evaluation document], which you can use if you wish.&lt;br /&gt;
==== The report ====&lt;br /&gt;
The report should be evaluated on form and content. It should be understood that the report must be well structured and argued, so somebody not familiar with the subject can understand it.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Form:&#039;&#039;&#039; This consists of spelling, meaningful sentences, good language. Importantly, the sections as described above must be present. The sections: &amp;quot;Theory&amp;quot;, &amp;quot;Algorithm design&amp;quot; and &amp;quot;Program design&amp;quot; should be appropriate for the project type and some flexibility is allowed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Content:&#039;&#039;&#039; The content should obviously be correct and be evaluated in the light of the descriptions of the different report sections above.&lt;br /&gt;
&lt;br /&gt;
==== The program(s) ====&lt;br /&gt;
The code is evaluated on our usual criteria for exercises, as can be seen in [[Code construction]] and in the peer evaluation for the exercises. I will briefly mention correctness, structure and readability as the main focus.&lt;br /&gt;
&lt;br /&gt;
== Passing the course ==&lt;br /&gt;
The grade in the course is pass/fail. There are two parts which (according to DTU rules) both have to be passed in order to pass the course.&lt;br /&gt;
# The 4 hour written exam with all aids allowed. Weight 50%.&lt;br /&gt;
# The project work (project + peer evaluation). Weight 50%.&lt;br /&gt;
Furthermore, the weekly exercises + peer evaluations are mandatory. At least 10 out of 12 exercise sets + peer evaluations must be handed in to be allowed to take the exam.&amp;lt;br&amp;gt;&lt;br /&gt;
If you fail the exam, but pass the project, you only have to do a re-exam and vice versa. Failing both is a do-over.&lt;br /&gt;
&lt;br /&gt;
=== Conduction of the exam ===&lt;br /&gt;
* The course is using [https://eksamen.dtu.dk/ Digital Exam].&lt;br /&gt;
* You must use your own computer for the exam. According to DTU rules, any technical problems with your computer during the exam are your responsibility and you will get no extra time or leniency if it breaks. You will be evaluated on what you (not) hand in.&lt;br /&gt;
* Internal censoring is used, meaning the teacher and a qualified colleague will grade the exam as pass/non-pass.&lt;br /&gt;
* The exam is with all aids allowed, but there will NOT be open internet. This follows the standard DTU exam template. &amp;quot;All aids&amp;quot; covers written material such as course powerpoints, books and exercises.&lt;br /&gt;
==== Exam purpose ====&lt;br /&gt;
The exam tests your ability in practical Python and programmatic problem solving.&lt;br /&gt;
&lt;br /&gt;
To say it plainly: Can you create Python code that can solve a minor computational problem? That is a major goal of the course.&lt;br /&gt;
&lt;br /&gt;
In particular, the exam addresses the following &#039;&#039;&#039;learning objectives&#039;&#039;&#039;, some only partly:&amp;lt;br&amp;gt;&lt;br /&gt;
* Demonstrate and explain the Python syntax, object model, data structures, classes and 65-70 Python methods/functions.&lt;br /&gt;
* Exercise pattern recognition in (bioinformatic) data files in order to extract information.&lt;br /&gt;
* Apply methods/programming techniques demonstrated in the course to similar problems.&lt;br /&gt;
* Analyze a (programming) problem and ascertain its components, and create an efficient solution by applying the right components in the right order.&lt;br /&gt;
* Analyze a program and based on its behavior, locate and eradicate errors.&lt;br /&gt;
* Write clear, precise and well documented code, which is suitable for greater collaborative efforts.&lt;br /&gt;
* Utilize code libraries, both scientific and other, for fast and good solution of programming tasks.&lt;br /&gt;
&lt;br /&gt;
==== What actually happens at exam ====&lt;br /&gt;
You will get a pdf with 2 Python programming assignments. Each assignment (which tells you what to do) has a number of input files and a number of output files, which you download. You are supposed to create 2 programs that can generate the appropriate output file(s) from the relevant input file(s) according to the assignments. There should be no confusion about what you have to generate as you have the correct output available. When done or time is up, you hand in (upload) your programs.&lt;br /&gt;
&lt;br /&gt;
If you manage to reproduce output files which are binary identical to the output files from the assignment, then you have automatically passed, because you have demonstrably achieved a major course goal.&amp;lt;br&amp;gt;&lt;br /&gt;
If you did not reproduce the output files, then your code will be inspected and evaluated by the teacher. If it is meaningful and sensible (see learning objectives) you will pass. Reproducing one of the output files, i.e. solving one of the assignments is not enough for a pass, but it is a good start.&lt;br /&gt;
&lt;br /&gt;
==== How to see if files are binary identical? ====&lt;br /&gt;
In Unix (you are using Unix, right? The course is conducted with that in mind) you can do a (md5) checksum.&lt;br /&gt;
 md5sum file1 file2&lt;br /&gt;
If the checksums are the same, the files are identical.&amp;lt;br&amp;gt;&lt;br /&gt;
You can also do a diff.&lt;br /&gt;
 diff file1 file2&lt;br /&gt;
If they are identical, nothing happens. If not, you will be informed about the differences - in a confusing way until you learn to decipher it.&lt;br /&gt;
&lt;br /&gt;
If you are uncertain about this, maybe you should practice a few times &#039;&#039;before&#039;&#039; the exam.&lt;br /&gt;
You can also google for it, because there are many ways of doing this.&lt;br /&gt;
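If you prefer to stay in Python, the same check can be sketched with the standard library. This is only an illustration, not a required method: the function names md5_of and are_identical are made up for the example.&lt;br /&gt;

```python
import filecmp
import hashlib


def md5_of(path):
    """Return the MD5 checksum of a file as a hex string (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def are_identical(path1, path2):
    """Compare the contents of two files (hypothetical helper)."""
    # shallow=False forces a comparison of the contents, not just the file metadata.
    return filecmp.cmp(path1, path2, shallow=False)
```

Calling are_identical on your own output file and the output file from the assignment returns True only when the two files have the same content, which corresponds to diff printing nothing.&lt;br /&gt;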
&lt;br /&gt;
=== The project work ===&lt;br /&gt;
In the project all aids are allowed, like books, powerpoints, google, teachers, or other knowledge sources. Make sure you understand what you are told/read. Uncritical copy/paste is not going to help you or the project. Make references when relevant.&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;quot;&#039;&#039;&#039;All aids&#039;&#039;&#039;&amp;quot; does not cover what normally constitutes cheating, like copying other students&#039; work, copying from solutions found on the internet, or using libraries that solve the entire problem for you.&lt;br /&gt;
&lt;br /&gt;
The teacher will grade the project work. It consists of 3 parts: the report, the code and the peer evaluation.&amp;lt;br&amp;gt;&lt;br /&gt;
The project work addresses the following &#039;&#039;&#039;learning objectives&#039;&#039;&#039;, some only partly:&lt;br /&gt;
* Use the command line of Unix with 10-15 common Unix commands, including file system navigation, pipelines, process and file system control.&lt;br /&gt;
* Demonstrate and explain the Python syntax, object model, data structures, classes and 65-70 Python methods/functions.&lt;br /&gt;
* Exercise pattern recognition in (bioinformatic) data files in order to extract information.&lt;br /&gt;
* Apply methods/programming techniques demonstrated in the course to similar problems.&lt;br /&gt;
* Analyze a (programming) problem and ascertain its components, and create an efficient solution by applying the right components in the right order.&lt;br /&gt;
* Analyze a program and based on its behavior, locate and eradicate errors.&lt;br /&gt;
* Evaluate the quality of the code, based on criteria shown in the course, and ensuring the code meets quality standards by employing the unit test technique.&lt;br /&gt;
* Write clear, precise and well documented code, which is suitable for greater collaborative efforts.&lt;br /&gt;
* Evaluate the performance and efficiency of code with respect to speed and memory consumption using Big O notation.&lt;br /&gt;
* Utilize code libraries, both scientific and other, for fast and good solution of programming tasks.&lt;br /&gt;
The project will be evaluated on an overall view of how well it answers the problems stated in the project description. The various elements are specified below.&lt;br /&gt;
&lt;br /&gt;
==== General engineering competences relevant for the project work ====&lt;br /&gt;
Ability to write coherent text, to form proper sentences in English, to finish a consistent quality product (head lines, TOC, no missing words or figures, using same words for concepts, using the right scientific word/concept, etc.). Ability to collaborate both in writing text and programming code with other people.&lt;br /&gt;
&lt;br /&gt;
==== Skills and competences relevant for the report, which will form the foundation of the evaluation ====&lt;br /&gt;
* Analyze the project formulation and develop a method/algorithm for solving the problem.&lt;br /&gt;
* Clearly describe theoretical aspects of the project and/or algorithm; this could be mathematical foundation, input file data formats, categories used, prerequisites and assumptions, data structure elements or clever ideas.&lt;br /&gt;
* Clearly and coherently describe the algorithm and relevant data structures.&lt;br /&gt;
* When designing an algorithm, it can suffer from performance problems, which can be related to both speed and excessive use of memory. Can you recognize the shortcomings in your own algorithm? &lt;br /&gt;
* Show the proper use of pseudo code.&lt;br /&gt;
* Appropriate use of figures, graphs, illustrations and screenshots.&lt;br /&gt;
* Demonstrate the actual structure of the finished program, so the reader can understand what is going on where when seeing the code.&lt;br /&gt;
* Evaluate what is trivial and what is not, and focus on explaining the non-trivial things in the program.&lt;br /&gt;
* Understand and clearly describe a runtime evaluation of your code in Big O notation.&lt;br /&gt;
* Present results from the project in a relevant and clear fashion.&lt;br /&gt;
* See the perspective of your project. Where can it be used? What can be improved? What did you learn?&lt;br /&gt;
* Building a project is a creative process. Let that creativity shine through.&lt;br /&gt;
&lt;br /&gt;
==== Skills and competences relevant for the code, which will form the foundation of the evaluation ====&lt;br /&gt;
During the course you have been introduced to many skills and competences in the peer evaluation and through that gained a good grasp of the easier coding skills.&lt;br /&gt;
* Correct commenting in code, placement and content&lt;br /&gt;
* Spacing and modularization&lt;br /&gt;
* Object naming&lt;br /&gt;
* Using appropriate amount and type of variables for the task.&lt;br /&gt;
* Error handling &lt;br /&gt;
The project demands more of the in-depth coding skills.&lt;br /&gt;
* Avoiding simple structural flaws&lt;br /&gt;
* Ability to write clear and precise code.&lt;br /&gt;
* Writing concise code.&lt;br /&gt;
* Writing sensible and meaningful code. This is not covered by the points above.&lt;br /&gt;
* Avoiding significant performance problems that are not related to an inherent problem with the algorithm (which is covered in the report):&lt;br /&gt;
# Recognizing and avoiding unnecessary loops or methods which slow the speed of the program. &lt;br /&gt;
# Recognizing and avoiding unnecessary use of computer memory.&lt;br /&gt;
* Reach for beauty and elegance. That happens when it all comes together in a clear and obvious chain of events that leads to the result.&lt;br /&gt;
&lt;br /&gt;
==== Skills and competences relevant for the peer evaluation ====&lt;br /&gt;
* You will use many of the competences used in your own report and code.&lt;br /&gt;
* Ability to critically read and evaluate text and foreign code (see through bullshit).&lt;br /&gt;
&lt;br /&gt;
== Failing the course ==&lt;br /&gt;
The course has two parts, which must both be passed in order to pass the course: Project and exam.&amp;lt;br&amp;gt;&lt;br /&gt;
There are also the weekly exercises and peer evaluations, which are mandatory to hand in for you to be allowed to take the exam.&lt;br /&gt;
They do not count towards passing the course, but consistently doing poorly in the exercises is not going to help with the project or the exam.&lt;br /&gt;
=== Failing the exam ===&lt;br /&gt;
This is the most frequent way of failing the course. Put simply, the exam tests your ability - when faced with a problem - to produce meaningful code that solves the problem. As can be seen from the text above, it is not strictly required that the code works. That is just an easy way to measure ability. What &#039;&#039;&#039;is&#039;&#039;&#039; required is that the code is meaningful given the problem. Half-done code, or an explanation of what you intended, is not sufficient.&lt;br /&gt;
&lt;br /&gt;
The 4 challenges for failing students are:&lt;br /&gt;
* difficulty in analyzing the problem - understanding what has to be done.&lt;br /&gt;
* difficulty in formulating a strategy for solving the problem - designing the method/algorithm.&lt;br /&gt;
* difficulty in executing the strategy - transforming the method/idea into working code.&lt;br /&gt;
* difficulty in making the code coherent - having the grand overview of the method/code.&lt;br /&gt;
&lt;br /&gt;
There are lots of mistakes a student can make at an exam: insufficient knowledge of Python (syntax and functions), bad performance, repetitive code, missing variables, wrongly iterating loops, missing the right position of sequences etc. Individually they are not disastrous, although they reflect your skill. The problem arises when many mistakes/omissions come together and form a mix of the 4 challenges.&lt;br /&gt;
&lt;br /&gt;
If you have not improved on these 4 points when you are taking a re-exam, the result is quite predictable.&amp;lt;br&amp;gt;&lt;br /&gt;
How to improve: Practice, practice, and then some practice. You cannot get it from reading a book or powerpoint, or from watching the videos again. That helps in knowing the syntax and functions of Python, but that is not the issue. The issue is training your brain in the analyzing, formulating and executing phases of coherent programming.&lt;br /&gt;
&lt;br /&gt;
=== Failing the project ===&lt;br /&gt;
The reasons for failing the project are usually:&lt;br /&gt;
* more than one severe, game changing mistake has been made in the code.&lt;br /&gt;
* severe misunderstandings about the content/purpose of the project.&lt;br /&gt;
* too low quality in code and report.&lt;br /&gt;
The first two can usually be avoided by consulting the teacher. The last... you need to up your game: take the time to write a decent report - there are resources available on how to do that. The code can/must be improved by learning more during the course - study the exercises and the solutions. Many of the reasons for failing the exam also apply here.&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Scientific_Libraries,_Pandas,_Numpy&amp;diff=82</id>
		<title>Scientific Libraries, Pandas, Numpy</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Scientific_Libraries,_Pandas,_Numpy&amp;diff=82"/>
		<updated>2024-04-11T07:00:59Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Required course material for the lesson */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| width=500  style=&amp;quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&amp;quot;&lt;br /&gt;
|Previous: [[Unit test]]&lt;br /&gt;
|Next: [[Runtime evaluation of algorithms]]&lt;br /&gt;
|}&lt;br /&gt;
== Required course material for the lesson ==&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22113/22113_09-PandasNumpy.pptx Scientific libraries, Pandas &amp;amp; NumPy]&amp;lt;br&amp;gt;&lt;br /&gt;
Online: [https://pandas.pydata.org/docs/user_guide/index.html https://pandas.pydata.org/] Pandas documentation&amp;lt;br&amp;gt;&lt;br /&gt;
Online: [https://numpy.org/doc/stable/ https://numpy.org/] NumPy documentation&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Resource: [[Example code - File Reading]]&amp;lt;br&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Subjects covered ==&lt;br /&gt;
General intro to scientific libraries&amp;lt;br&amp;gt;&lt;br /&gt;
Pandas&amp;lt;br&amp;gt;&lt;br /&gt;
NumPy&lt;br /&gt;
&lt;br /&gt;
== Exercises to be handed in ==&lt;br /&gt;
&#039;&#039;Pandas&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
During this part of the exercise, you will be working with the data that was used to validate the tool ResFinder (https://pubmed.ncbi.nlm.nih.gov/32780112/). For the validation, different centers around the world (Denmark, Germany, Belgium, UK and USA) isolated several bacterial species found in clinical and surveillance environments, and searched for antimicrobial resistance both in the laboratory and using ResFinder. In the laboratory, the isolated bacteria were subjected to MIC (Minimum Inhibitory Concentration) testing of different antimicrobials; in other words, how much antimicrobial has to be given to the bacterial isolates before they stop growing. If the MIC value is higher than a certain standard, the bacterium is considered resistant to that antimicrobial. Usually, when bacteria that should be killed by an antimicrobial turn out to be resistant, it is because they have acquired a gene or mutation that makes them resistant to that substance. ResFinder is a bioinformatic tool that tries to find those genes/mutations in the sequenced DNA of bacteria.&amp;lt;br&amp;gt;&lt;br /&gt;
A big part of the ResFinder tool validation was to receive the reports from the different centers (reports from laboratories and bioinformatic teams) and analyze them together. You will carry out this step during this exercise. The necessary data is in the zip file [https://teaching.healthtech.dtu.dk/material/22113/pandas_exercise.zip pandas_exercise.zip].&lt;br /&gt;
&lt;br /&gt;
# Load the metadata files (ending in &#039;&#039;_ids.txt&#039;&#039;) from Belgium, Denmark, Germany, UK and USA, and create a dataframe by stacking the five dataframes. The final dataframe should include an extra column indicating which country each sample comes from. Get the number of samples that come from Surveillance and from Clinical origins with respect to the Source (Hint: the &#039;&#039;&#039;groupby&#039;&#039;&#039; function is your friend).&lt;br /&gt;
# Do the same as in exercise 1 with the lab files (ending in &#039;&#039;lab_results.txt&#039;&#039;) and bioinformatic files (ending in &#039;&#039;bioinf_results.txt&#039;&#039;) for all countries. The columns of the bioinformatic results should be strings or objects, while the lab results should be strings (samples) and floats (the rest of the columns). As you might have noticed, USA and UK did not follow the format that we asked for. You will have to go from [MIC: &amp;lt;mic_value&amp;gt;] to [&amp;lt;mic_value&amp;gt;], where mic_value is a float. &#039;&#039;&#039;UPDATE&#039;&#039;&#039;: It seems UK also added a sneaky &amp;quot;&amp;lt;&amp;quot;. Replace it using the same method.&lt;br /&gt;
# Join the three dataframes row-wise, using the dataframe IDs as a way of mapping the reads ids (bioinformatic results) and the sample ids (laboratory results). Notice you might lose data on the way; that is fine. Hint: merge or join function is your friend here.&lt;br /&gt;
# Not all the laboratories have performed analysis on all the antimicrobials. Try to get the antimicrobials that USA has not performed analysis on. (Hint: When a cell in a column made of float numbers is empty, pandas uses the value &amp;quot;np.NaN&amp;quot;)&lt;br /&gt;
# Save the final dataset that you got from the last exercise under the name &#039;&#039;resfinder_project.tsv&#039;&#039;. It has to be tab separated, and the index should not be included.&lt;br /&gt;
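The techniques hinted at in the list above can be sketched on made-up toy data. This is not a solution to the exercise: the tables, the column names Sample, Source and Country, and all values are assumptions chosen for illustration, and the real files may look different.&lt;br /&gt;

```python
import pandas as pd

# Toy stand-ins for two per-country metadata tables (all names and values are made up).
frames = {
    "Denmark": pd.DataFrame({"Sample": ["dk1", "dk2"], "Source": ["Clinical", "Surveillance"]}),
    "USA": pd.DataFrame({"Sample": ["us1", "us2"], "Source": ["Clinical", "Clinical"]}),
}

# Stack the tables and record the country of every row in an extra column.
metadata = pd.concat(
    [frame.assign(Country=country) for country, frame in frames.items()],
    ignore_index=True,
)

# Count the samples per source.
counts = metadata.groupby("Source").size()

# Turn strings such as "MIC: 0.5" into the float 0.5 by pulling out the number.
raw_mic = pd.Series(["MIC: 0.5", "MIC: 16", "2"])
mic = raw_mic.str.extract(r"(\d+(?:\.\d+)?)", expand=False).astype(float)
```

Extracting the number with a regular expression is one possible design choice; replacing the unwanted prefix with an empty string and then converting the column to float would work as well.&lt;br /&gt;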
The following exercises should not be started before Thursday.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;Numpy&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;ol start=&amp;quot;6&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;You are now going to work with gene expression data. Your employer has given you the results of the analysis from two different machines, but on the same samples. The analysis has been done on ten samples, and 5000 genes have been analyzed. In other words, you have the data from two machines (&#039;&#039;gene_expression1.txt&#039;&#039; and &#039;&#039;gene_expression2.txt&#039;&#039;), each with an array of shape (10, 5000) (samples, genes). Read the gene_expression1 file and store it in an array.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;It seems the second machine outputs the results in the format genes, samples (5000, 10). Read the file, store it in an array and turn it into an array with shape (10, 5000).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Your employer wants to normalize each sample. In other words, you need to subtract the mean of each row from that row (Sample_normalized&amp;lt;sub&amp;gt;n&amp;lt;/sub&amp;gt; = Sample&amp;lt;sub&amp;gt;n&amp;lt;/sub&amp;gt; - Mean_sample&amp;lt;sub&amp;gt;n&amp;lt;/sub&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Your employer asks you to save both arrays in the same file: first stack them row-wise, then save them in a .npy file, &#039;&#039;normalized_array.npy&#039;&#039;.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Exercises for extra practice ==&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Scientific_Libraries,_Pandas,_Numpy&amp;diff=81</id>
		<title>Scientific Libraries, Pandas, Numpy</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Scientific_Libraries,_Pandas,_Numpy&amp;diff=81"/>
		<updated>2024-04-04T14:02:32Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| width=500  style=&amp;quot;font-size: 10px; float:right; margin-left: 10px; margin-top: -56px;&amp;quot;&lt;br /&gt;
|Previous: [[Unit test]]&lt;br /&gt;
|Next: [[Runtime evaluation of algorithms]]&lt;br /&gt;
|}&lt;br /&gt;
== Required course material for the lesson ==&lt;br /&gt;
Powerpoint: [https://teaching.healthtech.dtu.dk/material/22113/22113_09-PandasNumoy.pptx Scientific libraries, Pandas &amp;amp; NumPy]&amp;lt;br&amp;gt;&lt;br /&gt;
Online: [https://pandas.pydata.org/docs/user_guide/index.html https://pandas.pydata.org/] Pandas documentation&amp;lt;br&amp;gt;&lt;br /&gt;
Online: [https://numpy.org/doc/stable/ https://numpy.org/] NumPy documentation&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Resource: [[Example code - File Reading]]&amp;lt;br&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Subjects covered ==&lt;br /&gt;
General intro to scientific libraries&amp;lt;br&amp;gt;&lt;br /&gt;
Pandas&amp;lt;br&amp;gt;&lt;br /&gt;
NumPy&lt;br /&gt;
&lt;br /&gt;
== Exercises to be handed in ==&lt;br /&gt;
&#039;&#039;Pandas&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
During this part of the exercise, you will be working with the data that was used to validate the tool ResFinder (https://pubmed.ncbi.nlm.nih.gov/32780112/). For the validation, different centers around the world (Denmark, Germany, Belgium, UK and USA) isolated several bacterial species found in clinical and surveillance environments, and searched for antimicrobial resistance both in the laboratory and using ResFinder. In the laboratory, the isolated bacteria were subjected to MIC (Minimum Inhibitory Concentration) testing of different antimicrobials; in other words, how much antimicrobial has to be given to the bacterial isolates before they stop growing. If the MIC value is higher than a certain standard, the bacterium is considered resistant to that antimicrobial. Usually, when bacteria that should be killed by an antimicrobial turn out to be resistant, it is because they have acquired a gene or mutation that makes them resistant to that substance. ResFinder is a bioinformatic tool that tries to find those genes/mutations in the sequenced DNA of bacteria.&amp;lt;br&amp;gt;&lt;br /&gt;
A big part of the ResFinder tool validation was to receive the reports from the different centers (reports from laboratories and bioinformatic teams) and analyze them together. You will carry out this step during this exercise. The necessary data is in the zip file [https://teaching.healthtech.dtu.dk/material/22113/pandas_exercise.zip pandas_exercise.zip].&lt;br /&gt;
&lt;br /&gt;
# Load the metadata files (ending in &#039;&#039;_ids.txt&#039;&#039;) from Belgium, Denmark, Germany, UK and USA, and create a dataframe by stacking the five dataframes. The final dataframe should include an extra column indicating which country each sample comes from. Get the number of samples that come from Surveillance and from Clinical origins with respect to the Source (Hint: the &#039;&#039;&#039;groupby&#039;&#039;&#039; function is your friend).&lt;br /&gt;
# Do the same as in exercise 1 with the lab files (ending in &#039;&#039;lab_results.txt&#039;&#039;) and bioinformatic files (ending in &#039;&#039;bioinf_results.txt&#039;&#039;) for all countries. The columns of the bioinformatic results should be strings or objects, while the lab results should be strings (samples) and floats (the rest of the columns). As you might have noticed, USA and UK did not follow the format that we asked for. You will have to go from [MIC: &amp;lt;mic_value&amp;gt;] to [&amp;lt;mic_value&amp;gt;], where mic_value is a float. &#039;&#039;&#039;UPDATE&#039;&#039;&#039;: It seems UK also added a sneaky &amp;quot;&amp;lt;&amp;quot;. Replace it using the same method.&lt;br /&gt;
# Join the three dataframes row-wise, using the dataframe IDs as a way of mapping the reads ids (bioinformatic results) and the sample ids (laboratory results). Notice you might lose data on the way; that is fine. Hint: merge or join function is your friend here.&lt;br /&gt;
# Not all the laboratories have performed analysis on all the antimicrobials. Try to get the antimicrobials that USA has not performed analysis on. (Hint: When a cell in a column made of float numbers is empty, pandas uses the value &amp;quot;np.NaN&amp;quot;)&lt;br /&gt;
# Save the final dataset that you got from the last exercise under the name &#039;&#039;resfinder_project.tsv&#039;&#039;. It has to be tab separated, and the index should not be included.&lt;br /&gt;
The following exercises should not be started before Thursday.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;Numpy&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;ol start=&amp;quot;6&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;You are now going to work with gene expression data. Your employer has given you the results of the analysis from two different machines, but on the same samples. The analysis has been done on ten samples, and 5000 genes have been analyzed. In other words, you have the data from two machines (&#039;&#039;gene_expression1.txt&#039;&#039; and &#039;&#039;gene_expression2.txt&#039;&#039;), each with an array of shape (10, 5000) (samples, genes). Read the gene_expression1 file and store it in an array.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;It seems the second machine outputs the results in the format genes, samples (5000, 10). Read the file, store it in an array and turn it into an array with shape (10, 5000).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Your employer wants to normalize each sample. In other words, you need to subtract the mean of each row from that row (Sample_normalized&amp;lt;sub&amp;gt;n&amp;lt;/sub&amp;gt; = Sample&amp;lt;sub&amp;gt;n&amp;lt;/sub&amp;gt; - Mean_sample&amp;lt;sub&amp;gt;n&amp;lt;/sub&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Your employer asks you to save both arrays in the same file: first stack them row-wise, then save them in a .npy file, &#039;&#039;normalized_array.npy&#039;&#039;.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Exercises for extra practice ==&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=22113/22163_-_Unix_%26_Python_Programming_for_Bioinformaticians&amp;diff=80</id>
		<title>22113/22163 - Unix &amp; Python Programming for Bioinformaticians</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=22113/22163_-_Unix_%26_Python_Programming_for_Bioinformaticians&amp;diff=80"/>
		<updated>2024-04-04T13:45:04Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Resources */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
== Prepare for the course ==&lt;br /&gt;
You must read and follow the [[Course preparation]] before you show up on the first day of the course.&amp;lt;br&amp;gt;&lt;br /&gt;
You are &#039;&#039;&#039;required&#039;&#039;&#039; to read at least the first part of [[Aligning expectations]] when the course starts and whenever you have a question related to the conduction of the course.&amp;lt;br&amp;gt;&lt;br /&gt;
Resources can be good to check out during the course, or when you need something more.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Teacher:&#039;&#039;&#039; [https://www.inside.dtu.dk/da/dtuinside/generelt/telefonbog/person?id=816&amp;amp;cpid=214027&amp;amp;tab=2&amp;amp;qt=dtupublicationquery Peter Wad Sackett], pwsa@dtu.dk&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Language:&#039;&#039;&#039; The course is taught in English.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039; There is [[Course preparation]].&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Textbooks:&#039;&#039;&#039; There are no textbooks for the course. I will make do with powerpoints and references to online resources. You can find the material under the individual lessons in the [[programme]].&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Location:&#039;&#039;&#039; Building 116, aud. 82 &amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;NOTICE THIS LOCATION CHANGE&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Time:&#039;&#039;&#039; Monday 13:00 - 17:00, Thursday 9:00 - 12:00, module F2-A and F2-B.&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Deadlines for project work and exam:&#039;&#039;&#039; See the [[Programme]].&lt;br /&gt;
&lt;br /&gt;
== Course details ==&lt;br /&gt;
There are no plans for streaming the lectures, as there are already recorded video lectures for the first half of the course. Discord is used for online help and discussion - if necessary.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
| [[Programme]] || Spring 2024&lt;br /&gt;
|-&lt;br /&gt;
| [[Aligning expectations]] || Required reading&lt;br /&gt;
|-&lt;br /&gt;
| [[Code construction]] || Required reading for peer evaluation&lt;br /&gt;
|-&lt;br /&gt;
| [[Project list]] || Of projects to do&lt;br /&gt;
|-&lt;br /&gt;
| [[Mini projects]] || For practicing programming&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
[https://docs.google.com/spreadsheets/d/1wEs2xS-7DmpMvtosweTzYw0Fx3_XCbz4k5cuN5tJVK8/edit?usp=sharing Put yourself on the Get Help list]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Resources ==&lt;br /&gt;
&#039;&#039;&#039;Unix/Linux&#039;&#039;&#039;&lt;br /&gt;
* Youtube: [https://www.youtube.com/watch?v=HbgzrKJvDRw Linux File System/Structure Explained]&lt;br /&gt;
* Youtube: [https://www.youtube.com/watch?v=wBp0Rb-ZJak The Complete Linux Course: Beginner to Power User]&lt;br /&gt;
* Youtube: [https://www.youtube.com/playlist?list=PLIhvC56v63IJIujb5cyE13oLuyORZpdkL Linux series] by the very entertaining NetworkChuck&lt;br /&gt;
* Online: [http://www.oliverelliott.org/article/computing/tut_unix/ Online tutorial on unix]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Python 3&#039;&#039;&#039;&lt;br /&gt;
* Online: [https://www.coursera.org/learn/python Coursera course: Programming for Everybody] is a beginner course in Python. Everyone who wants to prepare more for course 22113 can start here. See also the [https://teaching.healthtech.dtu.dk/material/22113/CourseraPythonBook_270.pdf Coursera textbook].&lt;br /&gt;
* Online: [https://pynative.com/ PYnative] Good site for learning about Python. Information, tutorials, exercises and even an online editor, all well explained in an accessible way.&lt;br /&gt;
* Youtube: [https://www.youtube.com/watch?v=rfscVS0vtbw Python beginner course]&lt;br /&gt;
* Online: [https://teaching.healthtech.dtu.dk/material/22113/clean_code.html Clean Code] by Lukasz Dynowski. An amazing read that is mandatory. Read it once around lesson 3 and once more around lesson 6.&lt;br /&gt;
* Online: [https://rosalind.info/problems/locations/ Rosalind project] Python exercises at different levels for practicing &lt;br /&gt;
* Book: &#039;&#039;Learning Python&#039;&#039;, 5th ed. by Mark Lutz (O&#039;Reilly) ISBN: 978-1-449-35573-9. This is the best Python book I have read. It covers all the basics and then some. All from the perspective of being a novice programmer. However, it is a brick; big, heavy and unwieldy. If you only want one Python book, then this should be the one. The course will not be taught from this book, but it could be good to have as a Python reference manual.&lt;br /&gt;
* Book: &#039;&#039;Python Crash Course: A Hands-On, Project-Based Introduction to Programming&#039;&#039; by Eric Matthes (No Starch Press) ISBN: 1593276036, 9781593276034. A pretty OK book which leads you into the Python world without too many distracting points and theoretical contemplation.&lt;br /&gt;
* Online: [https://docs.python.org/3/tutorial/ Official Python 3 tutorial]&lt;br /&gt;
* Online: [https://docs.python.org/3/reference/index.html Python 3 reference manual]&lt;br /&gt;
* Online: [https://docs.python.org/3/library/index.html Python 3 standard library]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Biological&#039;&#039;&#039;&lt;br /&gt;
* Info: [[Biological knowledge needed in the course]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Writing reports, articles, thesis at university level&#039;&#039;&#039;&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/Vejledning_i_opbygning_og_skrivning_af_rapporter_v.2013.3.pdf Vejledning i opbygning og skrivning af rapporter]&lt;br /&gt;
* PDF: [https://teaching.healthtech.dtu.dk/material/22113/how_to_write_reports.pdf Guide on how to write a report]&lt;br /&gt;
* Online: [https://www.elsevier.com/connect/11-steps-to-structuring-a-science-paper-editors-will-take-seriously How to structure a science paper]&lt;br /&gt;
* PDF: [https://www.imm.dtu.dk/~janba/MastersThesisAdvice.pdf Master Thesis Advice]&lt;br /&gt;
* Online: [https://thereader.mitpress.mit.edu/umberto-eco-how-to-write-a-thesis/ How to write a thesis]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Interesting but less teaching oriented material&#039;&#039;&#039;&lt;br /&gt;
* Blog: [https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/ Why rewriting software projects can be bad]&lt;br /&gt;
* Online: [http://ivory.idyll.org/blog/big-data-biology.html Top 12 reasons you know you are a Big Data biologist]&lt;br /&gt;
* Online: [http://lifehacker.com/six-life-lessons-ive-learned-from-programming-1502077380 How programming and your life are similar]&lt;br /&gt;
* Youtube: [http://www.youtube.com/watch?v=nKIu9yen5nc What most schools don&#039;t teach - how to think]&lt;br /&gt;
&lt;br /&gt;
== Archive of old course programmes ==&lt;br /&gt;
[[Programme - Spring 2023]]&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Project_list&amp;diff=79</id>
		<title>Project list</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Project_list&amp;diff=79"/>
		<updated>2024-04-04T13:42:14Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Remember to [https://teaching.healthtech.dtu.dk/22113/index.php/Aligning_expectations#Projects,_general_information read about projects here].&lt;br /&gt;
&lt;br /&gt;
The list is arranged approximately from lowest to highest difficulty.&lt;br /&gt;
# [[Text mining MEDLINE abstracts]]&lt;br /&gt;
# [[K-means clustering]]&lt;br /&gt;
# [[Data mining in NCBI databases]]&lt;br /&gt;
# [[Searching for motifs in sequences]]&lt;br /&gt;
# [[Data analysis]]&lt;br /&gt;
# [[Resistance to antibiotics]]&lt;br /&gt;
# [[k-nearest neighbor (k-NN) continuous variable estimation]]&lt;br /&gt;
# [[Read trimmer for Next-Generation-Sequencing data]]&lt;br /&gt;
# [[QT clustering]]&lt;br /&gt;
# [[Pairwise alignment]]&lt;br /&gt;
# [[Artificial Neural Network]]&lt;br /&gt;
&amp;lt;!-- removed projects&lt;br /&gt;
# [[Smith-Waterman alignment]]&lt;br /&gt;
# [[Needleman-Wunsch alignment]]&lt;br /&gt;
# [[Sudoku]]&lt;br /&gt;
# [[Score sequence data with a PSSM]]&lt;br /&gt;
# [[Random sequence generator]]&lt;br /&gt;
# [[Analysis of sorting]]&lt;br /&gt;
# [[Shortest path in graph]]&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22113/index.php?title=Example_code_-_Unit_test&amp;diff=78</id>
		<title>Example code - Unit test</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22113/index.php?title=Example_code_-_Unit_test&amp;diff=78"/>
		<updated>2024-03-20T10:04:54Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;I decided to write some unit tests for the prime number generator that I used as an example last week, see [[Example code - Classes]].&lt;br /&gt;
I make two files: one containing the class, &#039;&#039;PrimeGenerator.py&#039;&#039;, and one containing the tests, &#039;&#039;test_PrimeGenerator.py&#039;&#039;. The files are supposed to be in the same folder.&lt;br /&gt;
I will not show last week&#039;s code from &#039;&#039;PrimeGenerator.py&#039;&#039; here. If you want to see it, click the link above.&lt;br /&gt;
&amp;lt;p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is my unit test file. I want to test the following:&lt;br /&gt;
* It can generate different series of primes when the upper limits are not given in ascending order (e.g. 10, 20, 15)&lt;br /&gt;
* It can figure out if a number is a prime, especially around 0.&lt;br /&gt;
* It reacts correctly to wrong (nonsense) input&lt;br /&gt;
&#039;&#039;&#039;File:&#039;&#039;&#039; test_PrimeGenerator.py&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import pytest&lt;br /&gt;
from PrimeGenerator import PrimeGenerator&lt;br /&gt;
&lt;br /&gt;
# Parametric test of the generator part&lt;br /&gt;
@pytest.mark.parametrize(&amp;quot;x, y&amp;quot;, [(10,(2,3,5,7)), (20,(2,3,5,7,11,13,17,19)), (15,(2,3,5,7,11,13))])&lt;br /&gt;
&lt;br /&gt;
def test_generator(x, y):&lt;br /&gt;
    result = tuple(PrimeGenerator(x))&lt;br /&gt;
    assert result == y, &amp;quot;Generator test&amp;quot;&lt;br /&gt;
    &lt;br /&gt;
# Parametric test of giving illegal data to the generator&lt;br /&gt;
@pytest.mark.parametrize(&amp;quot;x&amp;quot;, [1.3, 6.5, &#039;Cat&#039;, [1,2,3,4], {1,2,3,4}, {1:1, 2:2, 3:3, 4:4}])&lt;br /&gt;
&lt;br /&gt;
def test_generator_illegal(x):&lt;br /&gt;
    with pytest.raises(ValueError):&lt;br /&gt;
        PrimeGenerator(x)&lt;br /&gt;
&lt;br /&gt;
# Parametric test of the isprime part&lt;br /&gt;
@pytest.mark.parametrize(&amp;quot;x, y&amp;quot;, [(-3, False),(-2, False),(0, False),(1, False),(2, True),(3, True),(4, False),(5, True),(100, False),(47, True)])&lt;br /&gt;
&lt;br /&gt;
def test_isprime_compute(x, y):&lt;br /&gt;
    assert PrimeGenerator().isprime(x) == y, &amp;quot;isprime test&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Parametric test of giving illegal data to isprime&lt;br /&gt;
@pytest.mark.parametrize(&amp;quot;x&amp;quot;, [1.3, 6.5, &#039;Cat&#039;, [1,2,3,4], {1,2,3,4}, {1:1, 2:2, 3:3, 4:4}])&lt;br /&gt;
&lt;br /&gt;
def test_isprime_illegal(x):&lt;br /&gt;
    with pytest.raises(ValueError):&lt;br /&gt;
        PrimeGenerator().isprime(x)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
When running the unit tests with &#039;&#039;&#039;pytest test_PrimeGenerator.py&#039;&#039;&#039;, I got the following problems.&lt;br /&gt;
 FAILED test_PrimeGenerator.py::test_generator_illegal[1.3] - Failed: DID NOT RAISE &amp;lt;class &#039;ValueError&#039;&amp;gt;&lt;br /&gt;
 FAILED test_PrimeGenerator.py::test_generator_illegal[6.5] - Failed: DID NOT RAISE &amp;lt;class &#039;ValueError&#039;&amp;gt;&lt;br /&gt;
 FAILED test_PrimeGenerator.py::test_generator_illegal[x3] - TypeError: int() argument must be a string, a bytes-like ...&lt;br /&gt;
 FAILED test_PrimeGenerator.py::test_generator_illegal[x4] - TypeError: int() argument must be a string, a bytes-like ...&lt;br /&gt;
 FAILED test_PrimeGenerator.py::test_generator_illegal[x5] - TypeError: int() argument must be a string, a bytes-like ...&lt;br /&gt;
 FAILED test_PrimeGenerator.py::test_isprime_illegal[1.3] - Failed: DID NOT RAISE &amp;lt;class &#039;ValueError&#039;&amp;gt;&lt;br /&gt;
 FAILED test_PrimeGenerator.py::test_isprime_illegal[6.5] - Failed: DID NOT RAISE &amp;lt;class &#039;ValueError&#039;&amp;gt;&lt;br /&gt;
 FAILED test_PrimeGenerator.py::test_isprime_illegal[Cat] - TypeError: &#039;&amp;lt;=&#039; not supported between instances of &#039;str&#039; a...&lt;br /&gt;
 FAILED test_PrimeGenerator.py::test_isprime_illegal[x3] - TypeError: &#039;&amp;lt;=&#039; not supported between instances of &#039;list&#039; a...&lt;br /&gt;
 FAILED test_PrimeGenerator.py::test_isprime_illegal[x4] - TypeError: &#039;&amp;lt;=&#039; not supported between instances of &#039;set&#039; an...&lt;br /&gt;
 FAILED test_PrimeGenerator.py::test_isprime_illegal[x5] - TypeError: &#039;&amp;lt;=&#039; not supported between instances of &#039;dict&#039; a...&lt;br /&gt;
Despite my best effort in making the PrimeGenerator class, I overlooked something, not in the core code, but in the validation of input.&lt;br /&gt;
The truth is that I did not think of my test_generator_illegal unit test to begin with, but when I made test_isprime_illegal and it failed, I realized that the generator had the same input validation problem. This just shows that it is valuable to make unit tests, even with the most stupid and inane test you can think of.&lt;br /&gt;
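&lt;br /&gt;
The input validation discussed above boils down to a single type check. As a minimal sketch, the hypothetical helper require_int below (it is not part of PrimeGenerator) shows the same isinstance test in isolation. It additionally rejects bool values, which is an assumption about the desired behaviour: bool is a subclass of int in Python, so a plain isinstance(number, int) check accepts True and False.&lt;br /&gt;

```python
# Hypothetical helper for illustration only; not part of PrimeGenerator.
def require_int(number):
    # bool is a subclass of int, so it is rejected explicitly here.
    # This extra rule is an assumption, not something the course code does.
    if isinstance(number, bool) or not isinstance(number, int):
        raise ValueError("Integer expected")
    return number


def is_rejected(value):
    # Return True if require_int raises ValueError for the value.
    try:
        require_int(value)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    # Show which of these sample inputs pass the check.
    for value in (7, 1.3, "Cat", [1, 2, 3, 4], True):
        print(repr(value), "rejected" if is_rejected(value) else "accepted")
```

Whether True and False should count as valid integers is a design choice. If they should be rejected, a test case for them could be added to the parametrized lists of illegal input.&lt;br /&gt;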
&lt;br /&gt;
Having fixed my code issues, here is the new PrimeGenerator code.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/usr/bin/env python3&lt;br /&gt;
# Prime number generator&lt;br /&gt;
&lt;br /&gt;
class PrimeGenerator:&lt;br /&gt;
    # Class variable, known primes in consecutive order, can be extended, but must contain these&lt;br /&gt;
    knownprimes = [2, 3]&lt;br /&gt;
    # Highest tested number for prime&lt;br /&gt;
    highesttested = 3&lt;br /&gt;
&lt;br /&gt;
    # Instantiation&lt;br /&gt;
    def __init__(self, number=None):&lt;br /&gt;
        if number is not None:&lt;br /&gt;
            if not isinstance(number, int):                # New code&lt;br /&gt;
                raise ValueError(&amp;quot;Integer expected&amp;quot;)       # New code&lt;br /&gt;
        self.target = number                &lt;br /&gt;
    &lt;br /&gt;
    # Initializing iteration&lt;br /&gt;
    def __iter__(self):&lt;br /&gt;
        if self.target is None:&lt;br /&gt;
            raise ValueError(&amp;quot;No number specified&amp;quot;)&lt;br /&gt;
        self.pos = 0&lt;br /&gt;
        return self&lt;br /&gt;
    &lt;br /&gt;
    # Find next prime&lt;br /&gt;
    def __next__(self):&lt;br /&gt;
        # Can we use the list of known primes to find the next?&lt;br /&gt;
        if self.pos &amp;lt; len(self.knownprimes):&lt;br /&gt;
            nextprime = self.knownprimes[self.pos]&lt;br /&gt;
            if nextprime &amp;gt;= self.target:&lt;br /&gt;
                raise StopIteration&lt;br /&gt;
            self.pos += 1&lt;br /&gt;
            return nextprime&lt;br /&gt;
        # No, start computing the next prime&lt;br /&gt;
        while self.target &amp;gt; PrimeGenerator.highesttested+1:&lt;br /&gt;
            PrimeGenerator.highesttested += 1&lt;br /&gt;
            if self._isprime(PrimeGenerator.highesttested):&lt;br /&gt;
                self.knownprimes.append(PrimeGenerator.highesttested)&lt;br /&gt;
                self.pos += 1&lt;br /&gt;
                return self.highesttested&lt;br /&gt;
        raise StopIteration&lt;br /&gt;
&lt;br /&gt;
    # Private method for identifying a prime&lt;br /&gt;
    def _isprime(self, number):&lt;br /&gt;
        factor = 0&lt;br /&gt;
        pos = 0&lt;br /&gt;
        while factor*factor &amp;lt;= number:&lt;br /&gt;
            # find next potential factor either in known primes or odd numbers above last known prime&lt;br /&gt;
            if pos &amp;lt; len(self.knownprimes):&lt;br /&gt;
                factor = self.knownprimes[pos]&lt;br /&gt;
                pos += 1&lt;br /&gt;
            else:&lt;br /&gt;
                factor += 2&lt;br /&gt;
            # test if it truly is a factor&lt;br /&gt;
            if number % factor == 0:&lt;br /&gt;
                return False&lt;br /&gt;
        return True&lt;br /&gt;
&lt;br /&gt;
    # It is nice to be able to ask if a number is a prime&lt;br /&gt;
    def isprime(self, number=None):&lt;br /&gt;
        if number is None:&lt;br /&gt;
            number = self.target&lt;br /&gt;
        if not isinstance(number, int):                # New code&lt;br /&gt;
            raise ValueError(&amp;quot;Integer expected&amp;quot;)       # New code&lt;br /&gt;
        if number in PrimeGenerator.knownprimes:&lt;br /&gt;
            return True&lt;br /&gt;
        if number &amp;lt;= PrimeGenerator.highesttested:&lt;br /&gt;
            return False&lt;br /&gt;
        return self._isprime(number)&lt;br /&gt;
&lt;br /&gt;
if __name__ == &amp;quot;__main__&amp;quot;:&lt;br /&gt;
    # Small testing&lt;br /&gt;
    for i in PrimeGenerator(1000):&lt;br /&gt;
        print(i)&lt;br /&gt;
&lt;br /&gt;
    print(PrimeGenerator().isprime(1000000016531)) # True&lt;br /&gt;
    print(PrimeGenerator().isprime(1000000016521)) # False&lt;br /&gt;
&lt;br /&gt;
    # Big prime, don&#039;t try&lt;br /&gt;
    # 999296950101072104250052714631&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
</feed>