<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://teaching.healthtech.dtu.dk:443/22118/index.php?action=history&amp;feed=atom&amp;title=K-means_clustering</id>
	<title>K-means clustering - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://teaching.healthtech.dtu.dk:443/22118/index.php?action=history&amp;feed=atom&amp;title=K-means_clustering"/>
	<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22118/index.php?title=K-means_clustering&amp;action=history"/>
	<updated>2026-05-02T18:59:05Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22118/index.php?title=K-means_clustering&amp;diff=23&amp;oldid=prev</id>
		<title>WikiSysop: /* Details */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22118/index.php?title=K-means_clustering&amp;diff=23&amp;oldid=prev"/>
		<updated>2025-09-26T11:22:49Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Details&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 13:22, 26 September 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l32&quot;&gt;Line 32:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 32:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Details===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Details===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Wikipedia has a page on [https://en.wikipedia.org/wiki/K-means_clustering K-means clustering] and on [https://en.wikipedia.org/wiki/K-means%2B%2B K-means++].&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Wikipedia has a page on [https://en.wikipedia.org/wiki/K-means_clustering K-means clustering] and on [https://en.wikipedia.org/wiki/K-means%2B%2B K-means++].&amp;lt;br&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Various data sets: [https://teaching.healthtech.dtu.dk/material/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22113&lt;/del&gt;/point100.lst 100 data points], [https://teaching.healthtech.dtu.dk/material/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22113&lt;/del&gt;/point1000.lst 1000 data points],&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Various data sets: [https://teaching.healthtech.dtu.dk/material/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22118&lt;/ins&gt;/point100.lst 100 data points], [https://teaching.healthtech.dtu.dk/material/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22118&lt;/ins&gt;/point1000.lst 1000 data points],&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[https://teaching.healthtech.dtu.dk/material/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22113&lt;/del&gt;/point4169.lst 4169 data points], [https://teaching.healthtech.dtu.dk/material/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22113&lt;/del&gt;/point5000.lst 5000 data points], [https://teaching.healthtech.dtu.dk/material/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22113&lt;/del&gt;/point6000.lst 6000 data points], [https://teaching.healthtech.dtu.dk/material/&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22113&lt;/del&gt;/point10000.lst 10000 data points],&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[https://teaching.healthtech.dtu.dk/material/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22118&lt;/ins&gt;/point4169.lst 4169 data points], [https://teaching.healthtech.dtu.dk/material/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22118&lt;/ins&gt;/point5000.lst 5000 data points], [https://teaching.healthtech.dtu.dk/material/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22118&lt;/ins&gt;/point6000.lst 6000 data points], [https://teaching.healthtech.dtu.dk/material/&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;22118&lt;/ins&gt;/point10000.lst 10000 data points],&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk:443/22118/index.php?title=K-means_clustering&amp;diff=20&amp;oldid=prev</id>
		<title>WikiSysop: Created page with &quot;__NOTOC__ ===Description=== The program reads a number of data points (multi-dimensional vectors) from a file and partitions those into K clusters. Clustering is important in discovering patterns or modes in multi-dimensional data sets. It is also a method of organizing data examples into similar groups (clusters). In this particular case, K-means partitions the data set such that each example (data point) is assigned to exactly one cluster - the one with the closest cen...&quot;</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk:443/22118/index.php?title=K-means_clustering&amp;diff=20&amp;oldid=prev"/>
		<updated>2025-09-26T11:17:51Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;__NOTOC__ ===Description=== The program reads a number of data points (multi-dimensional vectors) from a file and partitions those into K clusters. Clustering is important in discovering patterns or modes in multi-dimensional data sets. It is also a method of organizing data examples into similar groups (clusters). In this particular case, K-means partitions the data set such that each example (data point) is assigned to exactly one cluster - the one with the closest cen...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;__NOTOC__&lt;br /&gt;
===Description===&lt;br /&gt;
The program reads a number of data points (multi-dimensional vectors) from a file and partitions them into K clusters. Clustering is useful for discovering patterns or modes in multi-dimensional data sets, and for organizing data examples into groups of similar items (clusters). K-means partitions the data set so that each example (data point) is assigned to exactly one cluster: the one with the closest centroid.&amp;lt;br&amp;gt;&lt;br /&gt;
The initial selection of centroids (the &amp;quot;middle&amp;quot; of a cluster) strongly affects the performance of the algorithm, both in running time and in the quality of the result. The common (and rather poor) approach is to pick random points in the data set as the initial centroids. You must improve the algorithm by adding the K-means++ method for selecting the initial centroids. K-means++ selects the first centroid (a data point) at random, and each remaining centroid based on how far the data points are from the centroids already selected: the further a point is from its nearest selected centroid, the higher its probability of being picked (the probability is proportional to the squared distance).&lt;br /&gt;
&lt;br /&gt;
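The description above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the expected solution: distances are Euclidean, points are lists or tuples of floats, and the names `kmeans_pp_init`, `nearest` and `kmeans` (and the `max_iter` and `rng` parameters) are hypothetical. Seeding follows K-means++ (squared-distance weighting); the loop afterwards is the standard Lloyd assign/update iteration.

```python
import math
import random


def kmeans_pp_init(points, k, rng=random):
    """Pick k initial centroids using K-means++ seeding.

    The first centroid is a uniformly random data point. Each further
    centroid is drawn with probability proportional to the squared
    distance from a point to its nearest already-chosen centroid.
    """
    first = rng.choice(points)
    centroids = [list(first)]
    # d2[i]: squared distance from points[i] to its nearest chosen centroid
    d2 = [math.dist(p, first) ** 2 for p in points]
    for _ in range(k - 1):
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid: fall back to uniform
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centroids.append(list(nxt))
        d2 = [min(old, math.dist(p, nxt) ** 2) for old, p in zip(d2, points)]
    return centroids


def nearest(point, centroids):
    """Index of the centroid closest (Euclidean distance) to point."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, max_iter=100, rng=random):
    """Lloyd iterations (assign, then update) starting from K-means++ centroids."""
    centroids = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Move each centroid to the mean of its members (unchanged if empty)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment, consistent with the returned centroids
    return centroids, [nearest(p, centroids) for p in points]
```

For example, `centroids, labels = kmeans(points, 5)` would partition `points` into 5 clusters; passing `rng=random.Random(42)` makes a run reproducible.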
===Input and output===&lt;br /&gt;
The input is a tab-separated file containing one data point on each line: a name followed by the vector, which consists of a number of numbers :-) The program should handle any vector size, but the vector size is constant within a data file. Input file example:&lt;br /&gt;
&lt;br /&gt;
 ex01	8.76	3.29	1.05&lt;br /&gt;
 ex02	12.3	2.33	3.53&lt;br /&gt;
 ex03	-0.54	-3.56	1.45&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
&lt;br /&gt;
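Reading the input can be sketched as below, assuming that the first field on each line is the data point's name (as with `ex01` above) and the remaining fields are the vector components. The function names `parse_points` and `read_points` are hypothetical; the check on vector size reflects the requirement that the size is constant within a file.

```python
def parse_points(lines):
    """Parse tab-separated lines of the form: name, value, value, ...

    Returns (names, points). Blank lines are skipped; a ValueError is
    raised if the vector size changes within the input.
    """
    names, points = [], []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if not fields[0].strip():
            continue  # blank line
        vector = [float(x) for x in fields[1:]]
        if points and len(vector) != len(points[0]):
            raise ValueError(
                f"line {lineno}: expected {len(points[0])} values, got {len(vector)}"
            )
        names.append(fields[0])
        points.append(vector)
    return names, points


def read_points(path):
    """Read a data file such as point100.lst into (names, points)."""
    with open(path) as fh:
        return parse_points(fh)
```

Keeping the parsing in `parse_points` separate from file handling makes it easy to test on a list of strings.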
The output is all data points, partitioned into the clusters they belong to. In the output example below, each cluster starts with the cluster centroid and is followed by the members of that cluster:&lt;br /&gt;
&lt;br /&gt;
 Cluster-1	2.13	3.24	1.54&lt;br /&gt;
 ex10	1.04	2.98	1.34&lt;br /&gt;
 ex12	1.23	2.34	1.69&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
 Cluster-2	-4.13	1.25	6.34&lt;br /&gt;
 ex04	-0.34	3.51	9.02&lt;br /&gt;
 ex07	-8.56	5.12	12.5&lt;br /&gt;
 .&lt;br /&gt;
 .&lt;br /&gt;
&lt;br /&gt;
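A sketch of producing the output layout above, assuming `labels` holds one 0-based cluster index per data point and that centroid values are printed with two decimals (the example shows two decimals, but the exact number format is an assumption). `format_clusters` is a hypothetical helper name.

```python
def format_clusters(names, points, centroids, labels):
    """Return the clustering as text in the layout shown in the example.

    Each cluster starts with a 'Cluster-N' line holding its centroid,
    followed by one line per member (name and vector), tab-separated.
    """
    out = []
    for j, centroid in enumerate(centroids, start=1):
        # Header line: cluster name and centroid, two decimals (assumed format)
        out.append("\t".join([f"Cluster-{j}"] + [f"{x:.2f}" for x in centroid]))
        for name, point, lab in zip(names, points, labels):
            if lab == j - 1:  # labels are 0-based, cluster numbers 1-based
                out.append("\t".join([name] + [str(x) for x in point]))
    return "\n".join(out)
```

Calling `print(format_clusters(names, points, centroids, labels))` would write the result to standard output.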
===Examples of program execution===&lt;br /&gt;
 cluster.py dataset.dat 5&lt;br /&gt;
Here 5 is the number of clusters (K) the data points should be partitioned into.&lt;br /&gt;
&lt;br /&gt;
===Details===&lt;br /&gt;
Wikipedia has a page on [https://en.wikipedia.org/wiki/K-means_clustering K-means clustering] and on [https://en.wikipedia.org/wiki/K-means%2B%2B K-means++].&amp;lt;br&amp;gt;&lt;br /&gt;
Various data sets: [https://teaching.healthtech.dtu.dk/material/22113/point100.lst 100 data points], [https://teaching.healthtech.dtu.dk/material/22113/point1000.lst 1000 data points],&lt;br /&gt;
[https://teaching.healthtech.dtu.dk/material/22113/point4169.lst 4169 data points], [https://teaching.healthtech.dtu.dk/material/22113/point5000.lst 5000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point6000.lst 6000 data points], [https://teaching.healthtech.dtu.dk/material/22113/point10000.lst 10000 data points],&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
</feed>