For this exercise, you have 6 samples for which you have measured the expression of 4 genes. Download the Excel sheet with the data here. You can assume that the expression of these genes is normally distributed, so you may use the Euclidean distance to cluster the samples using hierarchical agglomerative clustering. Hint: if you are using Excel, here's the formula to calculate the Euclidean distance between two vectors.
Task 1: Your first task is to construct a distance matrix (all samples vs all samples) using Euclidean distance.
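If you prefer Python to Excel, the distance matrix can be sketched as follows. The sample names and expression values are hypothetical placeholders, not the data from the spreadsheet, and the helper names `euclidean` and `distance_matrix` are made up for this illustration.

```python
import math

# Hypothetical placeholder values, NOT the data from the exercise spreadsheet:
# 6 samples, each with the expression of 4 genes.
samples = {
    "s1": [1.0, 2.0, 2.5, 0.5],
    "s2": [1.2, 1.8, 2.2, 0.7],
    "s3": [0.9, 2.1, 2.4, 0.4],
    "s4": [5.0, 6.0, 5.5, 4.8],
    "s5": [5.2, 5.9, 5.7, 5.1],
    "s6": [4.8, 6.2, 5.4, 4.9],
}


def euclidean(a, b):
    """Square root of the summed squared differences between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(data):
    """All-vs-all distances, with rows and columns in the order of `data`."""
    names = list(data)
    return [[euclidean(data[r], data[c]) for c in names] for r in names]


matrix = distance_matrix(samples)
for name, row in zip(samples, matrix):
    print(name, " ".join(f"{d:6.2f}" for d in row))
```

The matrix is symmetric with zeros on the diagonal, so in Excel you only need to fill in one triangle.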
Task 2: Then, using hierarchical clustering, cluster the samples into two clusters.
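The agglomerative procedure can be sketched in a few lines of Python: start with every sample in its own cluster and repeatedly merge the two closest clusters until two remain. The exercise does not say which linkage to use, so average linkage is an assumption here, and the data points are hypothetical placeholders rather than the spreadsheet values.

```python
import math
from itertools import combinations

# Hypothetical placeholder values, NOT the data from the exercise spreadsheet.
points = [
    [1.0, 2.0, 2.5, 0.5],
    [1.2, 1.8, 2.2, 0.7],
    [0.9, 2.1, 2.4, 0.4],
    [5.0, 6.0, 5.5, 4.8],
    [5.2, 5.9, 5.7, 5.1],
    [4.8, 6.2, 5.4, 4.9],
]


def average_linkage(c1, c2, data):
    """Mean Euclidean distance over all pairs drawn from the two clusters."""
    total = sum(math.dist(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(data, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    clusters = [[i] for i in range(len(data))]  # every sample starts alone
    while len(clusters) > n_clusters:
        # combinations yields index pairs (a, b) with a < b
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], data),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


clusters = agglomerative(points, n_clusters=2)
for number, members in enumerate(clusters, start=1):
    print(f"cluster {number}: samples {sorted(i + 1 for i in members)}")
```

Single or complete linkage only changes the `average_linkage` function (minimum or maximum instead of the mean), which makes it easy to check whether the choice of linkage affects your two clusters.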
Task 3: Visualize your samples, e.g. in Excel, using PCA. Here are principal components 1 and 2.
Task 4: Examine the plot. Does the clustering make sense?
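If you want to compute the principal components for Task 3 yourself instead of using the provided ones, here is a minimal sketch using NumPy (an assumption, since the exercise itself only mentions Excel). It centers the data and takes the first two right singular vectors. The values are hypothetical placeholders, not the spreadsheet data.

```python
import numpy as np

# Hypothetical placeholder values, NOT the data from the exercise spreadsheet:
# rows are the 6 samples, columns are the 4 genes.
X = np.array([
    [1.0, 2.0, 2.5, 0.5],
    [1.2, 1.8, 2.2, 0.7],
    [0.9, 2.1, 2.4, 0.4],
    [5.0, 6.0, 5.5, 4.8],
    [5.2, 5.9, 5.7, 5.1],
    [4.8, 6.2, 5.4, 4.9],
])

# PCA via SVD of the mean-centered data matrix.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Project every sample onto the first two principal axes.
scores = X_centered @ Vt[:2].T

# Fraction of the total variance captured by each component.
explained = S**2 / np.sum(S**2)

for i, (pc1, pc2) in enumerate(scores, start=1):
    print(f"sample {i}: PC1 = {pc1:7.3f}, PC2 = {pc2:7.3f}")
print("explained variance (PC1, PC2):", explained[:2])
```

Plotting `scores[:, 0]` against `scores[:, 1]` and coloring the points by cluster gives the picture to examine in Task 4. The sign of each component is arbitrary, so a mirrored plot is equally valid.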
You are now given a new sample, for which the following gene expression values were measured:
gene1: 1.2
gene2: 1.9
gene3: 2.3
Task 5: Using a k-nearest-neighbor (kNN) classifier with k=3, classify the sample as belonging to either class 1 or class 2.
Task 6: Using a kNN classifier with k=1, classify the sample as belonging to either class 1 or class 2.
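A kNN classifier can be sketched as: compute the distance from the new sample to every labelled sample, keep the k closest, and take a majority vote. The sketch below assumes that the two clusters from Task 2 serve as class 1 and class 2, and that distances are computed over the three genes listed for the new sample. The training values are hypothetical placeholders, not the spreadsheet data, and `knn_classify` is a made-up helper name.

```python
import math
from collections import Counter

# Hypothetical placeholder values, NOT the data from the exercise spreadsheet.
# Each entry is (expression of gene1..gene3, class label).
training = [
    ([1.0, 2.0, 2.5], 1),
    ([1.3, 1.7, 2.1], 1),
    ([0.9, 2.2, 2.6], 1),
    ([5.0, 6.0, 5.5], 2),
    ([5.2, 5.9, 5.7], 2),
    ([4.8, 6.2, 5.4], 2),
]

# The new sample from the exercise text.
new_sample = [1.2, 1.9, 2.3]


def knn_classify(query, labelled, k):
    """Majority vote among the k labelled samples closest to the query."""
    nearest = sorted(labelled, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


for k in (3, 1):
    print(f"k={k}: class {knn_classify(new_sample, training, k)}")
```

With two classes, an odd k avoids tied votes. With k=1 the prediction depends on a single neighbor, so it is more sensitive to one unusual sample than k=3.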
Task 7: Calculate the centroids of class 1 and class 2.
Task 8: Using distance to centroid, classify your sample as belonging to either class 1 or class 2.
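Tasks 7 and 8 can be sketched together: a centroid is the per-gene mean over all samples in a class, and the new sample is assigned to the class whose centroid is nearest. As before, the class members are hypothetical placeholders (not the spreadsheet data), restricted to the three genes listed for the new sample, and `centroid` is a made-up helper name.

```python
import math

# Hypothetical placeholder values, NOT the data from the exercise spreadsheet.
class_members = {
    1: [[1.0, 2.0, 2.5], [1.3, 1.7, 2.1], [0.9, 2.2, 2.6]],
    2: [[5.0, 6.0, 5.5], [5.2, 5.9, 5.7], [4.8, 6.2, 5.4]],
}

# The new sample from the exercise text.
new_sample = [1.2, 1.9, 2.3]


def centroid(rows):
    """Per-gene mean over all samples in a class."""
    return [sum(column) / len(rows) for column in zip(*rows)]


centroids = {label: centroid(rows) for label, rows in class_members.items()}

# Assign the sample to the class with the nearest centroid.
predicted = min(
    centroids, key=lambda label: math.dist(new_sample, centroids[label])
)

for label, center in centroids.items():
    distance = math.dist(new_sample, center)
    print(f"class {label}: centroid {center}, distance {distance:.3f}")
print("predicted class:", predicted)
```

Unlike kNN, this approach compares the sample with only one point per class, so the result does not depend on a choice of k.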