ExTopology1 igraph solutions
Exercise: Network topology and statistics
Exercise written by: Lars Rønn Olsen
Notes on today's exercise: today we will get a bit more advanced in our use of igraph. If you get stuck, remember that Google and ChatGPT are your friends.
Part I. Getting started
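library(igraph)
library(ggraph)
library(ggplot2)

# Load the exercise data (the objects used below, e.g. interactions and node_attributes)
load("/home/projects/22140/exercise3.Rdata")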
TASK: Create an igraph object with node attributes
g <- graph_from_data_frame(d = interactions, directed = FALSE, vertices = node_attributes)
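To see what graph_from_data_frame() does with the two tables, here is a minimal sketch on made-up data. The objects toy_edges and toy_nodes and their contents are hypothetical stand-ins for interactions and node_attributes; only the column name Gene_Id is borrowed from the exercise.

```r
library(igraph)

# Hypothetical toy data standing in for `interactions` and `node_attributes`
toy_edges <- data.frame(
  from = c("prot_A", "prot_A", "prot_B"),
  to   = c("prot_B", "prot_C", "prot_C"),
  stringsAsFactors = FALSE
)
toy_nodes <- data.frame(
  name    = c("prot_A", "prot_B", "prot_C", "prot_D"),
  Gene_Id = c("GENE1", "GENE2", "GENE3", "GENE4"),
  stringsAsFactors = FALSE
)

# The first column of `vertices` becomes the node name,
# the remaining columns become node attributes
toy_g <- graph_from_data_frame(d = toy_edges, directed = FALSE, vertices = toy_nodes)

vcount(toy_g)     # number of nodes, including the unconnected prot_D
ecount(toy_g)     # number of edges
V(toy_g)$Gene_Id  # node attribute taken from toy_nodes
```

Nodes that appear only in the vertex table (prot_D here) are still added to the graph, which is one reason to pass the vertices argument explicitly.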
Part II. Network layout and selecting nodes
TASK: Explore network layouts
Try two different layouts: "kk" (Kamada-Kawai) and "fr" (Fruchterman-Reingold).
ggraph(g, layout = "fr") +
  geom_edge_link() +
  geom_node_point()

ggraph(g, layout = "kk") +
  geom_edge_link() +
  geom_node_point()
TASK: Explore TP53 in the network
Report question 1: How many proteins interact with TP53?
Color the nodes depending on whether they interact with TP53 or not, and change the shape of the TP53 node.
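# Number of proteins that interact with TP53 (node prot_7157)
degree(graph = g, v = "prot_7157")

# Flag the interaction partners of TP53 and the TP53 node itself
V(g)$TP53_neighbors <- names(V(g)) %in% names(neighbors(graph = g, v = "prot_7157"))
V(g)$TP53 <- names(V(g)) == "prot_7157"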
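The same pattern can be checked on a graph small enough to verify by hand. This is only a sketch: the graph and the node names (hub, a, b, c, d) are hypothetical, with hub playing the role of TP53.

```r
library(igraph)

# Hypothetical toy graph: "hub" is linked to a, b and c; d is linked only to c
toy_g <- graph_from_literal(hub - a, hub - b, hub - c, c - d)

# Degree of a single node = number of interaction partners
degree(toy_g, v = "hub")

# Names of the partners, then one logical flag per node
hub_partners <- names(neighbors(toy_g, "hub"))
V(toy_g)$hub_neighbor <- V(toy_g)$name %in% hub_partners
V(toy_g)$is_hub <- V(toy_g)$name == "hub"

# The hub is not its own neighbor, so its flag is FALSE
V(toy_g)$hub_neighbor[V(toy_g)$name == "hub"]
```

Because neighbors() never returns the focal node itself, that node has to be handled separately whenever it should be kept or highlighted.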
Report question 2: Paste a screenshot of your TP53-annotated network into your report.
ggraph(g, layout = "fr") +
  geom_edge_link() +
  geom_node_point(aes(color = TP53_neighbors, shape = TP53))
Now create a subnetwork consisting only of TP53 and its interaction partners, and label with gene names.
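# Keep TP53 itself plus its neighbors (neighbors() does not include TP53)
keep <- c("prot_7157", names(neighbors(graph = g, v = "prot_7157")))
g_TP53 <- delete_vertices(graph = g, v = !names(V(g)) %in% keep)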
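One way to extract a node together with its interaction partners is induced_subgraph(), which keeps the listed vertices and all edges among them. The sketch below uses a hypothetical toy graph; note that the focal node is added to the keep list explicitly, since neighbors() does not return it.

```r
library(igraph)

# Hypothetical toy graph: hub has partners a and b; c and d are further away
toy_g <- graph_from_literal(hub - a, hub - b, a - b, b - c, c - d)

# Focal node plus its direct partners
keep <- c("hub", names(neighbors(toy_g, "hub")))

# Subgraph containing only these nodes and the edges among them
toy_sub <- induced_subgraph(toy_g, keep)

V(toy_sub)$name  # nodes that were kept
ecount(toy_sub)  # hub-a, hub-b and a-b remain
```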
Report question 3: Paste a screenshot of your TP53 subnetwork into your report.
ggraph(g_TP53, layout = "fr") +
  geom_edge_link() +
  geom_node_point(aes(color = TP53_neighbors, shape = TP53)) +
  geom_node_text(aes(label = Gene_Id), repel = TRUE)
Part III. Network statistics
Report question 4: Calculate the following statistics for the full network:
- Average node degree
- Average clustering coefficient (also known as transitivity)
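# Average node degree
mean(degree(g))

# Clustering coefficient of the whole network.
# Note: transitivity() defaults to type = "global" (ratio of triangles to connected triples);
# type = "average" instead gives the mean of the node-wise (local) coefficients.
transitivity(g)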
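The two flavours of clustering coefficient can differ, which is easy to see on a toy graph that can be checked by hand. This is a sketch on hypothetical data: a triangle a-b-c with one extra node d attached to c.

```r
library(igraph)

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c
toy_g <- graph_from_literal(a - b, b - c, a - c, c - d)

# Average degree: degrees are 2, 2, 3, 1
avg_degree <- mean(degree(toy_g))

# Global transitivity: 3 * triangles / connected triples = 3 * 1 / 5
global_cc <- transitivity(toy_g, type = "global")

# Local coefficients: a = 1, b = 1, c = 1/3, d = NaN (degree < 2)
local_cc <- transitivity(toy_g, type = "local")

# Mean of the local coefficients, ignoring nodes where it is undefined
mean_local_cc <- mean(local_cc, na.rm = TRUE)

c(avg_degree = avg_degree, global = global_cc, mean_local = mean_local_cc)
```

Here the global value is 0.6 while the mean of the local values is 7/9, so it is worth stating in the report which of the two was calculated.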
Report question 5: Make a plot of the distribution of node degrees and paste this into your report
df <- data.frame(k = degree(g))
# Density of node degrees; the red line marks the degree of TP53
ggplot(df, aes(x = k)) +
  geom_density() +
  geom_vline(xintercept = length(neighbors(graph = g, v = "prot_7157")), color = "red")
Now calculate the node-wise clustering coefficient. This is done with the transitivity() function, setting the argument type = "local".
Report question 6: Make a plot of the distribution of clustering coefficients and paste this into your report
df <- data.frame(c = transitivity(graph = g, type = "local"))
# Density of local clustering coefficients; the red line marks the value for TP53
ggplot(df, aes(x = c)) +
  geom_density() +
  geom_vline(xintercept = transitivity(graph = g, type = "local", v = "prot_7157"), color = "red")
Look at box 2 in the Barabasi paper. In order to determine whether a network is random, scale free or hierarchical, you need to produce two figures: log(k) vs log(P(k)) and log(k) vs log(C(k)). k is the degree of the nodes in the graph, P(k) is the probability of a given k, and C(k) is the average clustering coefficient of all nodes with a degree of k.
Report question 7: Make log-log plots of k vs P(k) and k vs C(k) with a regression line and paste them into your report.
# P(k): fraction of nodes with degree k
df <- data.frame(table(degree(graph = g)) / vcount(g))
colnames(df) <- c("k", "Pk")
# k is a factor here; convert via character to get the degree values, not the level codes
df$k <- as.numeric(as.character(df$k))
ggplot(df, aes(x = log(k), y = log(Pk))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# C(k): mean local clustering coefficient of all nodes with degree k
df <- data.frame(k = degree(graph = g), c = transitivity(graph = g, type = "local"))
df2 <- NULL
for (i in unique(df$k)) {
  df2 <- rbind(df2, c(i, mean(df[df$k == i, ]$c, na.rm = TRUE)))
}
colnames(df2) <- c("k", "Ck")
ggplot(data.frame(df2), aes(x = log(k), y = log(Ck))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
Report question 8: Based on the distributions of the node degree and the clustering coefficient, does the network structure appear to be random, scale free or hierarchical? (Remember that a hierarchical network is also scale free.)
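The definitions of P(k) and C(k) can be sanity-checked in base R on a handful of made-up values. This is a sketch only: the data frame toy and its numbers are hypothetical, chosen so that P(k) falls off exactly as 1/k. It also uses aggregate() instead of a loop, which is a swapped-in alternative and not the approach taken in the exercise.

```r
# Hypothetical per-node values: degree k and local clustering coefficient cc
toy <- data.frame(
  k  = c(1, 1, 1, 1, 2, 2, 4),
  cc = c(NaN, NaN, NaN, NaN, 1, 0, 0.25)
)

# P(k): fraction of nodes with degree k
counts <- table(toy$k)
pk <- data.frame(
  k  = as.numeric(names(counts)),      # degree values, not factor level codes
  Pk = as.numeric(counts) / nrow(toy)
)

# C(k): mean local clustering coefficient of nodes with degree k
# (the formula interface of aggregate() drops rows where cc is missing)
ck <- aggregate(cc ~ k, data = toy, FUN = mean)
names(ck) <- c("k", "Ck")

# Slope of the log-log regression line for P(k)
fit_pk <- lm(log(Pk) ~ log(k), data = pk)
slope_pk <- coef(fit_pk)[["log(k)"]]
slope_pk
```

With these toy numbers the fitted slope is -1, i.e. P(k) is proportional to 1/k, and C(k) halves each time k doubles.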
Appears to be hierarchical (a power-law degree distribution combined with a clustering coefficient that decreases with k).
Part IV. Network connectivity
Calculate the network diameter (longest shortest path).
diameter(g)
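To make "longest shortest path" concrete, the diameter can be recomputed from the matrix of pairwise shortest-path lengths. A minimal sketch on a hypothetical four-node graph:

```r
library(igraph)

# Hypothetical toy graph: triangle a-b-c with d attached to c
toy_g <- graph_from_literal(a - b, b - c, c - d, a - c)

# Diameter: length (in edges) of the longest shortest path
toy_diameter <- diameter(toy_g)

# Same number from the matrix of all pairwise shortest-path lengths
d <- distances(toy_g)
max_distance <- max(d[is.finite(d)])

c(diameter = toy_diameter, max_distance = max_distance)
```

Both values are 2 here: the farthest pairs (a and d, b and d) are connected through c.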
Report question 9: What is the highest number of edges that you need to connect any two nodes in the network?
4