ExTopology1 igraph solutions

From 22140

Exercise: Network topology and statistics

Exercise written by: Lars Rønn Olsen

Notes on today's exercise: today we will get a bit more advanced in our use of igraph. If you get stuck, remember that Google and ChatGPT are your friends.

Part I. Getting started

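library(igraph); library(ggraph); library(ggplot2)   # network objects, network plots, general plots
load("/home/projects/22140/exercise3.Rdata")            # should provide the interactions and node_attributes data frames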

TASK: Create an igraph object with node attributes

g <- graph_from_data_frame(d = interactions, directed = FALSE, vertices = node_attributes)
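A minimal sketch of what graph_from_data_frame() does with the vertices argument, using small hypothetical data frames (toy_interactions, toy_attributes and the GENE_* labels are made up, not the exercise data). The first column of the vertex table is used as the vertex name attribute, and every other column becomes a vertex attribute such as Gene_Id. Every ID used in the edge list should also appear in that first column; otherwise graph_from_data_frame() should stop with an error.

```r
library(igraph)

# Hypothetical toy tables shaped like `interactions` and `node_attributes`
toy_interactions <- data.frame(from = c("prot_1", "prot_1", "prot_2"),
                               to   = c("prot_2", "prot_3", "prot_3"),
                               stringsAsFactors = FALSE)
toy_attributes <- data.frame(name    = c("prot_1", "prot_2", "prot_3"),  # IDs used in the edge list
                             Gene_Id = c("GENE_A", "GENE_B", "GENE_C"),  # becomes a vertex attribute
                             stringsAsFactors = FALSE)

toy_g <- graph_from_data_frame(d = toy_interactions, directed = FALSE,
                               vertices = toy_attributes)

vertex_attr_names(toy_g)   # includes "name" and "Gene_Id"
V(toy_g)$Gene_Id           # "GENE_A" "GENE_B" "GENE_C", in the row order of toy_attributes
```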

Part II. Network layout and selecting nodes

TASK: Explore network layouts

Try two different layouts: "kk" (Kamada-Kawai) and "fr" (Fruchterman-Reingold).

ggraph(g, layout = "fr") +
  geom_edge_link() + 
  geom_node_point()

ggraph(g, layout = "kk") +
  geom_edge_link() + 
  geom_node_point()
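Fruchterman-Reingold ("fr") is a force-directed layout that starts from random positions, while Kamada-Kawai ("kk") tries to make on-screen distances match shortest-path distances in the graph. A small sketch, using igraph's built-in Zachary karate club graph as a stand-in for the exercise network (an assumption for illustration only), showing that fixing the random seed makes the "fr" coordinates repeatable:

```r
library(igraph)

# Stand-in network: igraph's built-in Zachary karate club graph (34 nodes)
toy_g <- make_graph("Zachary")

# Fruchterman-Reingold starts from random positions, so fix the seed to make it repeatable
set.seed(42)
fr1 <- layout_with_fr(toy_g)
set.seed(42)
fr2 <- layout_with_fr(toy_g)

# Kamada-Kawai places nodes so that drawn distances approximate shortest-path distances
kk <- layout_with_kk(toy_g)

dim(fr1)   # one row of (x, y) coordinates per node: 34 x 2
```

Calling set.seed() right before ggraph(g, layout = "fr") should likewise keep the picture stable between runs.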

TASK: Explore TP53 in the network

Report question 1: How many proteins interact with TP53?

Color the nodes depending on whether they interact with TP53 or not, and change the shape of the TP53 node.

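degree(graph = g, v = "prot_7157")   # number of edges at TP53 (prot_7157, Entrez Gene ID 7157)
V(g)$TP53_neighbors <- names(V(g)) %in% names(neighbors(graph = g, v = "prot_7157"))   # TRUE for TP53's interaction partners
V(g)$TP53 <- names(V(g)) == "prot_7157"   # TRUE only for TP53 itself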
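For report question 1, note that degree() counts edges, not distinct partners. If the interactions table happens to list an interaction twice (for example A-B and B-A) or contains a self-interaction, degree() will be larger than the number of interaction partners. A sketch on a hypothetical four-row edge list (the node names A, B, C are made up):

```r
library(igraph)

# Hypothetical edge list: the A-B interaction is listed twice and A has a self-loop
edges <- data.frame(from = c("A", "B", "A", "A"),
                    to   = c("B", "A", "C", "A"))
toy_g <- graph_from_data_frame(edges, directed = FALSE)

degree(toy_g, v = "A")            # every edge end counts: 1 + 1 + 1 + 2 (loop) = 5
degree(simplify(toy_g), v = "A")  # multi-edges and loops removed: 2 partners (B and C)
```

If the two numbers differ in the exercise network, simplify() (which removes multi-edges and loops by default) gives the count of distinct partners.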

Report question 2: Paste a screenshot of your TP53 annotated network into your report.

ggraph(g, layout = "fr") +
  geom_edge_link() + 
  geom_node_point(aes(color = TP53_neighbors, shape = TP53))

Now create a subnetwork consisting only of TP53 and its interaction partners, and label with gene names.

g_TP53 <- delete_vertices(graph = g, v = !names(V(g)) %in% c("prot_7157", names(neighbors(graph = g, v = "prot_7157"))))   # keep TP53 itself and its interaction partners
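igraph can also extract this kind of neighbourhood directly with make_ego_graph(), which always keeps the focal node together with its direct partners and the edges among them. A sketch on a hypothetical toy network (node names made up), where "A" plays the role of TP53:

```r
library(igraph)

# Hypothetical toy network: hub "A" with partners B, C, D; E is two steps away from A
toy_g <- graph_from_literal(A-B, A-C, A-D, B-C, D-E)

# Order-1 ego network: the focal node, its direct partners, and all edges among them
ego_A <- make_ego_graph(toy_g, order = 1, nodes = "A")[[1]]

sort(V(ego_A)$name)   # "A" "B" "C" "D": E is excluded, A itself is kept
ecount(ego_A)         # 4 edges: A-B, A-C, A-D and B-C
```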

Report question 3: Paste a screenshot of your TP53 subnetwork into your report.

ggraph(g_TP53, layout = "fr") +
  geom_edge_link() + 
  geom_node_point(aes(color = TP53_neighbors, shape = TP53)) +
  geom_node_text(aes(label = Gene_Id), repel = TRUE)

Part III. Network statistics

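Report question 4: Calculate the following statistics for the full network:

  • Average node degree
  • Average clustering coefficient (the mean of the node-wise clustering coefficients)

Note that transitivity(g) on its own uses type = "global" and returns the global transitivity (the fraction of connected triples of nodes that are closed into triangles). This is related to, but generally not equal to, the average clustering coefficient, which is type = "average".

mean(degree(g))                     # average node degree
transitivity(g, type = "average")   # average clustering coefficient (nodes with degree < 2 are left out)
transitivity(g)                     # global transitivity, for comparison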
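To see that the two clustering measures can differ, here is a hand-checkable sketch on a hypothetical four-node network (a triangle A-B-C plus a node D attached only to A). The average degree is 2E/N, the global transitivity is 3 × triangles / connected triples, and the average clustering coefficient is the mean of the local values:

```r
library(igraph)

# Hypothetical toy network: triangle A-B-C plus a pendant node D attached to A
toy_g <- graph_from_literal(A-B, B-C, C-A, A-D)

mean(degree(toy_g))                    # 2 * 4 edges / 4 nodes = 2
transitivity(toy_g, type = "global")   # 3 * 1 triangle / 5 connected triples = 0.6
transitivity(toy_g, type = "average")  # mean of local values 1/3, 1, 1 (D has degree 1, left out) = 7/9
```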

Report question 5: Make a plot of the distribution of node degrees and paste this into your report.

df <- data.frame(k = degree(g))   # one row per node: its degree k
ggplot(df, aes(x = k)) +
  geom_density() + 
  geom_vline(xintercept = length(neighbors(graph = g, v = "prot_7157")), color = "red")   # marks TP53's number of partners

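Now calculate the node-wise clustering coefficient. This is done with the transitivity() function by setting the argument type = "local". Nodes with fewer than two neighbours have an undefined clustering coefficient, which igraph reports as NaN.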

Report question 6: Make a plot of the distribution of clustering coefficients and paste this into your report.

df <- data.frame(c = transitivity(graph = g, type = "local"))   # one local clustering coefficient per node (NaN if degree < 2)
ggplot(df, aes(x = c)) +
  geom_density() +
  geom_vline(xintercept = transitivity(graph = g, type = "local", vids = "prot_7157"), color = "red")   # TP53's value
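A hand-checkable sketch of the local values on a hypothetical four-node network (triangle A-B-C plus a node D attached only to A). A node's local clustering coefficient is the fraction of pairs of its neighbours that are themselves linked; D has only one neighbour, so its value is NaN, and ggplot2 drops such NaN rows (usually with a warning) when drawing the density. The isolates argument switches this to 0:

```r
library(igraph)

# Hypothetical toy network: triangle A-B-C plus a pendant node D attached to A
toy_g <- graph_from_literal(A-B, B-C, C-A, A-D)

local_cc <- transitivity(toy_g, type = "local", vids = c("A", "D"))
local_cc   # A: 1 of its 3 neighbour pairs is linked -> 1/3; D: degree 1 -> NaN

# isolates = "zero" reports 0 instead of NaN for nodes with degree < 2
d_zero <- transitivity(toy_g, type = "local", vids = "D", isolates = "zero")
```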

Look at Box 2 in the Barabási paper. To determine whether a network is random, scale-free or hierarchical, you need two plots: log(k) vs log(P(k)) and log(k) vs log(C(k)). Here k is the degree of a node, P(k) is the probability that a randomly chosen node has degree k (the fraction of nodes with degree k), and C(k) is the average clustering coefficient of all nodes with degree k.
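For reference, the signatures usually associated with the three network models are summarized below (from the standard model definitions, as a reading aid rather than a quotation of the paper). On log-log axes a power law becomes a straight line whose slope is the negative exponent, which is why the regression lines are useful:

```latex
\begin{aligned}
\text{random:}       &\quad P(k) \approx e^{-\langle k\rangle}\frac{\langle k\rangle^{k}}{k!}, && C(k)\ \text{independent of } k\\
\text{scale-free:}   &\quad P(k) \sim k^{-\gamma},                                              && C(k)\ \text{independent of } k\\
\text{hierarchical:} &\quad P(k) \sim k^{-\gamma},                                              && C(k) \sim k^{-1}
\end{aligned}
\qquad
\log P(k) = -\gamma \log k + \text{const}
```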

Report question 7: Make log-log plots of k vs P(k) and k vs C(k) with a regression line and paste them into your report.

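deg <- degree(graph = g)
df <- data.frame(table(deg) / vcount(g))   # P(k): fraction of nodes with degree k
colnames(df) <- c("k", "Pk")
df$k <- as.numeric(as.character(df$k))     # convert factor labels to the actual degree values
df <- df[df$k > 0, ]                       # log(0) is undefined, so drop isolated nodes
ggplot(df, aes(x = log(k), y = log(Pk))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)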
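Two details matter when turning table() output into P(k). First, the degree column of a converted table is a factor, and as.numeric() on a factor returns level indices rather than the degree values. Second, dividing the counts by the number of nodes makes the fractions sum to 1. A sketch with a hypothetical degree vector, plus igraph's degree_distribution() as a cross-check on a small made-up network:

```r
library(igraph)

# Hypothetical degree vector with a gap in the observed degrees
deg <- c(1, 2, 2, 5, 11)
df <- data.frame(table(deg) / length(deg))   # columns: deg (a factor) and Freq

as.numeric(df$deg)                  # 1 2 3 4: factor level indices, not degrees
as.numeric(as.character(df$deg))    # 1 2 5 11: the actual degree values

# degree_distribution() returns P(k) for k = 0, 1, ..., max degree
toy_g <- graph_from_literal(A-B, B-C, C-A, A-D)   # degrees 3, 2, 2, 1
pk <- degree_distribution(toy_g)                  # 0, 0.25, 0.5, 0.25 for k = 0..3
```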

df <- data.frame(k = degree(graph = g), c = transitivity(graph = g, type = "local"))
df2 <- aggregate(c ~ k, data = df, FUN = mean)   # C(k): mean local clustering per degree; NaN rows (degree < 2) are dropped
colnames(df2) <- c("k", "Ck")
df2 <- df2[df2$Ck > 0, ]                          # log(0) is undefined
ggplot(df2, aes(x = log(k), y = log(Ck))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
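geom_smooth(method = "lm") on log-transformed axes fits the same straight line as lm(log(y) ~ log(k)), so the slope can be read off numerically and compared with the expectations (a negative slope for P(k) in a scale-free network, and a slope near -1 for C(k) in a hierarchical one). A sketch with a hypothetical helper, loglog_slope(), checked on synthetic noise-free power laws (these are illustrative numbers, not exercise results):

```r
# Hypothetical helper: slope of a straight-line fit on log-log axes
loglog_slope <- function(k, y) {
  keep <- k > 0 & y > 0 & is.finite(y)   # log() is undefined for zeros, NaN and NA values
  fit <- lm(log(y[keep]) ~ log(k[keep]))
  unname(coef(fit)[2])                   # the fitted slope (the negative exponent)
}

# Synthetic, noise-free data
k  <- 1:20
pk <- k^-2.5 / sum(k^-2.5)   # power-law degree distribution with gamma = 2.5
ck <- 0.8 * k^-1             # hierarchical-style clustering, C(k) ~ 1/k

loglog_slope(k, pk)   # -2.5
loglog_slope(k, ck)   # -1
```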

Report question 8: Based on the distributions of node degree and clustering coefficient, does the network structure appear to be random, scale-free or hierarchical? (Remember that a hierarchical network is also scale-free.)

Appears to be hierarchical

Part IV. Network connectivity

Calculate the network diameter (the longest of all shortest paths between pairs of nodes).

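diameter(g)   # longest shortest path, in edges (if g has a "weight" edge attribute, diameter() uses it unless weights = NA)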
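If the network is not connected, diameter() (with its default unconnected = TRUE) reports the longest shortest path found within the components, and nodes in different components cannot be connected at all. A sketch on a hypothetical disconnected toy network for checking this, and for finding which pair of nodes realizes the diameter:

```r
library(igraph)

# Hypothetical disconnected toy network: a path A-B-C-D and a separate pair E-F
toy_g <- graph_from_literal(A-B, B-C, C-D, E-F)

is_connected(toy_g)                 # FALSE
count_components(toy_g)             # 2
diameter(toy_g)                     # 3: longest shortest path within a component (A to D)
farthest_vertices(toy_g)$distance   # 3; $vertices gives the two end points
d <- distances(toy_g)
d["A", "E"]                         # Inf: no path exists between the components
```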

Report question 9: What is the highest number of edges that you need to connect any two nodes in the network?

4