ExTopology1 igraph
Revision as of 13:47, 21 August 2025
Exercise: Network topology, statistics, and clustering
Exercise written by: Lars Rønn Olsen & Kristoffer Vitting-Seerup
Notes on today's exercise: today we will get a bit more advanced in our use of igraph. If you get stuck, remember that Google and ChatGPT are your friends.
Part I. Getting started
In this exercise we will use a subset of the human interaction dataset by Rual et al. (Nature. 2005 Oct 20;437(7062):1173-8).
The data consists of an interaction data frame (without edge annotations) and a node annotation data frame with gene names. The data is located on the health tech server, and you can load it in RStudio using the following command:
load("/home/projects/22140/exercise3.Rdata")
TASK: Create an igraph object with node attributes
This network consists of 1089 interactions observed between 419 human proteins, and is a small subset of a larger human interaction dataset. This subset consists of proteins that interact with the transcription factor TP53 (also known as P53).
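As a rough sketch of how an igraph object with node attributes can be built from two data frames: the toy data frames `edges` and `nodes` below are made up for illustration and are not the objects stored in exercise3.Rdata, so check the real names with ls() after loading.

```r
library(igraph)

# Hypothetical toy data; the real data frames come from exercise3.Rdata
edges <- data.frame(from = c("A", "A", "B"),
                    to   = c("B", "C", "C"))
nodes <- data.frame(id        = c("A", "B", "C", "D"),
                    gene_name = c("TP53", "MDM2", "EP300", "BRCA1"))

# The first column of `vertices` must match the IDs used in the edge list;
# the remaining columns become node attributes
g_toy <- graph_from_data_frame(d = edges, directed = FALSE, vertices = nodes)

vcount(g_toy)        # number of nodes
ecount(g_toy)        # number of edges
V(g_toy)$gene_name   # node attribute taken from the annotation table
```

Nodes that appear only in the annotation table (like "D" here) are kept as isolated nodes.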
Take a moment to read about the function of TP53 by looking it up in UniProt:
Part II. Network layout and selecting nodes
TASK: Explore network layouts
Try two different layouts "kk" and "fr".
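One possible way to compare the two layouts side by side, shown on a small random graph that is only a stand-in for the exercise network (assumed here for illustration):

```r
library(igraph)

set.seed(1)                     # make the random graph and layouts reproducible
g_toy <- sample_gnp(30, 0.1)    # hypothetical random graph, not the exercise data

coords_kk <- layout_with_kk(g_toy)   # Kamada-Kawai layout
coords_fr <- layout_with_fr(g_toy)   # Fruchterman-Reingold layout

# Draw both layouts next to each other
par(mfrow = c(1, 2))
plot(g_toy, layout = coords_kk, vertex.size = 5, vertex.label = NA, main = "kk")
plot(g_toy, layout = coords_fr, vertex.size = 5, vertex.label = NA, main = "fr")
```

Each layout function returns a matrix of x/y coordinates with one row per node, so the same coordinates can be reused across several plots.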

Report question 0: Write a short discussion on which layout you think is a more useful visualization (there's no straightforward answer - layouts are visualizations: a way to convey messages with data)
TASK: Explore TP53 in the network
Use the igraph function neighbors() to get a list of first order interaction partners of TP53.
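A minimal sketch of neighbors() on a toy graph. The gene names and edges below are invented for illustration, and the sketch assumes the gene names are stored in the `name` node attribute, which may not be the case in your graph.

```r
library(igraph)

# Hypothetical toy network; edges are made up for illustration
g_toy <- graph_from_literal(TP53 - MDM2, TP53 - EP300, MDM2 - EP300, EP300 - CREBBP)

# First-order interaction partners of TP53 (looked up by the `name` attribute)
partners <- neighbors(g_toy, "TP53")

partners$name      # names of the interaction partners
length(partners)   # number of interaction partners
```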

Report question 1: How many proteins interact with TP53?
Color the nodes depending on whether they interact with TP53 or not, and change the shape of the TP53 node.
Hint: one way to do this in R is to add two new node attributes (one for TP53 and one for TP53 interaction). You can either add these node attribute vectors to the node attribute table and reload the graph, or you can simply add them using the V() function.
Hint 2: for the interaction attribute, start by making a vector repeating "no" n times, where n is the number of nodes. Then, change a subset of that vector to "yes" based on whether a given node interacts with TP53.
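The two hints can be sketched roughly as follows on a toy graph. The attribute names `tp53_partner` and `is_tp53`, the colors, and the edges are all arbitrary choices made for this illustration.

```r
library(igraph)

# Hypothetical toy network; edges are made up for illustration
g_toy <- graph_from_literal(TP53 - MDM2, TP53 - EP300, MDM2 - EP300, EP300 - CREBBP)

# Start with "no" for every node, then switch the TP53 partners to "yes"
partners  <- neighbors(g_toy, "TP53")
interacts <- rep("no", vcount(g_toy))
interacts[V(g_toy)$name %in% partners$name] <- "yes"

# Store both annotations as node attributes via V()
V(g_toy)$tp53_partner <- interacts
V(g_toy)$is_tp53      <- V(g_toy)$name == "TP53"

# Map the attributes to color and shape when plotting
plot(g_toy,
     vertex.color = ifelse(V(g_toy)$tp53_partner == "yes", "tomato", "grey80"),
     vertex.shape = ifelse(V(g_toy)$is_tp53, "square", "circle"))
```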

Report question 2: Paste a screen shot of your TP53 annotated network into your report.
Now create a subnetwork consisting only of TP53 and its interaction partners, and label with gene names.
Hint: one way to do this is to use the delete_vertices() function to make a new graph, keeping only TP53 and the nodes interacting with it.
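A small sketch of the delete_vertices() approach on a made-up toy graph (assuming, again, that gene names live in the `name` attribute):

```r
library(igraph)

# Hypothetical toy network; edges are made up for illustration
g_toy <- graph_from_literal(TP53 - MDM2, TP53 - EP300, MDM2 - EP300,
                            EP300 - CREBBP, CREBBP - KAT2B)

# Nodes to keep: TP53 itself plus its first-order partners
keep <- c("TP53", neighbors(g_toy, "TP53")$name)

# Everything else gets deleted
drop  <- setdiff(V(g_toy)$name, keep)
g_sub <- delete_vertices(g_toy, drop)

plot(g_sub, vertex.label = V(g_sub)$name)
```

induced_subgraph(g_toy, keep) should give the same subnetwork in one step, if you prefer selecting nodes over deleting them.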

Report question 3: Paste a screen shot of your TP53 subnetwork into your report.
Part III. Network statistics

Report question 4: Calculate the following statistics for the full network:
- Average node degree
- Average clustering coefficient (also known as transitivity)
- Network diameter
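The three statistics can be computed along these lines, here on a tiny made-up graph (a triangle with one extra node attached). Note that transitivity() offers both an averaged local version and a global version, and the two generally differ, so state which one you report.

```r
library(igraph)

# Hypothetical toy graph: triangle A-B-C with D attached to C
g_toy <- graph_from_literal(A - B, B - C, A - C, C - D)

avg_degree <- mean(degree(g_toy))

# Mean of the node-wise clustering coefficients
# (nodes with fewer than two neighbours are left out of the average)
avg_cc <- transitivity(g_toy, type = "average")

# Global transitivity: closed triplets divided by all connected triplets
global_cc <- transitivity(g_toy, type = "global")

# Longest shortest path in the network
diam <- diameter(g_toy)

c(avg_degree = avg_degree, avg_cc = avg_cc, global_cc = global_cc, diameter = diam)
```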

Report question 5: Make a plot of the distribution of node degrees and paste it into your report. Where is TP53 in the node degree distribution plot? (Mark it on the distribution plot.) Why do you think TP53 has this node degree? Does it make TP53 a more or less important protein in the network?
Now calculate the node-wise clustering coefficient. This is done with the transitivity() function, setting the variable type = "local".
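A minimal sketch on a made-up toy graph, including one way to mark a single node on the distribution plot (node "C" plays the role of TP53 here):

```r
library(igraph)

# Hypothetical toy graph: triangle A-B-C with D attached to C
g_toy <- graph_from_literal(A - B, B - C, A - C, C - D)

# Node-wise clustering coefficient, labelled with the node names
cc <- transitivity(g_toy, type = "local")
names(cc) <- V(g_toy)$name
cc   # nodes with fewer than two neighbours get NaN

# Histogram of the coefficients with one node of interest marked in red
hist(cc, main = "Clustering coefficient distribution", xlab = "C")
abline(v = cc["C"], col = "red", lwd = 2)
```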

Report question 6: Make a plot of the distribution of clustering coefficients and paste it into your report. Where is TP53 in the clustering coefficient distribution plot? (Mark it on the distribution plot.) Why do you think TP53 has this clustering coefficient? Discuss whether this clustering coefficient makes TP53 a more or less important protein in the network.
Look at box 2 in the Barabasi paper. In order to determine whether a network is random, scale free or hierarchical, you need to produce two figures: log(k) vs log(P(k)) and log(k) vs log(C(k)). k is the degree of the nodes in the graph, P(k) is the probability of a given k, and C(k) is the average clustering coefficient of all nodes with a degree of k.
Here is a bit of code to calculate k vs P(k):
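# P(k): fraction of nodes with degree k (counts divided by the number of nodes)
df <- data.frame(table(degree(graph = g)) / vcount(g))
# Var1 is a factor; convert via character to recover the actual degree values
df$Var1 <- as.numeric(as.character(df$Var1))
colnames(df) <- c("k", "Pk")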
Here is a bit of code to calculate k vs C(k):
df <- data.frame(k = degree(graph = g), c = transitivity(graph = g, type = "local"))
df2 <- NULL
for(i in unique(df$k)) {
  df2 <- rbind(df2, c(i, mean(df[df$k==i,]$c, na.rm = TRUE)))
}
colnames(df2) <- c("k", "Ck")
df2 <- data.frame(df2)
Make sure you understand what is going on in the code above! (and feel free to make it nicer - this is admittedly a bit clunky, but it works).
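For the regression line, one possible approach is to fit a linear model on the log-transformed values with lm() and draw it with abline(). The sketch below uses a simulated preferential-attachment graph as a stand-in for the exercise network, so the numbers it produces say nothing about the TP53 data.

```r
library(igraph)

set.seed(42)
g_toy <- sample_pa(500, directed = FALSE)   # hypothetical stand-in graph

# k vs P(k): fraction of nodes having each observed degree
pk <- as.data.frame(table(degree(g_toy)) / vcount(g_toy), stringsAsFactors = FALSE)
colnames(pk) <- c("k", "Pk")
pk$k <- as.numeric(pk$k)

# Linear fit in log-log space; the slope estimates the power-law exponent
fit <- lm(log10(Pk) ~ log10(k), data = pk)

plot(log10(pk$k), log10(pk$Pk), xlab = "log10(k)", ylab = "log10(P(k))")
abline(fit, col = "red")
coef(fit)
```

The same pattern works for k vs C(k); just drop rows where C(k) is zero or NaN before taking the logarithm.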
 

Report question 7: Make log-log plots of k vs P(k) and k vs C(k) with a regression line and paste them into your report.

Report question 8: Based on the distributions of the node degree and the clustering coefficient, does the network structure appear to be random, scale-free, or hierarchical? (Remember that a hierarchical network is also scale-free.)
Part IV. Network connectivity
Calculate the network diameter (longest shortest path).
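To see what "longest shortest path" means in practice, here is a small sketch on a ring graph (a made-up example): the diameter is simply the largest entry in the matrix of pairwise shortest-path lengths.

```r
library(igraph)

g_toy <- make_ring(8)   # hypothetical toy graph: 8 nodes connected in a circle

# Matrix of shortest-path lengths between all pairs of nodes
d <- distances(g_toy)

diameter(g_toy)   # longest shortest path
max(d)            # same value, read off the distance matrix
```

In a disconnected network distances() contains Inf for unreachable pairs, so the two only agree directly when the graph is connected.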

Report question 9: What is the highest number of edges that you need to connect any two nodes in the network?
This phenomenon is known as a ‘small-world network’ and can be found in many real-life networks, e.g. the network that connects actors who have appeared in the same movie. You can connect any two actors on http://oracleofbacon.org/. Try, just for fun, with a few actors and see how many edges (movies) are required to connect them.

Report question 10 (optional and just for fun): Which actor gave you the highest Bacon number, and what was it?