ExYeastSysBio R answers
Answers to the first yeast systems biology exercise
Answers by: Lars Rønn Olsen and Rasmus Wernersson
Report question #1
library(igraph)
library(ggraph)

load("/home/projects/22140/exercise4.Rdata")

g <- graph_from_data_frame(interactions, directed = FALSE, vertices = node_attributes)

ggraph(g) +
  geom_edge_link(aes(color = score)) +
  scale_edge_color_continuous(limits = c(0, 1), low = "red", high = "black") +
  geom_node_point()
Report question #2
node_attributes[node_attributes$cluster %in% "cluster1",]$description
(Repeat for each cluster)
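One way to repeat this for every cluster in one go is to split the descriptions by cluster. The sketch below is self-contained and uses a small made-up `toy_attributes` data frame (names, clusters and descriptions are invented for illustration) in place of the real `node_attributes`; it assumes the real table has the same `cluster` and `description` columns.

```r
# Hypothetical stand-in for node_attributes (values invented for illustration)
toy_attributes <- data.frame(
  name = c("P1", "P2", "P3", "P4"),
  cluster = c("cluster1", "cluster1", "cluster2", "cluster2"),
  description = c("DNA polymerase subunit", "Replication factor",
                  "Cyclin", "Cyclin-dependent kinase"),
  stringsAsFactors = FALSE
)

# Split the description column into one character vector per cluster
descriptions_by_cluster <- split(toy_attributes$description, toy_attributes$cluster)

# Print the descriptions cluster by cluster
for (cl in names(descriptions_by_cluster)) {
  cat(cl, "\n")
  cat(paste0("  ", descriptions_by_cluster[[cl]]), sep = "\n")
}
```

With the real data, replacing `toy_attributes` with `node_attributes` should give one block of descriptions per cluster to skim through.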
The idea is simply to quickly look through the gene/protein descriptions in order to get an overall idea of what types of proteins are present in each cluster.
- Function: DNA-replication
- Function: Origin of replication recognition/Cell division control
- Function: Mixed - trehalose synthesis
- Function: Cyclins, CDC28
- Function: Anaphase-promoting complex
- Function: DNA damage repair
- Function: Cell division control
- Function: Unknown
Report question #3
Yes - based on what we have learned about cell cycle phases and cell cycle regulation, the following clusters stand out:
- Function: DNA-replication. YES (S-phase)
- Function: Origin of replication recognition/Cell division control. YES (S-phase)
- Function: Mixed - trehalose synthesis
- Function: Cyclins, CDC28. YES (cell cycle regulation)
- Function: Anaphase-promoting complex. YES (M-phase)
- Function: DNA damage repair. YES (S-phase)
- Function: Cell division control. YES (cell cycle regulation)
- Function: Unknown
Report question #4
Below are solutions for the following:
TASK: make a subgraph of the "big cluster":
- Use the igraph function "decompose" to make a list of connected graphs.
- Calculate the number of nodes in each subgraph in the list using vcount. This can be quickly done using the lapply function.
- Visualize the "big cluster".
Investigate the inter-connectivity: Visually there appears to be a pattern to the way the nodes are connected - this could indicate that this sub-network is not evenly connected.
- Investigate this by visualizing the "big cluster" network with the node size based on node degree.
# Make a list of all connected subgraphs
connected_graphs <- decompose(g)

# Extract the graph with the most vertices
# (sapply returns a plain numeric vector, which is what which.max expects)
big_cluster <- connected_graphs[[which.max(sapply(connected_graphs, vcount))]]

# Visualize, and set the size of the nodes according to node degree
ggraph(big_cluster, layout = "kk") +
  geom_edge_link(aes(color = score)) +
  scale_edge_color_continuous(limits = c(0, 1), low = "red", high = "black") +
  geom_node_point(aes(color = cluster, size = degree(big_cluster)))
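To back up the visual impression with numbers, the degree distribution can also be inspected directly. This is a minimal sketch on a made-up toy graph (vertex names are invented); with the real data, `toy` would be `big_cluster`.

```r
library(igraph)

# Hypothetical toy graph standing in for big_cluster:
# hub H with four partners, plus a short tail D-E
toy <- graph_from_literal(H-A, H-B, H-C, H-D, D-E)

# Node degree, sorted so that the most connected nodes come first
deg <- sort(degree(toy), decreasing = TRUE)
print(deg)

# Frequency table of degrees: many low-degree nodes and a few
# high-degree ones suggest an unevenly connected network
print(table(deg))
```

A strongly skewed table (most nodes with few partners, a handful with many) is consistent with the pattern seen when node size is mapped to degree.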
Now, for the following tasks:
TASK: explore the interaction partners
- Randomly select a single protein from the global graph, extract a subgraph with the first order interaction partners using the "neighborhood" function and look at the descriptions of this sub-set.
- Do this for 5-10 randomly chosen proteins - perhaps with small, medium, and high node degree - and note down if any obvious patterns start to emerge.
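# Pick one random protein and extract it together with its first order interaction partners
random_subgraph_list <- neighborhood(graph = g, order = 1, nodes = sample(names(V(g)), 1))
random_subgraph <- induced_subgraph(g, unlist(random_subgraph_list))

ggraph(random_subgraph, layout = "kk") +
  geom_edge_link() +
  geom_node_point()

# Look at the descriptions of the proteins in this sub-set
V(random_subgraph)$description

# Repeat the above 5-10 times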
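To cover small, medium, and high node degrees deliberately rather than by chance, the proteins can be ranked by degree before picking. The following is a minimal sketch on a made-up toy graph (vertex names are invented for illustration; with the real data, `toy` would be `g`).

```r
library(igraph)

# Hypothetical toy graph: hub H, intermediate nodes A, B, D and some degree-1 nodes
toy <- graph_from_literal(H-A, H-B, H-C, H-D, D-E, D-F, A-B)

# Rank vertex names from lowest to highest degree
ranked <- names(sort(degree(toy)))

# Pick the lowest-ranked, a middle-ranked and the highest-ranked vertex
picks <- ranked[c(1, ceiling(length(ranked) / 2), length(ranked))]

# List the first order interaction partners of each pick
for (p in picks) {
  partners <- as_ids(neighbors(toy, p))
  cat(p, "(degree", length(partners), "):", paste(partners, collapse = ", "), "\n")
}
```

The same `picks` vector could then be passed one at a time to `neighborhood()` as in the answer above.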
Below is the code for the following questions:
- Start, once again, with a single random protein and select its interaction partners in the "big cluster"
- Then extend this selection with the interaction partners of those as well (using the "neighborhood" function with both your selected proteins).
- Repeat this until the entire "big cluster" is selected:
- How many steps do you need?
- Try to find one of the proteins most distantly connected - how many steps do you need here?
- Which network topology measurement is at play here?
# Iteratively expanding a network with first order interactants from a random vertex in the big_cluster graph
random_subgraph_list <- neighborhood(graph = big_cluster, order = 1, nodes = sample(names(V(big_cluster)), 1))
random_subgraph <- induced_subgraph(big_cluster, unlist(random_subgraph_list))
ggraph(random_subgraph, layout = "kk") +
  geom_edge_link() +
  geom_node_point()
vcount(random_subgraph) == vcount(big_cluster)

random_subgraph_list2 <- neighborhood(graph = big_cluster, order = 1, nodes = names(V(random_subgraph)))
random_subgraph2 <- induced_subgraph(big_cluster, unlist(random_subgraph_list2))
ggraph(random_subgraph2, layout = "kk") +
  geom_edge_link() +
  geom_node_point()
vcount(random_subgraph2) == vcount(big_cluster)

random_subgraph_list3 <- neighborhood(graph = big_cluster, order = 1, nodes = names(V(random_subgraph2)))
random_subgraph3 <- induced_subgraph(big_cluster, unlist(random_subgraph_list3))
ggraph(random_subgraph3, layout = "kk") +
  geom_edge_link() +
  geom_node_point()
vcount(random_subgraph3) == vcount(big_cluster)

# And so on, until the expanded subgraph has the same number of vertices as the "big cluster".
# Usually around 4-5 steps are needed, depending of course on your randomly selected first node.
- Try to find one of the proteins most distantly connected - how many steps do you need here? - A: 13 steps
- Which network topology measurement is at play here? - A: the "longest shortest path" / network diameter
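The manual expansion can also be wrapped in a loop and compared against igraph's built-in measures. The sketch below is an illustration on a made-up path graph, not the course solution: `steps_to_cover` is a hypothetical helper name, and `toy` stands in for `big_cluster`. It assumes a connected graph with named vertices.

```r
library(igraph)

# Hypothetical toy graph standing in for big_cluster: a simple path A-B-C-D-E-F
toy <- graph_from_literal(A-B-C-D-E-F)

# Count how many first order expansions are needed to select the whole graph
steps_to_cover <- function(graph, start) {
  selected <- start
  steps <- 0
  while (length(selected) < vcount(graph)) {
    # First order neighborhoods of everything selected so far
    hood <- neighborhood(graph, order = 1, nodes = selected)
    expanded <- unique(unlist(lapply(hood, as_ids)))
    # Guard against endless looping on a disconnected graph
    if (length(expanded) == length(selected)) stop("graph is not connected")
    selected <- expanded
    steps <- steps + 1
  }
  steps
}

# Steps needed from an end node and from an inner node
print(steps_to_cover(toy, "A"))
print(steps_to_cover(toy, "C"))

# The step count from a node equals its eccentricity;
# the largest eccentricity in the graph is the diameter
print(eccentricity(toy, vids = "C"))
print(diameter(toy))
```

Starting from the most distantly connected protein, the number of steps equals the diameter, which is why the worst case on the real network is much larger than the 4-5 steps typically seen from a random start.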
Report question #5
HTB1_subgraph_list <- neighborhood(graph = g, order = 1, nodes = "HTB1")
HTB1_subgraph <- induced_subgraph(g, unlist(HTB1_subgraph_list))

ggraph(HTB1_subgraph, layout = "kk") +
  geom_edge_link() +
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE)
Report question #6
# Names of all proteins whose description mentions the spindle pole body (SPB)
spb_vertices <- node_attributes[grepl(x = node_attributes$description, ignore.case = TRUE, pattern = "Spindle Pole Body"), ]$name

# Keep only those vertices by deleting all others
spb_subgraph <- delete_vertices(g, !names(V(g)) %in% spb_vertices)

ggraph(spb_subgraph, layout = "kk") +
  geom_edge_link() +
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE)
Report question #7
cluster    cell_cycle_role    phase
cluster1   DNA replication    S
cluster2   DNA replication    S
cluster3   <NA>               <NA>
cluster4   Regulation         <NA>
cluster5   Regulation         M
cluster6   DNA repair         S
cluster7   Regulation         <NA>
cluster8   <NA>               <NA>
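The answer does not show how this per-cluster table is attached to the nodes. One possible way, sketched here under assumptions, is a lookup with `match()`. The `toy_attributes` data frame is a made-up stand-in for `node_attributes`, and `annotation` holds a subset of the table above. Unlike `merge()`, this keeps the row order and leaves `name` as the first column, which `graph_from_data_frame()` uses as the vertex name.

```r
# Hypothetical stand-in for node_attributes (names invented for illustration)
toy_attributes <- data.frame(
  name = c("P1", "P2", "P3"),
  cluster = c("cluster5", "cluster1", "cluster3"),
  stringsAsFactors = FALSE
)

# Per-cluster annotation (a subset of the table above)
annotation <- data.frame(
  cluster = c("cluster1", "cluster3", "cluster5"),
  cell_cycle_role = c("DNA replication", NA, "Regulation"),
  phase = c("S", NA, "M"),
  stringsAsFactors = FALSE
)

# Look up each node's cluster in the annotation table
idx <- match(toy_attributes$cluster, annotation$cluster)

# Copy the annotation columns over, one value per node
toy_attributes$cell_cycle_role <- annotation$cell_cycle_role[idx]
toy_attributes$phase <- annotation$phase[idx]

print(toy_attributes)
```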
Report question #8
g <- graph_from_data_frame(interactions, directed = FALSE, vertices = node_attributes)

ggraph(g) +
  geom_edge_link() +
  geom_node_point(aes(color = cluster, shape = cell_cycle_role))