ExYeastSysBio R answers

Answers to the first yeast systems biology exercise

Answers by: Lars Rønn Olsen and Rasmus Wernersson

Report question #1

library(igraph)
library(ggraph)

load("/home/projects/22140/exercise4.Rdata")

g <- graph_from_data_frame(interactions, directed = FALSE, vertices = node_attributes)

ggraph(g) +
  geom_edge_link(aes(color = score)) + 
  scale_edge_color_continuous(limits = c(0,1), low = "red", high = "black") +
  geom_node_point()


Report question #2

node_attributes[node_attributes$cluster %in% "cluster1",]$description

(Repeat for each cluster)

The idea is simply to quickly look through the gene/protein descriptions in order to get an overall idea of what types of proteins are present in each cluster.

  1. Function: DNA-replication
  2. Function: Origin of replication recognition/Cell division control
  3. Function: Mixed - trehalose synthesis
  4. Function: Cyclins, CDC28
  5. Function: Anaphase-promoting complex
  6. Function: DNA damage repair
  7. Function: Cell division control
  8. Function: Unknown
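
Rather than repeating the lookup by hand for each cluster, the descriptions can be grouped in one pass. The following is a minimal sketch, not the course's official solution: `toy_node_attributes` and its contents are hypothetical stand-ins for the real `node_attributes` table, which is assumed to have `cluster` and `description` columns as used above.

```r
# Hypothetical stand-in for node_attributes (the values are made up)
toy_node_attributes <- data.frame(
  name = c("P1", "P2", "P3", "P4"),
  cluster = c("cluster1", "cluster1", "cluster2", NA),
  description = c("DNA polymerase subunit",
                  "Replication factor subunit",
                  "Origin recognition complex subunit",
                  "Protein of unknown function"),
  stringsAsFactors = FALSE
)

# Group the descriptions by cluster; split() drops rows whose cluster is NA
descriptions_by_cluster <- split(toy_node_attributes$description,
                                 toy_node_attributes$cluster)

# Print one block of descriptions per cluster
for (cluster_name in names(descriptions_by_cluster)) {
  cat(cluster_name, "\n")
  print(descriptions_by_cluster[[cluster_name]])
}
```

With the real data, replacing `toy_node_attributes` by `node_attributes` should give one list element per cluster.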

Report question #3

Yes - based on what we have learned about cell cycle phases and cell cycle regulation, the following clusters stand out:

  1. Function: DNA-replication YES (S-phase)
  2. Function: Origin of replication recognition/Cell division control YES (S-phase)
  3. Function: Mixed - trehalose synthesis
  4. Function: Cyclins, CDC28 YES (cell cycle regulation)
  5. Function: Anaphase-promoting complex YES (M-phase)
  6. Function: DNA damage repair YES (S-phase)
  7. Function: Cell division control YES (cell cycle regulation)
  8. Function: Unknown

Report question #4

Below are solutions for the following:

TASK: make a subgraph of the "big cluster":

  • Use the igraph function "decompose" to make a list of connected graphs.
  • Calculate the number of nodes in each subgraph in the list using vcount. This can be quickly done using the lapply function.
  • Visualize the "big cluster".

Investigate the inter-connectivity: Visually there appears to be a pattern to the way the nodes are connected - this could indicate that this sub-network is not evenly connected.

  • Investigate this by visualizing the "big cluster" network with the node size based on node degree.


# Make a list of all connected subgraphs

connected_graphs <- decompose(g)

# Extract the graph with the most vertices

big_cluster <- connected_graphs[[which.max(lapply(connected_graphs, vcount))]]

# Visualize, and set the size of the nodes according to node degree

ggraph(big_cluster, layout = "kk") +
  geom_edge_link(aes(color = score)) + 
  scale_edge_color_continuous(limits = c(0,1), low = "red", high = "black") +
  geom_node_point(aes(color = cluster, size = degree(big_cluster)))
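
To make the "count the nodes in each subgraph" step explicit, here is a small sketch on a toy graph. The graph and the names `toy_g`, `toy_components` and `component_sizes` are hypothetical and only illustrate the pattern; `vapply` is used instead of `lapply` so that the sizes come back as a plain numeric vector.

```r
library(igraph)

# Hypothetical toy graph: a 4-node path, a 2-node pair and one isolated node
toy_edges <- data.frame(from = c("A", "B", "C", "E"),
                        to   = c("B", "C", "D", "F"),
                        stringsAsFactors = FALSE)
toy_nodes <- data.frame(name = c("A", "B", "C", "D", "E", "F", "G"),
                        stringsAsFactors = FALSE)
toy_g <- graph_from_data_frame(toy_edges, directed = FALSE, vertices = toy_nodes)

# Split into connected subgraphs and count the vertices in each
toy_components <- decompose(toy_g)
component_sizes <- vapply(toy_components, vcount, numeric(1))

# Pick the largest component and look at its node degrees
toy_big <- toy_components[[which.max(component_sizes)]]
toy_big_degree <- degree(toy_big)
```

Printing `component_sizes` for the real graph is a quick way to confirm that one component is clearly larger than the rest before plotting it.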

Now, for the following tasks:

TASK: explore the interaction partners

  • Randomly select a single protein from the global graph, extract a subgraph with the first order interaction partners using the "neighborhood" function and look at the descriptions of this sub-set.
  • Do this for 5-10 randomly chosen proteins - perhaps with small, medium, and high node degree - and note down if any obvious patterns start to emerge.


random_subgraph_list <- neighborhood(graph = g, order = 1, nodes = sample(names(V(g)), 1))

random_subgraph <- induced_subgraph(g, unlist(random_subgraph_list))

ggraph(random_subgraph, layout = "kk") +
  geom_edge_link() + 
  geom_node_point()

# Repeat the above 5-10 times
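
The repetition can also be scripted. The sketch below assumes a toy graph (`toy_g`, with made-up `description` values) in place of the global graph `g`, and the seed is arbitrary; it collects the descriptions of each sampled protein together with its first order interaction partners.

```r
library(igraph)
set.seed(42)  # arbitrary seed, only so the sampling is repeatable

# Hypothetical toy graph with a description attribute on every node
toy_nodes <- data.frame(name = LETTERS[1:6],
                        description = paste("protein", LETTERS[1:6]),
                        stringsAsFactors = FALSE)
toy_edges <- data.frame(from = c("A", "A", "A", "B", "E"),
                        to   = c("B", "C", "D", "C", "F"),
                        stringsAsFactors = FALSE)
toy_g <- graph_from_data_frame(toy_edges, directed = FALSE, vertices = toy_nodes)

# Sample a few proteins and collect the descriptions in each neighborhood
picked <- sample(V(toy_g)$name, 3)
neighbor_descriptions <- lapply(picked, function(protein) {
  hood <- neighborhood(graph = toy_g, order = 1, nodes = protein)
  sub <- induced_subgraph(toy_g, unlist(hood))
  V(sub)$description
})
names(neighbor_descriptions) <- picked

# Node degree of each sampled protein, to relate neighborhood size to degree
picked_degree <- degree(toy_g, v = picked)
```

Comparing `picked_degree` with the collected descriptions makes it easier to cover small, medium and high degree proteins deliberately.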

Below is the code for the following questions:

  • Start, once again, with a single random protein and select its interaction partners in the "big cluster"
  • Then extend this selection with the interaction partners of those as well (using the "neighborhood" function with both your selected proteins).
  • Repeat this until the entire "big cluster" is selected:
  • How many steps do you need?
  • Try to find one of the proteins most distantly connected - how many steps do you need here?
  • Which network topology measurement is at play here?

# Iteratively expanding a network with first order interactants from a random vertex in the big_cluster graph

random_subgraph_list <- neighborhood(graph = big_cluster, order = 1, nodes = sample(names(V(big_cluster)), 1))
random_subgraph <- induced_subgraph(big_cluster, unlist(random_subgraph_list))
ggraph(random_subgraph, layout = "kk") +
  geom_edge_link() + 
  geom_node_point()

vcount(random_subgraph) == vcount(big_cluster)

random_subgraph_list2 <- neighborhood(graph = big_cluster, order = 1, nodes = names(V(random_subgraph)))
random_subgraph2 <- induced_subgraph(big_cluster, unlist(random_subgraph_list2))
ggraph(random_subgraph2, layout = "kk") +
  geom_edge_link() + 
  geom_node_point()

vcount(random_subgraph2) == vcount(big_cluster)

random_subgraph_list3 <- neighborhood(graph = big_cluster, order = 1, nodes = names(V(random_subgraph2)))
random_subgraph3 <- induced_subgraph(big_cluster, unlist(random_subgraph_list3))
ggraph(random_subgraph3, layout = "kk") +
  geom_edge_link() + 
  geom_node_point()

vcount(random_subgraph3) == vcount(big_cluster)

# And so on, until the expanded subgraph has the same number of vertices as the "big cluster". Usually around 4-5 steps are needed, depending of course on your randomly selected first node.

  • Try to find one of the proteins most distantly connected - how many steps do you need here? - A: 13 steps
  • Which network topology measurement is at play here? - A: the "longest shortest path" / network diameter
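
The manual expansion can be wrapped in a loop, which also shows the link to the topology measures. This is a sketch under stated assumptions: `toy_cluster` is a hypothetical 6-node path standing in for `big_cluster`, and `steps_to_cover` is a helper introduced here for illustration only. The number of steps needed from a given start node equals that node's eccentricity, and the largest value over all nodes is the diameter.

```r
library(igraph)

# Hypothetical stand-in for big_cluster: the path A-B-C-D-E-F
toy_cluster <- graph_from_data_frame(
  data.frame(from = LETTERS[1:5], to = LETTERS[2:6], stringsAsFactors = FALSE),
  directed = FALSE
)

# Count how many first order expansions are needed to select every vertex
steps_to_cover <- function(graph, start) {
  selected <- start
  steps <- 0
  while (length(selected) < vcount(graph)) {
    hoods <- neighborhood(graph = graph, order = 1, nodes = selected)
    expanded <- unique(unlist(lapply(hoods, as_ids)))
    # Guard against disconnected graphs, where the selection stops growing
    if (length(expanded) == length(selected)) stop("graph is not connected")
    selected <- expanded
    steps <- steps + 1
  }
  steps
}

steps_from_middle <- steps_to_cover(toy_cluster, "C")
steps_from_end <- steps_to_cover(toy_cluster, "A")

# The same numbers from igraph's built-in measures
ecc_middle <- as.numeric(eccentricity(toy_cluster, vids = "C"))
toy_diameter <- diameter(toy_cluster)
```

On the toy path, starting in the middle takes fewer steps than starting at an end, and the end-to-end count matches `diameter(toy_cluster)`.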

Report question #5

HTB1_subgraph_list <- neighborhood(graph = g, order = 1, nodes = "HTB1")
HTB1_subgraph <- induced_subgraph(g, unlist(HTB1_subgraph_list))
ggraph(HTB1_subgraph, layout = "kk") +
  geom_edge_link() + 
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE)

Report question #6

spb_vertices <- node_attributes[grepl(x = node_attributes$description, ignore.case = TRUE, pattern = "Spindle Pole Body"), ]$name
spb_subgraph <- delete_vertices(g, !names(V(g)) %in% spb_vertices)
ggraph(spb_subgraph, layout = "kk") +
  geom_edge_link() + 
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE)
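
As a cross-check of the selection logic, the same subgraph can be built by keeping the matching vertices instead of deleting the non-matching ones. The sketch below uses a hypothetical toy graph (`toy_g`, with made-up descriptions) and assumes the description is stored as a vertex attribute, as it is when `node_attributes` is passed to `graph_from_data_frame`.

```r
library(igraph)

# Hypothetical toy graph; the descriptions are made up for illustration
toy_nodes <- data.frame(
  name = c("P1", "P2", "P3", "P4"),
  description = c("Spindle pole body component",
                  "Histone H2B",
                  "Component of the spindle pole body",
                  "Cyclin"),
  stringsAsFactors = FALSE
)
toy_edges <- data.frame(from = c("P1", "P1", "P2"),
                        to   = c("P2", "P3", "P4"),
                        stringsAsFactors = FALSE)
toy_g <- graph_from_data_frame(toy_edges, directed = FALSE, vertices = toy_nodes)

# Case-insensitive match on the description attribute
is_spb <- grepl("spindle pole body", V(toy_g)$description, ignore.case = TRUE)

# Two equivalent ways to get the subgraph
via_induced <- induced_subgraph(toy_g, V(toy_g)[is_spb])
via_delete  <- delete_vertices(toy_g, V(toy_g)[!is_spb])
```

Both approaches keep only the edges whose two endpoints match the pattern.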

Report question #7

cluster  cell_cycle_role phase
cluster1 DNA replication     S
cluster2 DNA replication     S
cluster3            <NA>  <NA>
cluster4      Regulation  <NA>
cluster5      Regulation     M
cluster6      DNA repair     S
cluster7      Regulation  <NA>
cluster8            <NA>  <NA>
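
One possible way to attach this annotation to the node table is sketched below. The values in `cluster_annotation` are copied from the table above; `toy_node_attributes` is a hypothetical stand-in for `node_attributes`, and using `match` here is an assumption about how the columns could be added, not necessarily how the course solution does it.

```r
# Per-cluster annotation, copied from the table above
cluster_annotation <- data.frame(
  cluster = paste0("cluster", 1:8),
  cell_cycle_role = c("DNA replication", "DNA replication", NA, "Regulation",
                      "Regulation", "DNA repair", "Regulation", NA),
  phase = c("S", "S", NA, NA, "M", "S", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for node_attributes
toy_node_attributes <- data.frame(
  name = c("P1", "P2", "P3", "P4"),
  cluster = c("cluster1", "cluster5", NA, "cluster3"),
  stringsAsFactors = FALSE
)

# Look up each node's cluster; match() keeps the original row order,
# and nodes without a cluster get NA in both new columns
idx <- match(toy_node_attributes$cluster, cluster_annotation$cluster)
toy_node_attributes$cell_cycle_role <- cluster_annotation$cell_cycle_role[idx]
toy_node_attributes$phase <- cluster_annotation$phase[idx]
```

Once the two columns are in place on the real table, rebuilding the graph from it makes them available as vertex attributes for plotting.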

Report question #8

g <- graph_from_data_frame(interactions, directed = FALSE, vertices = node_attributes)

ggraph(g) +
  geom_edge_link() + 
  geom_node_point(aes(color = cluster, shape = cell_cycle_role))