Answers to the first yeast systems biology exercise

Answers by: Lars Rønn Olsen and Rasmus Wernersson

Report question #1

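library(igraph)
library(ggraph)

# Load the 'interactions' (edge list incl. a 'score' column) and 'node_attributes' data frames
load("/home/projects/22140/exercise4.Rdata")

# Build an undirected graph; extra columns become edge and vertex attributes
g <- graph_from_data_frame(interactions, directed = FALSE, vertices = node_attributes)

# Plot the network with edges coloured by interaction score (red = low, black = high)
ggraph(g) +
  geom_edge_link(aes(color = score)) +
  scale_edge_color_continuous(limits = c(0,1), low = "red", high = "black") +
  geom_node_point()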
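The plot relies on graph_from_data_frame() turning the extra columns of the two data frames into attributes: `score` becomes an edge attribute, and the node_attributes columns (such as `cluster`) become vertex attributes. A minimal sketch on a hypothetical toy edge list; the vertex names and scores below are made up and are not taken from exercise4.Rdata:

```r
library(igraph)

# Hypothetical toy stand-ins for 'interactions' and 'node_attributes';
# names and values are invented for illustration only.
toy_interactions <- data.frame(
  from  = c("A", "A", "B"),
  to    = c("B", "C", "C"),
  score = c(0.9, 0.4, 0.7)
)
toy_nodes <- data.frame(
  name    = c("A", "B", "C"),
  cluster = c("cluster1", "cluster1", "cluster2")
)

# The first two columns of the edge data frame define the edges; remaining
# columns ('score') become edge attributes. The first column of 'vertices'
# gives the vertex names; remaining columns become vertex attributes.
toy_g <- graph_from_data_frame(toy_interactions, directed = FALSE,
                               vertices = toy_nodes)

E(toy_g)$score     # edge attribute, usable as aes(color = score) in geom_edge_link()
V(toy_g)$cluster   # vertex attribute, usable in geom_node_point()
```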

Report question #2

node_attributes[node_attributes$cluster %in% "cluster1",]$description

(Repeat for each cluster)

The idea is simply to quickly look through the gene/protein descriptions in order to get an overall idea of what types of proteins are present in each cluster.
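Rather than repeating the indexing line once per cluster, the descriptions can be grouped in a single call with base R's split(). A sketch on a hypothetical toy stand-in for node_attributes; only the `cluster` and `description` columns are assumed to exist, as in the line above:

```r
# Hypothetical toy stand-in for node_attributes (the real table has more rows/columns)
node_attributes_toy <- data.frame(
  name        = c("A", "B", "C", "D"),
  cluster     = c("cluster1", "cluster2", "cluster1", "cluster2"),
  description = c("desc A", "desc B", "desc C", "desc D")
)

# split() returns a list with one character vector of descriptions per cluster
descriptions_by_cluster <- split(node_attributes_toy$description,
                                 node_attributes_toy$cluster)

# Print each cluster's descriptions under a small header
for (cl in names(descriptions_by_cluster)) {
  cat("==", cl, "==\n")
  writeLines(descriptions_by_cluster[[cl]])
}
```

On the real data the grouping call would be `split(node_attributes$description, node_attributes$cluster)`.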

cluster1: DNA replication
cluster2: Origin of replication recognition/cell division control
cluster3: Mixed (trehalose synthesis)
cluster4: Cyclins, CDC28
cluster5: Anaphase-promoting complex
cluster6: DNA damage repair
cluster7: Cell division control
cluster8: Unknown

Report question #3

Yes. Based on what we have learned about cell cycle phases and cell cycle regulation, the following clusters stand out:

cluster1: DNA replication. YES (S phase)
cluster2: Origin of replication recognition/cell division control. YES (S phase)
cluster3: Mixed (trehalose synthesis)
cluster4: Cyclins, CDC28. YES (cell cycle regulation)
cluster5: Anaphase-promoting complex. YES (M phase)
cluster6: DNA damage repair. YES (S phase)
cluster7: Cell division control. YES (cell cycle regulation)
cluster8: Unknown

Report question #4

Below are solutions for the following:

TASK: make a subgraph of the "big cluster":

Use the igraph function "decompose" to make a list of connected graphs.
Calculate the number of nodes in each subgraph in the list using vcount. This can be quickly done using the lapply function.
Visualize the "big cluster".

Investigate the interconnectivity: visually, there appears to be a pattern in the way the nodes are connected, which could indicate that this subnetwork is not evenly connected.

Investigate this by visualizing the "big cluster" network with the node size based on node degree.

# Make a list of all connected subgraphs

connected_graphs <- decompose(g)

# Extract the graph with the most vertices

big_cluster <- connected_graphs[[which.max(lapply(connected_graphs, vcount))]]

# Visualize, and set the size of the nodes according to node degree

ggraph(big_cluster, layout = "kk") +
  geom_edge_link(aes(color = score)) + 
  scale_edge_color_continuous(limits = c(0,1), low = "red", high = "black") +
  geom_node_point(aes(color = cluster, size = degree(big_cluster)))
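The node-size plot can be backed up with numbers: listing vertices by degree shows whether a few hubs carry most of the connections. A minimal sketch on a hypothetical toy graph with one obvious hub; on the real data the same calls would take big_cluster instead of toy:

```r
library(igraph)

# Hypothetical toy network: one hub connected to four proteins, plus a short chain
toy <- graph_from_literal(hub-a:b:c:d, d-e, e-f)

deg <- degree(toy)
sort(deg, decreasing = TRUE)   # highest-degree vertices (candidate hubs) first
table(deg)                     # number of vertices with each degree

# On the real data (not run here):
# head(sort(degree(big_cluster), decreasing = TRUE), 10)
```

An uneven network shows up as many vertices with degree 1 or 2 and a handful with much larger degree.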

Now, for the following tasks:

TASK: explore the interaction partners

Randomly select a single protein from the global graph, extract a subgraph with the first order interaction partners using the "neighborhood" function and look at the descriptions of this sub-set.
Do this for 5-10 randomly chosen proteins - perhaps with small, medium, and high node degree - and note down if any obvious patterns start to emerge.

random_subgraph_list <- neighborhood(graph = g, order = 1, nodes = sample(names(V(g)), 1))

random_subgraph <- induced_subgraph(g, unlist(random_subgraph_list))

ggraph(random_subgraph, layout = "kk") +
  geom_edge_link() + 
  geom_node_point()

# Repeat the above 5-10 times
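To make the "small, medium and high node degree" choice explicit instead of relying on sample(), the neighbourhood lookup can be wrapped in a small helper. The function name explore_partners() and the toy graph are hypothetical; the sketch assumes the vertices carry a description attribute, as g does when built from node_attributes. Only the lowest- and highest-degree proteins are picked here; a medium-degree one can be added the same way.

```r
library(igraph)

# Hypothetical toy graph standing in for g, with an invented 'description' attribute
toy <- graph_from_literal(A-B:C:D, D-E, E-F)
V(toy)$description <- paste("description of", V(toy)$name)

# Hypothetical helper: a protein's first-order neighbourhood (itself plus its
# direct partners), returned with the protein's degree and the descriptions.
explore_partners <- function(graph, protein) {
  nb  <- neighborhood(graph, order = 1, nodes = protein)[[1]]
  sub <- induced_subgraph(graph, nb)
  list(protein  = protein,
       degree   = unname(degree(graph, protein)),
       partners = V(sub)$description)
}

# Pick the lowest- and highest-degree proteins instead of random ones
deg     <- degree(toy)
picked  <- names(deg)[c(which.min(deg), which.max(deg))]
results <- lapply(picked, explore_partners, graph = toy)
str(results)
```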

Below is the code for the following questions:

Start, once again, with a single random protein and select its interaction partners in the "big cluster".
Then extend this selection with the interaction partners of those proteins as well (using the "neighborhood" function on all of the selected proteins).
Repeat this until the entire "big cluster" is selected:
How many steps do you need?
Try to find one of the proteins most distantly connected - how many steps do you need here?
Which network topology measurement is at play here?

# Iteratively expanding a network with first order interactants from a random vertex in the big_cluster graph

random_subgraph_list <- neighborhood(graph = big_cluster, order = 1, nodes = sample(names(V(big_cluster)), 1))
random_subgraph <- induced_subgraph(big_cluster, unlist(random_subgraph_list))
ggraph(random_subgraph, layout = "kk") +
  geom_edge_link() + 
  geom_node_point()

vcount(random_subgraph) == vcount(big_cluster)

random_subgraph_list2 <- neighborhood(graph = big_cluster, order = 1, nodes = names(V(random_subgraph)))
random_subgraph2 <- induced_subgraph(big_cluster, unlist(random_subgraph_list2))
ggraph(random_subgraph2, layout = "kk") +
  geom_edge_link() + 
  geom_node_point()

vcount(random_subgraph2) == vcount(big_cluster)

random_subgraph_list3 <- neighborhood(graph = big_cluster, order = 1, nodes = names(V(random_subgraph2)))
random_subgraph3 <- induced_subgraph(big_cluster, unlist(random_subgraph_list3))
ggraph(random_subgraph3, layout = "kk") +
  geom_edge_link() + 
  geom_node_point()

vcount(random_subgraph3) == vcount(big_cluster)

# And so on, until the number of vertices in the expanded subgraph equals the number in the "big cluster". Usually around 4-5 steps are needed, depending of course on the randomly selected first node.
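The copy-paste expansion above can also be written as a loop that stops once the selection covers the whole graph and reports the number of steps. This is a sketch: count_expansion_steps() is a hypothetical helper, and a six-protein path graph stands in for big_cluster. The loop raises an error if the selection stops growing, which would happen on a disconnected graph.

```r
library(igraph)

# Hypothetical toy connected graph standing in for big_cluster: a path P1-P2-...-P6
toy <- make_ring(6, circular = FALSE)
V(toy)$name <- paste0("P", 1:6)

# Repeatedly add the first-order neighbours of the current selection and
# count how many expansion steps are needed to cover every vertex.
count_expansion_steps <- function(graph, start) {
  selected <- start
  steps <- 0
  while (length(selected) < vcount(graph)) {
    nb <- neighborhood(graph, order = 1, nodes = selected)
    expanded <- unique(unlist(lapply(nb, as_ids)))
    if (length(expanded) == length(selected)) {
      stop("selection stopped growing: graph is not connected")
    }
    selected <- expanded
    steps <- steps + 1
  }
  steps
}

count_expansion_steps(toy, "P1")  # start at one end of the path
count_expansion_steps(toy, "P3")  # start near the middle
```

The step count from a given start protein is that protein's eccentricity (its largest shortest-path distance to any other protein), which is why the answer depends on the randomly chosen start.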

Try to find one of the proteins most distantly connected - how many steps do you need here? - A: 13 steps
Which network topology measurement is at play here? - A: the "longest shortest path" / network diameter
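igraph can compute these quantities directly. A sketch on a hypothetical path graph; on the real data the calls would take big_cluster. Note that diameter() and farthest_vertices() use an edge attribute named weight by default if one exists; the exercise graph stores its values as score, so distances are plain step counts.

```r
library(igraph)

# Hypothetical toy graph: a path of six proteins
toy <- make_ring(6, circular = FALSE)
V(toy)$name <- paste0("P", 1:6)

diameter(toy)           # the longest shortest path in the graph
farthest_vertices(toy)  # a pair of vertices at that distance
eccentricity(toy)       # per vertex: steps needed to reach every other vertex

# On the real data (not run here):
# diameter(big_cluster)
# farthest_vertices(big_cluster)
```

Starting the expansion from one of the vertices returned by farthest_vertices() takes exactly diameter() steps.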

Report question #5

HTB1_subgraph_list <- neighborhood(graph = g, order = 1, nodes = "HTB1")
HTB1_subgraph <- induced_subgraph(g, unlist(HTB1_subgraph_list))
ggraph(HTB1_subgraph, layout = "kk") +
  geom_edge_link() + 
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE)

Report question #6

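# Names of all proteins whose description mentions the spindle pole body (SPB)
spb_vertices <- node_attributes[grepl(x = node_attributes$description, ignore.case = TRUE, pattern = "Spindle Pole Body"), ]$name
# Keep only those proteins by deleting every other vertex
spb_subgraph <- delete_vertices(g, !names(V(g)) %in% spb_vertices)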
ggraph(spb_subgraph, layout = "kk") +
  geom_edge_link() + 
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE)
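Whether the spindle pole body proteins form one connected module or several separate groups can also be read off numerically with components(). A sketch on a hypothetical toy graph with invented descriptions; the grepl() call mirrors the one above.

```r
library(igraph)

# Hypothetical toy graph and descriptions standing in for g
toy <- graph_from_literal(A-B, B-C, D-E)
V(toy)$description <- c("spindle pole body component", "Spindle Pole Body protein",
                        "protein kinase", "spindle pole body", "other")

# Keep only vertices whose description mentions the spindle pole body
spb <- grepl("spindle pole body", V(toy)$description, ignore.case = TRUE)
spb_toy <- induced_subgraph(toy, V(toy)[spb])

comp <- components(spb_toy)
comp$no      # number of connected pieces
comp$csize   # size of each piece
```

On the real data, `components(spb_subgraph)` gives the same summary for the plotted SPB subgraph.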

Report question #7

cluster  cell_cycle_role phase
cluster1 DNA replication     S
cluster2 DNA replication     S
cluster3            <NA>  <NA>
cluster4      Regulation  <NA>
cluster5      Regulation     M
cluster6      DNA repair     S
cluster7      Regulation  <NA>
cluster8            <NA>  <NA>
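Report question #8 shapes nodes by cell_cycle_role, so the annotation above has to be attached to node_attributes before the graph is rebuilt; that step is not shown in the answer. One way, assuming node_attributes has a cluster column holding labels like "cluster1" (as in report question #2), is a lookup with match(). The three-protein node table below is a hypothetical toy stand-in.

```r
# Annotation table from report question #7 (cluster -> cell cycle role and phase)
cluster_annotation <- data.frame(
  cluster         = paste0("cluster", 1:8),
  cell_cycle_role = c("DNA replication", "DNA replication", NA, "Regulation",
                      "Regulation", "DNA repair", "Regulation", NA),
  phase           = c("S", "S", NA, NA, "M", "S", NA, NA)
)

# Hypothetical toy stand-in for node_attributes: vertex name first, then cluster
node_attributes_toy <- data.frame(
  name    = c("A", "B", "C"),
  cluster = c("cluster5", "cluster3", "cluster1")
)

# Look up each protein's cluster in the annotation table. match() keeps the
# row and column order of the node table intact; merge() would move 'cluster'
# to the first column, and graph_from_data_frame() takes vertex names from
# the first column of 'vertices'.
idx <- match(node_attributes_toy$cluster, cluster_annotation$cluster)
node_attributes_toy$cell_cycle_role <- cluster_annotation$cell_cycle_role[idx]
node_attributes_toy$phase           <- cluster_annotation$phase[idx]
node_attributes_toy
```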

Report question #8

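# Rebuild the graph so it picks up the new node attribute columns (e.g. cell_cycle_role)
g <- graph_from_data_frame(interactions, directed = FALSE, vertices = node_attributes)

# Colour nodes by cluster and set their shape by cell cycle role
ggraph(g) +
  geom_edge_link() +
  geom_node_point(aes(color = cluster, shape = cell_cycle_role))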

ExYeastSysBio R answers
