2024-03-05T14:27:40Z


= Answers to the first yeast systems biology exercise =
'''Answers by:''' Lars Rønn Olsen and Rasmus Wernersson

== Report question #1 ==

<pre style="overflow:auto;">
library(igraph)
library(ggraph)

load("/home/projects/22140/exercise4.Rdata")

g <- graph_from_data_frame(interactions, directed = FALSE, vertices = node_attributes)

ggraph(g) +
  geom_edge_link(aes(color = score)) +
  scale_edge_color_continuous(limits = c(0,1), low = "red", high = "black") +
  geom_node_point()
</pre>

== Report question #2 ==

<pre>
node_attributes[node_attributes$cluster %in% "cluster1",]$description
</pre>

(Repeat for each cluster)

The idea is simply to quickly look through the gene/protein descriptions in order to get an overall idea of what types of proteins are present in each cluster.

# Function: DNA-replication
# Function: Origin of replication recognition/Cell division control
# Function: Mixed - trehalose synthesis
# Function: Cyclins, CDC28
# Function: Anaphase-promoting complex
# Function: DNA damage repair
# Function: Cell division control
# Function: Unknown
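
Instead of repeating the lookup by hand for each cluster, the descriptions can be grouped in one pass. This is only a minimal sketch: it assumes that node_attributes has the columns cluster and description used above, and the small table below is made up for illustration (it is not the exercise data).

<pre>
# Toy stand-in for node_attributes (hypothetical values, not the exercise data)
node_attributes <- data.frame(
  name        = c("A", "B", "C", "D"),
  cluster     = c("cluster1", "cluster1", "cluster2", NA),
  description = c("DNA polymerase subunit", "Replication factor",
                  "Cyclin", "Unknown protein"),
  stringsAsFactors = FALSE
)

# split() groups the descriptions by cluster; rows without a cluster (NA) are dropped
descriptions_by_cluster <- split(node_attributes$description, node_attributes$cluster)

# Print the descriptions cluster by cluster
for (cl in names(descriptions_by_cluster)) {
  cat("==", cl, "==\n")
  cat(descriptions_by_cluster[[cl]], sep = "\n")
}
</pre>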

== Report question #3 ==
Yes - based on what we have learned about cell cycle phases and cell cycle regulation, the following clusters stand out:

# Function: DNA-replication '''YES''' (S-phase)
# Function: Origin of replication recognition/Cell division control '''YES''' (S-phase)
# Function: Mixed - trehalose synthesis
# Function: Cyclins, CDC28 '''YES''' (cell cycle regulation)
# Function: Anaphase-promoting complex '''YES''' (M-phase)
# Function: DNA damage repair '''YES''' (S-phase)
# Function: Cell division control '''YES''' (cell cycle regulation)
# Function: Unknown

== Report question #4 ==

Below are solutions for the following:

TASK: make a subgraph of the "big cluster":
* Use the igraph function "decompose" to make a list of connected graphs.
* Calculate the number of nodes in each subgraph in the list using vcount. This can be quickly done using the lapply function.
* Visualize the "big cluster".

Investigate the inter-connectivity: Visually there appears to be a pattern to the way the nodes are connected - this could indicate that this sub-network is not evenly connected.
* Investigate this by visualizing the "big cluster" network with the node size based on node degree.

<pre>
# Make a list of all connected subgraphs

connected_graphs <- decompose(g)

# Extract the graph with the most vertices

big_cluster <- connected_graphs[[which.max(lapply(connected_graphs, vcount))]]

# Visualize, and set the size of the nodes according to node degree

ggraph(big_cluster, layout = "kk") +
  geom_edge_link(aes(color = score)) +
  scale_edge_color_continuous(limits = c(0,1), low = "red", high = "black") +
  geom_node_point(aes(color = cluster, size = degree(big_cluster)))

</pre>
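
As an alternative to decompose, the igraph function components reports the size of each connected component directly, and the degree values behind the node sizes can be inspected as plain numbers. The sketch below uses a small made-up edge list (not the exercise data) so that it can be run on its own.

<pre>
library(igraph)

# Toy graph (hypothetical): a path A-B-C-D plus a separate pair E-F
edges <- data.frame(from = c("A", "B", "C", "E"),
                    to   = c("B", "C", "D", "F"))
toy <- graph_from_data_frame(edges, directed = FALSE)

# components() returns the component membership of each vertex and the component sizes
comp <- components(toy)
comp$csize

# Keep only the vertices belonging to the largest component
biggest <- induced_subgraph(toy, which(comp$membership == which.max(comp$csize)))
vcount(biggest)

# Node degrees, highest first: a few high values and many low ones suggest uneven connectivity
sort(degree(biggest), decreasing = TRUE)
</pre>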

Now, for the following tasks:

TASK: explore the interaction partners
* Randomly select a single protein from the global graph, extract a subgraph with the first order interaction partners using the "neighborhood" function and look at the descriptions of this sub-set.
* Do this for 5-10 randomly chosen proteins - perhaps with small, medium, and high node degree - and note down if any obvious patterns start to emerge.

<pre>
random_subgraph_list <- neighborhood(graph = g, order = 1, nodes = sample(names(V(g)), 1))

random_subgraph <- induced_subgraph(g, unlist(random_subgraph_list))

ggraph(random_subgraph, layout = "kk") +
  geom_edge_link() +
  geom_node_point()

# Repeat the above 5-10 times
</pre>

Below is the code for the following questions:
* Start, once again, with a single random protein and select its interaction partners in the "big cluster"
* Then extend this selection with the interaction partners of those as well (using the "neighborhood" function with both your selected proteins).
* Repeat this until the entire "big cluster" is selected:
* How many steps do you need?
* Try to find one of the proteins most distantly connected - how many steps do you need here?
* Which network topology measurement is at play here?

<pre>
# Iteratively expanding a network with first order interactants from a random vertex in the big_cluster graph

random_subgraph_list <- neighborhood(graph = big_cluster, order = 1, nodes = sample(names(V(big_cluster)), 1))
random_subgraph <- induced_subgraph(big_cluster, unlist(random_subgraph_list))
ggraph(random_subgraph, layout = "kk") +
  geom_edge_link() +
  geom_node_point()

vcount(random_subgraph) == vcount(big_cluster)

random_subgraph_list2 <- neighborhood(graph = big_cluster, order = 1, nodes = names(V(random_subgraph)))
random_subgraph2 <- induced_subgraph(big_cluster, unlist(random_subgraph_list2))
ggraph(random_subgraph2, layout = "kk") +
  geom_edge_link() +
  geom_node_point()

vcount(random_subgraph2) == vcount(big_cluster)

random_subgraph_list3 <- neighborhood(graph = big_cluster, order = 1, nodes = names(V(random_subgraph2)))
random_subgraph3 <- induced_subgraph(big_cluster, unlist(random_subgraph_list3))
ggraph(random_subgraph3, layout = "kk") +
  geom_edge_link() +
  geom_node_point()

vcount(random_subgraph3) == vcount(big_cluster)

# And so on, until the expanded subgraph has the same number of vertices as the "big cluster".
# Usually around 4-5 steps are needed, depending of course on your randomly selected first node.

</pre>

* ''Try to find one of the proteins most distantly connected - how many steps do you need here?'' - '''A: 13 steps'''
* ''Which network topology measurement is at play here?'' - '''A: the "longest shortest path" / network diameter'''
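
The manual expansion can also be written as a loop. The number of steps needed from a given start vertex is that vertex's eccentricity (its longest shortest path to any other vertex), and the largest such value over all vertices is the diameter. The sketch below illustrates this on a made-up path graph (not the exercise data); in the exercise, toy would be big_cluster and the start vertex would be sampled at random.

<pre>
library(igraph)

# Toy graph (hypothetical): a simple path A-B-C-D-E
edges <- data.frame(from = c("A", "B", "C", "D"),
                    to   = c("B", "C", "D", "E"))
toy <- graph_from_data_frame(edges, directed = FALSE)

# Start vertex, stored as a vertex index
start <- match("A", V(toy)$name)

# Repeatedly add the first order interaction partners of everything selected so far.
# Note: the loop only terminates if the graph is connected.
selected <- start
steps <- 0
while (length(selected) < vcount(toy)) {
  partners <- neighborhood(graph = toy, order = 1, nodes = selected)
  selected <- unique(as.integer(unlist(partners)))
  steps <- steps + 1
}
steps

# The same number without the loop, and the worst case over all start vertices
eccentricity(toy, vids = start)
diameter(toy)
</pre>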

== Report question #5 ==
<pre>
HTB1_subgraph_list <- neighborhood(graph = g, order = 1, nodes = "HTB1")
HTB1_subgraph <- induced_subgraph(g, unlist(HTB1_subgraph_list))
ggraph(HTB1_subgraph, layout = "kk") +
  geom_edge_link() +
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE)
</pre>

== Report question #6 ==

<pre>
# Names of all proteins whose description mentions the spindle pole body (SPB)
spb_vertices <- node_attributes[grepl(x = node_attributes$description, ignore.case = TRUE, pattern = "Spindle Pole Body"), ]$name
# Remove every vertex that is not in the SPB set
spb_subgraph <- delete_vertices(g, !names(V(g)) %in% spb_vertices)
ggraph(spb_subgraph, layout = "kk") +
  geom_edge_link() +
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE)

</pre>

== Report question #7 ==

<pre>
cluster   cell_cycle_role  phase
cluster1  DNA replication  S
cluster2  DNA replication  S
cluster3  <NA>             <NA>
cluster4  Regulation       <NA>
cluster5  Regulation       M
cluster6  DNA repair       S
cluster7  Regulation       <NA>
cluster8  <NA>             <NA>
</pre>
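
To use these annotations as node attributes, they need to be attached to node_attributes before the graph is rebuilt. One possible way is sketched below, assuming the table above is available as a data frame (here called annotation, a hypothetical name) and using a made-up node table instead of the exercise data. Using match rather than merge keeps the row and column order of node_attributes unchanged, which matters because graph_from_data_frame reads the vertex names from the first column.

<pre>
# Toy stand-in for node_attributes (hypothetical values, not the exercise data)
node_attributes <- data.frame(
  name    = c("G1", "G2", "G3", "G4"),
  cluster = c("cluster1", "cluster5", "cluster3", NA),
  stringsAsFactors = FALSE
)

# The annotation table from above, typed in as a data frame
annotation <- data.frame(
  cluster         = paste0("cluster", 1:8),
  cell_cycle_role = c("DNA replication", "DNA replication", NA, "Regulation",
                      "Regulation", "DNA repair", "Regulation", NA),
  phase           = c("S", "S", NA, NA, "M", "S", NA, NA),
  stringsAsFactors = FALSE
)

# Look up each node's cluster in the annotation table; nodes without a cluster get NA
idx <- match(node_attributes$cluster, annotation$cluster)
node_attributes$cell_cycle_role <- annotation$cell_cycle_role[idx]
node_attributes$phase           <- annotation$phase[idx]

node_attributes
</pre>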

== Report question #8 ==

<pre>
g <- graph_from_data_frame(interactions, directed = FALSE, vertices = node_attributes)

ggraph(g) +
  geom_edge_link() +
  geom_node_point(aes(color = cluster, shape = cell_cycle_role))
</pre>
