IgraphIntro Answers v1

Answers: "Cytoscape and network intro exercise"

Answers by: Lars Rønn Olsen and Rasmus Wernersson

Item #1: Loading network as an igraph object

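library(igraph)
# Edge list: one row per interaction, given by its two end nodes
df <- data.frame(from = c("alpha", "alpha", "beta"), to = c("alpha", "beta", "beta"))
# Build an undirected igraph object from the edge list
g <- graph_from_data_frame(df, directed = FALSE)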
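A quick way to confirm that the network loaded as expected is to inspect the resulting object. The sketch below rebuilds the same toy graph and assumes only standard igraph functions; the name `g_simple` is introduced here for illustration. Note that this edge list contains two self-loops (alpha to alpha and beta to beta), which `simplify()` removes.

```r
library(igraph)

# Same toy edge list as above
df <- data.frame(from = c("alpha", "alpha", "beta"),
                 to   = c("alpha", "beta", "beta"))
g <- graph_from_data_frame(df, directed = FALSE)

vcount(g)       # number of nodes
ecount(g)       # number of edges, self-loops included
V(g)$name       # node names taken from the edge list
which_loop(g)   # TRUE for every edge that connects a node to itself

# Illustrative only: a copy without self-loops and duplicate edges
g_simple <- simplify(g)
ecount(g_simple)
```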


Item #2: Node attributes and visualization

poldelta_graph <- data.frame(from = c("DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD2_HUMAN", "DPOD2_HUMAN", "DPOD3_HUMAN"), to = c("DPOD2_HUMAN", "DPOD3_HUMAN", "DPOD4_HUMAN", "DPOD3_HUMAN", "DPOD4_HUMAN", "DPOD4_HUMAN"))
poldelta_attributes <- data.frame(ID = c("DPOD1_HUMAN", "DPOD2_HUMAN", "DPOD3_HUMAN", "DPOD4_HUMAN"), geneID = c("PolD1", "PolD2", "PolD3", "PolD4"), Catalytic = c("yes", "no", "no", "no"), AA = c(1009, 469, 466, 107))
g <- graph_from_data_frame(poldelta_graph, directed = FALSE, vertices = poldelta_attributes)
library(ggraph)
ggraph(g) + 
  geom_edge_link() + 
  geom_node_point(aes(size = AA, color = Catalytic)) +
  geom_node_text(aes(label = geneID))
PolD network with selected node attributes
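Before plotting, it can be worth checking that the node attributes were actually attached to the graph. The following is a minimal sketch using the same data as above. `igraph::as_data_frame()` is called with its namespace prefix on the assumption that other loaded packages might export a function of the same name.

```r
library(igraph)

poldelta_graph <- data.frame(
  from = c("DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD2_HUMAN", "DPOD2_HUMAN", "DPOD3_HUMAN"),
  to   = c("DPOD2_HUMAN", "DPOD3_HUMAN", "DPOD4_HUMAN", "DPOD3_HUMAN", "DPOD4_HUMAN", "DPOD4_HUMAN")
)
poldelta_attributes <- data.frame(
  ID        = c("DPOD1_HUMAN", "DPOD2_HUMAN", "DPOD3_HUMAN", "DPOD4_HUMAN"),
  geneID    = c("PolD1", "PolD2", "PolD3", "PolD4"),
  Catalytic = c("yes", "no", "no", "no"),
  AA        = c(1009, 469, 466, 107)
)
g <- graph_from_data_frame(poldelta_graph, directed = FALSE, vertices = poldelta_attributes)

# The first column of the vertex table (ID) becomes the vertex attribute "name"
vertex_attr_names(g)

# Node table as stored in the graph
node_table <- igraph::as_data_frame(g, what = "vertices")
node_table

# Every subunit interacts with the other three
degree(g)
```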


Item #3: Complete node attribute table

The table above was filled in by looking up all protein entries in UniProt (using the links provided in the exercise) and reading about the function of the protein. The cellular localization is easiest to find in the Gene Ontology (GO) -> Cellular component table. We'll learn a lot more about Gene Ontology later on.

Red = updated/changed since the exercise was originally written

Item #4: Question about DNA non-binding proteins

QUESTION: Does it make sense that some of the proteins are not annotated to bind DNA yet are supposed to have a role in DNA replication? (For example PRI1_HUMAN vs. PRI2_HUMAN)

It is very common in DNA-interacting protein complexes that only some of the subunits have DNA-binding properties. The other proteins in the complex bind to the DNA-binding subunits and in this way become part of the overall DNA-binding complex.
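The same point can be illustrated on the network itself. The sketch below is restricted to the four PolD subunits and uses the `DNA_binding` annotations from the exercise's node attribute table; the names `g_core`, `non_binders` and `binding_partners` are introduced here for illustration only. For each subunit without DNA-binding annotation, it lists the direct interaction partners that are annotated as DNA binding.

```r
library(igraph)

# The four PolD subunits and their mutual interactions
edges <- data.frame(
  from = c("DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD2_HUMAN", "DPOD2_HUMAN", "DPOD3_HUMAN"),
  to   = c("DPOD2_HUMAN", "DPOD3_HUMAN", "DPOD4_HUMAN", "DPOD3_HUMAN", "DPOD4_HUMAN", "DPOD4_HUMAN")
)
nodes <- data.frame(
  ID          = c("DPOD1_HUMAN", "DPOD2_HUMAN", "DPOD3_HUMAN", "DPOD4_HUMAN"),
  DNA_binding = c("+", "+", "-", "-")
)
g_core <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

# Names of the subunits that are not annotated as DNA binding
non_binders <- V(g_core)$name[V(g_core)$DNA_binding == "-"]

# For each of them, the neighbours that are annotated as DNA binding
binding_partners <- sapply(non_binders, function(v) {
  nb <- neighbors(g_core, v)
  nb$name[nb$DNA_binding == "+"]
}, simplify = FALSE)

binding_partners
```

In this sub-network both non-binding subunits are directly connected to DNA-binding subunits, which is consistent with the explanation above.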

Item #5: Node and edge attributes and visualization

poldelta_extented_interactions <- data.frame(from = c("DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD2_HUMAN", "DPOD2_HUMAN", "DPOD2_HUMAN", "DPOD2_HUMAN", "DPOD2_HUMAN", "DPOD2_HUMAN", "DPOD2_HUMAN", "DPOD2_HUMAN", "DPOD3_HUMAN", "DPOD3_HUMAN", "DPOD3_HUMAN", "DPOD3_HUMAN", "DPOD4_HUMAN", "DPOD4_HUMAN", "DPOD4_HUMAN", "DPOD4_HUMAN", "PRI1_HUMAN", "WRIP1_HUMAN"), to = c("DPOD2_HUMAN", "DPOD3_HUMAN", "DPOD4_HUMAN", "S7A6O_HUMAN", "TREX2_HUMAN", "DNA2L_HUMAN", "PRI1_HUMAN", "PRI2_HUMAN", "WRIP1_HUMAN", "DPOD3_HUMAN", "DPOD4_HUMAN", "PDIP2_HUMAN", "BACD1_HUMAN", "WRIP1_HUMAN", "DNA2L_HUMAN", "PRI1_HUMAN", "PRI2_HUMAN", "DPOD4_HUMAN", "DNA2L_HUMAN", "PRI1_HUMAN", "PRI2_HUMAN", "DNA2L_HUMAN", "PRI1_HUMAN", "PRI2_HUMAN", "WRIP1_HUMAN", "PRI2_HUMAN", "WRIP1_HUMAN"), confidence = c(1.00, 1.00, 1.00, 0.18, 1.00, 1.00, 1.00, 1.00, 0.52, 1.00, 1.00, 1.00, 1.00, 0.54, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.57, 1.00, 0.65))
poldelta_extended_attributes <- data.frame(ID = c("DPOD1_HUMAN", "DPOD2_HUMAN", "DPOD3_HUMAN", "DPOD4_HUMAN", "BACD1_HUMAN", "DNA2L_HUMAN", "PDIP2_HUMAN", "PRI1_HUMAN", "PRI2_HUMAN", "S7A6O_HUMAN", "TREX2_HUMAN", "WRIP1_HUMAN"), geneID = c("PolD1", "PolD2", "PolD3", "PolD4", "KCTD13", "DNA2", "POLDIP2", "PRIM1", "PRIM2", "SLC7A6OS", "TREX2", "WRNIP1"), role_in_rep = c("Polymerase", "Polymerase", "Polymerase", "Polymerase", "Uncertain", "Helicase", "Uncertain", "Primase", "Primase", "Uncertain", "DNA repair", "DNA repair"), DNA_binding = c("+", "+", "-", "-","-", "+", "+", "+", "+", "-", "+", "+"))
g <- graph_from_data_frame(poldelta_extented_interactions, directed = FALSE, vertices = poldelta_extended_attributes)
library(ggraph)
ggraph(g) + 
  geom_edge_link(aes(alpha = confidence)) + 
  geom_node_point(aes(color = role_in_rep, shape = DNA_binding), size = 3) +
  geom_node_text(aes(label = geneID))
Extended PolD network with node and edge attributes
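Edge attributes can also be inspected and used for filtering, not only for plotting. The sketch below uses a subset of the interaction table above (only the edges with a confidence below 1.00). The cutoff of 0.5 is an arbitrary value chosen for illustration, not part of the exercise, and the names `g_low`, `edge_table` and `g_filtered` are introduced here.

```r
library(igraph)

# Subset of the interaction table: edges with confidence below 1.00
low_conf <- data.frame(
  from       = c("DPOD1_HUMAN", "DPOD1_HUMAN", "DPOD2_HUMAN", "DPOD4_HUMAN", "WRIP1_HUMAN"),
  to         = c("S7A6O_HUMAN", "WRIP1_HUMAN", "WRIP1_HUMAN", "WRIP1_HUMAN", "WRIP1_HUMAN"),
  confidence = c(0.18, 0.52, 0.54, 0.57, 0.65)
)
g_low <- graph_from_data_frame(low_conf, directed = FALSE)

# Edge table as stored in the graph, with self-loops flagged
edge_table <- igraph::as_data_frame(g_low, what = "edges")
edge_table$self_loop <- which_loop(g_low)
edge_table[order(edge_table$confidence), ]

# Remove edges below an illustrative (assumed) confidence cutoff
cutoff <- 0.5
g_filtered <- delete_edges(g_low, which(E(g_low)$confidence < cutoff))
ecount(g_filtered)
```

Note that the WRIP1_HUMAN to WRIP1_HUMAN row is a self-interaction, so it shows up as a self-loop in the graph.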


Item #6: Question about the "uncertain function" proteins

FINAL QUESTION - Re-evaluate the three "uncertain" proteins (BACD1_HUMAN, PDIP2_HUMAN, S7A6O_HUMAN)

PDIP2_HUMAN / POLDIP2

Strong support for being involved in DNA replication:

  • Has a high-confidence interaction with PolD2
  • Is known to bind DNA
  • According to UniProt, it is well established that this protein is associated with the polymerase delta complex:
    • It's even indicated in the protein's recommended name: "Polymerase delta-interacting protein 2".

BACD1_HUMAN / PDIP1 / KCTD13

Strong support for being involved in DNA replication:

  • Has a high-confidence interaction with PolD2
  • While the protein itself is not known to bind DNA, several other lines of evidence indicate a connection:
    • It's located in the nucleus (Cellular Component: Nucleus).
    • It's annotated to be part of the Biological Process "DNA replication"
    • ... and its alternative name is "Polymerase delta-interacting protein 1"

S7A6O_HUMAN / SLC7A6OS

Weak support for being involved in DNA replication:

  • Has a low-confidence interaction with PolD1
  • While it's known to be located in the nucleus, little else suggests that it is involved in DNA replication or associated with the polymerase delta complex. It may, however, be involved in RNA synthesis, but not much is known at present.
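The interaction evidence cited for the three proteins can be pulled directly from the network. The sketch below uses only the three relevant rows of the Item #5 interaction table; the names `g_unc` and `evidence` are introduced here for illustration. For each "uncertain" protein it collects the interaction partner and the confidence of the edge.

```r
library(igraph)

# Rows of the Item #5 interaction table that involve the "uncertain" proteins
uncertain_edges <- data.frame(
  from       = c("DPOD1_HUMAN", "DPOD2_HUMAN", "DPOD2_HUMAN"),
  to         = c("S7A6O_HUMAN", "PDIP2_HUMAN", "BACD1_HUMAN"),
  confidence = c(0.18, 1.00, 1.00)
)
g_unc <- graph_from_data_frame(uncertain_edges, directed = FALSE)

uncertain <- c("BACD1_HUMAN", "PDIP2_HUMAN", "S7A6O_HUMAN")

# One row per edge touching an "uncertain" protein
evidence <- do.call(rbind, lapply(uncertain, function(p) {
  e  <- incident(g_unc, p)   # edges touching protein p
  ep <- ends(g_unc, e)       # two-column matrix of end-node names
  data.frame(
    protein    = p,
    partner    = ifelse(ep[, 1] == p, ep[, 2], ep[, 1]),
    confidence = e$confidence
  )
}))

evidence
```

The resulting table shows the contrast described above: PDIP2_HUMAN and BACD1_HUMAN each have a confidence of 1.00 to DPOD2_HUMAN, while S7A6O_HUMAN has only a 0.18 confidence interaction with DPOD1_HUMAN.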