ExYeastCellCycleTranscriptomics2 R answers
Question 1
library(igraph)
library(ggraph)

# Adjust the paths if the .Rdata files live elsewhere
load("home/projects/22140/exercise4.Rdata")
load("home/projects/22140/exercise6.Rdata")

# log2 ratio of the two samples
expr$log2fc <- log2(expr$GSM287992 / expr$GSM287991)

# Attach log2fc to the node table, keeping one row per systematic name;
# all.x = TRUE keeps nodes without expression data (log2fc becomes NA)
node_attributes_updated <- merge(x = node_attributes,
                                 y = expr[!duplicated(expr$SysName), c(1, 6)],
                                 by.x = "ID", by.y = "SysName", all.x = TRUE)

g <- graph_from_data_frame(interactions, directed = FALSE,
                           vertices = node_attributes_updated)

ggraph(g) +
  geom_edge_link() +
  geom_node_point(aes(color = log2fc, shape = abs(log2fc) > 2), size = 3) +
  scale_color_gradient2(low = "red", mid = "gray", midpoint = 0, high = "blue")

# Look up the KAR genes
expr[grepl(pattern = "kar", x = expr$PopName, ignore.case = TRUE), ]
The KAR genes are involved in karyogamy (also discussed in the answers for last week's exercise), so it makes good sense that they are overexpressed in the alpha-factor arrested cells. Alpha-factor triggers the mating response, which in turn prepares for the fusion of the nuclei of the a- and alpha-cells; this fusion is the process known as karyogamy.
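The log2 ratio and the left join used in the code above can be tried on a toy table first. This is a minimal sketch: the gene IDs, column names and intensities below are made up for illustration and are not taken from the exercise data.

```r
# Toy node table and expression table (hypothetical values)
nodes <- data.frame(ID = c("geneA", "geneB", "geneC"), stringsAsFactors = FALSE)
expr_toy <- data.frame(SysName = c("geneA", "geneA", "geneB"),
                       treated = c(800, 900, 50),
                       control = c(100, 100, 100),
                       stringsAsFactors = FALSE)

# log2 ratio: positive = higher in treated, negative = lower
expr_toy$log2fc <- log2(expr_toy$treated / expr_toy$control)

# Keep the first row per gene, then left-join onto the node table
dedup <- expr_toy[!duplicated(expr_toy$SysName), c("SysName", "log2fc")]
merged <- merge(nodes, dedup, by.x = "ID", by.y = "SysName", all.x = TRUE)

# Flag strong changes (more than 4-fold up or down)
merged$strong <- abs(merged$log2fc) > 2
print(merged)
```

Note that geneC has no expression row, so its log2fc (and therefore the flag) is NA; such nodes are still drawn in the network but without a meaningful color.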
Question 2
The title of the publication is found directly in the GEO database under "Citation(s)".
The number of measurements (50) can be found by looking at the number of “samples” associated with this database entry.
Question 3 + 4
genes <- c("YPL153C", "YMR199W", "YBL002W", "YGR108W", "YKL185W")

# alpha30: all genes in one plot
df <- alpha30_38[alpha30_38$gene %in% genes & alpha30_38$experiment == "alpha30", ]
ggplot(df, aes(x = timepoint, y = log2fc, color = gene)) +
  geom_point() + geom_line() +
  ggtitle("alpha30")

# or one gene at a time if you prefer
ggplot(df, aes(x = timepoint, y = log2fc, group = gene)) +
  geom_point() + geom_line() +
  facet_wrap(~gene) +
  ggtitle("alpha30")

# alpha38: notice that this is the dye swap experiment (in essence the sign
# of the log2 ratios will be swapped compared to the alpha30 plot)
df <- alpha30_38[alpha30_38$gene %in% genes & alpha30_38$experiment == "alpha38", ]
ggplot(df, aes(x = timepoint, y = log2fc, color = gene)) +
  geom_point() + geom_line() +
  ggtitle("alpha38")

ggplot(df, aes(x = timepoint, y = log2fc, color = gene)) +
  geom_point() + geom_line() +
  facet_wrap(~gene) +
  ggtitle("alpha38")
From looking at the plot I estimate the following interval between the peaks (or between the low points if you prefer). Ignore small bumps on the graphs (measurement uncertainty) and look for the larger trends.
alpha30:
RAD53/YPL153C: 55 min
CLN1/YMR199W: 50 min
HTB2/YBL002W: 65-70 min
CLB1/YGR108W: 60 min
ASH1/YKL185W: 65-70 min
alpha38:
RAD53/YPL153C: 60 min
CLN1/YMR199W: 65 min
HTB2/YBL002W: 60 min
CLB1/YGR108W: 60 min
ASH1/YKL185W: 65 min
All in all the “true” interdivision time (looking across all 5 genes and both experiments) appears to be ~60 min.
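The ~60 min figure can be checked by averaging the estimates listed above. This is a minimal sketch with two assumptions: ranges such as 65-70 min are represented by their midpoint, and the first list of five values belongs to alpha30 and the second to alpha38 (the order in which the plots were made).

```r
# Peak-to-peak intervals in minutes, as read off the plots
intervals <- data.frame(
  gene = rep(c("RAD53", "CLN1", "HTB2", "CLB1", "ASH1"), times = 2),
  experiment = rep(c("alpha30", "alpha38"), each = 5),
  minutes = c(55, 50, 67.5, 60, 67.5,   # first list (assumed alpha30)
              60, 65, 60, 60, 65),      # second list (assumed alpha38)
  stringsAsFactors = FALSE
)

# Mean interval per experiment and across everything
per_experiment <- tapply(intervals$minutes, intervals$experiment, mean)
overall <- mean(intervals$minutes)

print(per_experiment)
print(overall)
```

The overall mean does not depend on which list belongs to which experiment; only the per-experiment means do.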
Question 5
Since only a low percentage of all yeast genes are expected to be cell cycle regulated (most are needed for other functions such as basic metabolism), we should expect a random sample of genes to contain few or no cyclic patterns.
df <- alpha30_38[alpha30_38$gene %in% sample(rownames(alpha30), 5) &
                   alpha30_38$experiment == "alpha30", ]
ggplot(df, aes(x = timepoint, y = log2fc, color = gene)) +
  geom_point() + geom_line() +
  ggtitle("alpha30")
Question 6
df <- alpha30_38[alpha30_38$gene %in% "YBL002W", ]
ggplot(df, aes(x = timepoint, y = log2fc, color = experiment)) +
  geom_point() + geom_line() +
  ggtitle("HTB2 in alpha30 and alpha38")
As expected, the two graphs are (almost) mirror images of each other. Notice that the actual mRNA/cDNA is the same in both cases, but that the labeling has been reversed, and independent hybridizations against CONTROL have been performed for each timepoint. The slight variation between the graphs is due to the small fluctuations there will always be between independent measurements (“technical variance”).
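Why a dye swap mirrors the curve follows directly from the log2 ratio: swapping numerator and denominator flips the sign. A minimal sketch with made-up intensities (not values from the exercise data):

```r
# Hypothetical intensities for one gene at three timepoints
sample_signal  <- c(400, 100, 25)
control_signal <- c(100, 100, 100)

# Original labeling: sample over control
forward <- log2(sample_signal / control_signal)

# Dye swap: the two channels trade places, so the ratio is inverted
swapped <- log2(control_signal / sample_signal)

# Without noise the two curves are exact mirror images
print(forward)
print(swapped)

# Flipping the sign of the swapped series puts both on the same scale,
# so they can be averaged
combined <- (forward - swapped) / 2
print(combined)
```

With real data the two series come from independent hybridizations, so `forward` and `-swapped` will differ slightly; that difference is the technical variance mentioned above.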
Question 7
The trick here is to work with the information we were given about where the 5 genes are supposed to peak, and then start translating the time in minutes into phases.
As stated in the exercise manual, we assume that each phase is 25% of the interdivision time.
Gene | Systematic name | Peak phase | Function |
---|---|---|---|
RAD53 | YPL153C | G1 (mid) | DNA repair/cell cycle arrest |
CLN1 | YMR199W | G1/S | G1 cyclin; controls entry into S phase |
HTB2 | YBL002W | S (mid) | Histone H2B; histones are needed for the new chromosomes |
CLB1 | YGR108W | G2 (mid) | B-type cyclin |
ASH1 | YKL185W | M (late) | Transcriptional regulator (during anaphase) |
Here we use both the graphs from question 3 + 4 and the estimated interdivision time (~60 min). For example, in the alpha30 graph HTB2 appears to peak at 65 min, which places 65 min (and, one cycle earlier and later, 5 and 125 min) in the middle of the S phase. Likewise, the timepoints 30 min before and after (35 and 95 min) lie directly opposite on the “phase wheel” and can be assigned to the middle of the M phase.
By going over the graphs and working out the phases from the 5 genes, it will in the end be possible to come up with some good estimates of where the phases are found.
Cell cycle phase | Time in minutes (approximate) |
---|---|
G1 | 0, 50-60, 110-120 |
S | 5-15, 65-75 |
G2 | 20-30, 80-90 |
M | 35-45, 95-105 |
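The table can be reproduced with a small helper that wraps time around the cycle. This is a sketch under the assumptions stated in the text: an interdivision time of 60 min and four phases of equal length (15 min each). It also assumes that S starts at 5 min, which is read off the table rather than measured. The function name `phase_at` and its parameters are made up for this illustration.

```r
# Map a time in minutes to a cell cycle phase.
# period:  interdivision time in minutes (estimated ~60)
# s_start: time (mod period) at which S phase begins (taken from the table)
phase_at <- function(t, period = 60, s_start = 5) {
  phases <- c("S", "G2", "M", "G1")            # order following s_start
  pos <- (t - s_start) %% period               # position within the cycle
  phases[floor(pos / (period / 4)) + 1]
}

timepoints <- c(0, 5, 35, 65, 95, 110)
data.frame(timepoint = timepoints, phase = phase_at(timepoints))
```

Because `%%` in R returns a non-negative result for a positive divisor, timepoint 0 wraps around correctly to the end of the previous cycle (G1).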
Question 8
Coming soon!
Question 9
Peak time (as percentage of cell cycle) from cyclebase.org:
RAD53/YPL153C: 17
CLN1/YMR199W: 25
HTB2/YBL002W: 40
CLB1/YGR108W: 63
ASH1/YKL185W: 97
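These percentages can be translated into phases for comparison with the expected peak phases. A rough sketch, assuming that 0% marks the start of G1 and that each phase covers 25% of the cycle (the simplification used in question 7); neither assumption is stated on cyclebase.org itself.

```r
# Peak times (percent of the cell cycle) as listed above
peaks <- data.frame(
  gene = c("RAD53", "CLN1", "HTB2", "CLB1", "ASH1"),
  percent = c(17, 25, 40, 63, 97),
  stringsAsFactors = FALSE
)

# Four equal bins: [0,25) G1, [25,50) S, [50,75) G2, [75,100] M
peaks$phase <- as.character(cut(peaks$percent,
                                breaks = c(0, 25, 50, 75, 100),
                                labels = c("G1", "S", "G2", "M"),
                                right = FALSE, include.lowest = TRUE))
print(peaks)
```

CLN1 at 25% falls exactly on the G1/S boundary, so the bin it lands in depends only on which side of the interval is closed.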
Question 10
# Genes peaking within 5 percentage points of HTB2
htb2_peak <- peaktime[peaktime$gene == "HTB2", ]$peaktime
peaktime[abs(peaktime$peaktime - htb2_peak) <= 5, ]$gene
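Because peak time is a position on a cycle, a plain absolute difference treats 97% and 2% as far apart although they are only 5 points apart across the M/G1 boundary. For HTB2 this makes no difference, but for a gene peaking near 0% or 100% a circular distance is safer. A minimal sketch; the helper `circ_dist` and the toy table are made up for illustration, and only the column names `gene` and `peaktime` follow the code above.

```r
# Shortest distance between two positions on a cycle of length `period`
circ_dist <- function(a, b, period = 100) {
  d <- abs(a - b) %% period
  pmin(d, period - d)
}

# Toy table with hypothetical genes around HTB2
toy <- data.frame(gene = c("HTB2", "geneX", "geneY", "geneZ"),
                  peaktime = c(40, 44, 97, 36),
                  stringsAsFactors = FALSE)

ref <- toy$peaktime[toy$gene == "HTB2"]
near_htb2 <- toy$gene[circ_dist(toy$peaktime, ref) <= 5]
print(near_htb2)

# Wrap-around case: 97% and 2% are 5 points apart, not 95
print(circ_dist(97, 2))
```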
Question 11
Coming soon!