<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://teaching.healthtech.dtu.dk/22126/index.php?action=history&amp;feed=atom&amp;title=Rnaseq_exercise_answers</id>
	<title>Rnaseq exercise answers - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://teaching.healthtech.dtu.dk/22126/index.php?action=history&amp;feed=atom&amp;title=Rnaseq_exercise_answers"/>
	<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22126/index.php?title=Rnaseq_exercise_answers&amp;action=history"/>
	<updated>2026-05-13T01:09:53Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22126/index.php?title=Rnaseq_exercise_answers&amp;diff=35&amp;oldid=prev</id>
		<title>WikiSysop: Created page with &quot; &lt;div class=&quot;page-content has-page-title&quot;&gt; &lt;div id=&quot;overview-and-background&quot; class=&quot;section level1&quot;&gt; &lt;h1&gt;Overview and background&lt;/h1&gt; &lt;div id=&quot;groups&quot; class=&quot;section level2&quot;&gt; &lt;h2&gt;Groups&lt;/h2&gt; &lt;p&gt;Please get into groups of 2-3. We don’t have enough computational power for all of you working alone. Please let the instructors know if you need help finding a group.&lt;/p&gt; &lt;/div&gt;  &lt;div id=&quot;assignment-notes&quot; class=&quot;section level2&quot;&gt; &lt;h2&gt;Assignment notes&lt;/h2&gt; &lt;p&gt;While some question...&quot;</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22126/index.php?title=Rnaseq_exercise_answers&amp;diff=35&amp;oldid=prev"/>
		<updated>2024-03-19T15:37:57Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot; &amp;lt;div class=&amp;quot;page-content has-page-title&amp;quot;&amp;gt; &amp;lt;div id=&amp;quot;overview-and-background&amp;quot; class=&amp;quot;section level1&amp;quot;&amp;gt; &amp;lt;h1&amp;gt;Overview and background&amp;lt;/h1&amp;gt; &amp;lt;div id=&amp;quot;groups&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt; &amp;lt;h2&amp;gt;Groups&amp;lt;/h2&amp;gt; &amp;lt;p&amp;gt;Please get into groups of 2-3. We don’t have enough computational power for all of you working alone. Please let the instructors know if you need help finding a group.&amp;lt;/p&amp;gt; &amp;lt;/div&amp;gt;  &amp;lt;div id=&amp;quot;assignment-notes&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt; &amp;lt;h2&amp;gt;Assignment notes&amp;lt;/h2&amp;gt; &amp;lt;p&amp;gt;While some question...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&lt;br /&gt;
&amp;lt;div class=&amp;quot;page-content has-page-title&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;overview-and-background&amp;quot; class=&amp;quot;section level1&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h1&amp;gt;Overview and background&amp;lt;/h1&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;groups&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Groups&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Please get into groups of 2-3. We don’t have enough computational power for all of you working alone. Please let the instructors know if you need help finding a group.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;assignment-notes&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Assignment notes&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;While some questions might seem hard we naturally don’t ask questions/tasks which you have not been given the tools to solve in this assignment - so if you are stuck try thinking about what you have already learned before asking an instructor.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;assignment-overview&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Assignment overview&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;In this assignment you will analyze RNA-sequencing data from real cancer patients to investigate the importance of alternative splicing in a clinical context.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div id=&amp;quot;biological-background&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Biological background&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Today you will be working with colorectal cancer - specifically Colon Adenocarcinoma (often abbreviated COAD), a very common cancer of the colon. The lifetime risk of developing colorectal cancer is ~4% for both males and females. That means COAD represents ~10% of all cancers and results in the death of hundreds of thousands of people each year! (More info on COAD can be found on [https://en.wikipedia.org/wiki/Colorectal_cancer Wikipedia].)&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;One important aspect of cancer is that tumors from different patients can differ greatly even when they originate from the same tissue (more info on tumor heterogeneity [https://en.wikipedia.org/wiki/Tumour_heterogeneity here]). To improve treatment and prognosis we therefore try to classify COAD into cancer subtypes (a simple form of precision medicine). We currently think there are 5 subtypes (see [https://www.cell.com/cancer-cell/pdf/S1535-6108(18)30114-4.pdf Liu &amp;#039;&amp;#039;et al.&amp;#039;&amp;#039;]) and today you will be working with CIN and GS. CIN is an abbreviation for Chromosomal INstability and GS means Genome Stable. More on that later.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;To help us understand COAD subtypes you will today compare these to healthy adjacent tissue. For all samples a biopsy was taken and bulk RNA-seq performed. Low-quality samples have been removed.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;bioinformatic-background&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Bioinformatic background&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;For background on transcriptomics and splicing please refer to today’s slides. The data you are working with is a randomly selected subset of the TCGA COAD data (google TCGA if you want to know more). The data was quantified with Kallisto against the human transcriptome.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;Today you will be using the &amp;#039;pairedGSEA&amp;#039; R package we developed. This package is specifically designed to make it easy to do the following analysis:&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ol style=&amp;quot;list-style-type: decimal&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Differential gene expression (aka DGE) via DESeq2&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Differential gene usage (differential splicing) (aka DGU) via DEXSeq&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;gene-set over-representation analysis (ORA) on DGU and DGE&lt;br /&gt;
results&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;At each step the package makes it easy to compare DGE and DGU.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;hr /&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;assignment&amp;quot; class=&amp;quot;section level1&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h1&amp;gt;Assignment&amp;lt;/h1&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;step-1-determine-which-cancer-to-work-with&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Step 1: Determine which cancer to work with&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Determine which cancer type you will work with:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;If your birthday is within the first 6 months of the year (January-June) you will work with &amp;lt;strong&amp;gt;CIN&amp;lt;/strong&amp;gt;.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;If your birthday is within the last 6 months of the year (July-December) you will work with &amp;lt;strong&amp;gt;GS&amp;lt;/strong&amp;gt;.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;step-2-set-up-enviroment&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Step 2: Set up environment&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Log in to the server as you usually do, except this time use the &amp;#039;-X&amp;#039; option, which enables X11 forwarding. That means using:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh -X username@pupil1.healthtech.dtu.dk&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;Make a directory for this exercise and move into it&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir transcriptomics_exercise&lt;br /&gt;
cd transcriptomics_exercise&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;Copy the exercise data of your cancer subtype to your folder&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### for CIN subtype:&lt;br /&gt;
cp /home/projects/22126_NGS/exercises/transcriptomics/coad_iso_subset_cin.Rdata .&lt;br /&gt;
&lt;br /&gt;
### For GS subtype:&lt;br /&gt;
cp /home/projects/22126_NGS/exercises/transcriptomics/coad_iso_subset_gs.Rdata .&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;step-3-start-r-session-and-enviroment&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Step 3: Start R session and environment&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Start an R session by typing (or copy/pasting) the following in your terminal:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
R-4.2.2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;And load the library we need by typing&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
library(pairedGSEA)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;This loads the functionality of the “pairedGSEA” R package.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;step-4-load-and-inspect-data&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Step 4: Load and inspect data&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Load the assignment data into your R session:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### for CIN subtype:&lt;br /&gt;
load(&amp;amp;#39;coad_iso_subset_cin.Rdata&amp;amp;#39;)&lt;br /&gt;
&lt;br /&gt;
### For GS subtype:&lt;br /&gt;
load(&amp;amp;#39;coad_iso_subset_gs.Rdata&amp;amp;#39;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;This will give you three data objects in your R session:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ol style=&amp;quot;list-style-type: decimal&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;A count matrix&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;A matrix with meta information about each sample in the count matrix.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;A list of gene_sets that you should use for your ORA analysis (step 7).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;All objects can be directly used by the &amp;#039;pairedGSEA&amp;#039;&lt;br /&gt;
package - no need to do any data modifications.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;br&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Use the following functions to take a look at the data:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### List objects in an R session&lt;br /&gt;
ls()&lt;br /&gt;
&lt;br /&gt;
### Inspect the first lines of the object&lt;br /&gt;
head( &amp;amp;lt;object_name&amp;amp;gt; )&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Question&amp;lt;/strong&amp;gt;: Which object contains what data?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Answer&amp;lt;/strong&amp;gt;:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ol style=&amp;quot;list-style-type: decimal&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;cinCountsSubset : Count data&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;cinMeta : Condition info (ctrl vs cancer)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;gene_set_list : List of gene-sets&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;br /&gt;
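&amp;lt;p&amp;gt;A few extra checks can make the inspection more systematic. The following is only a minimal sketch on invented toy data, not part of the original exercise: &amp;#039;toy_counts&amp;#039; and &amp;#039;toy_meta&amp;#039; are hypothetical stand-ins for the count matrix and the metadata, and the sketch assumes that the columns of the count matrix are named by sample ID.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### Toy stand-ins for the loaded objects (values invented for illustration)&lt;br /&gt;
toy_counts &amp;amp;lt;- matrix(&lt;br /&gt;
    c(10, 0, 5, 3, 8, 2),&lt;br /&gt;
    nrow = 3,&lt;br /&gt;
    dimnames = list(c(&amp;amp;quot;tx1&amp;amp;quot;, &amp;amp;quot;tx2&amp;amp;quot;, &amp;amp;quot;tx3&amp;amp;quot;), c(&amp;amp;quot;s1&amp;amp;quot;, &amp;amp;quot;s2&amp;amp;quot;))&lt;br /&gt;
)&lt;br /&gt;
toy_meta &amp;amp;lt;- data.frame(&lt;br /&gt;
    sample_id = c(&amp;amp;quot;s1&amp;amp;quot;, &amp;amp;quot;s2&amp;amp;quot;),&lt;br /&gt;
    condition = c(&amp;amp;quot;Control&amp;amp;quot;, &amp;amp;quot;Cancer&amp;amp;quot;)&lt;br /&gt;
)&lt;br /&gt;
&lt;br /&gt;
### Number of rows (features) and columns (samples)&lt;br /&gt;
dim(toy_counts)&lt;br /&gt;
&lt;br /&gt;
### Number of samples per condition&lt;br /&gt;
table(toy_meta$condition)&lt;br /&gt;
&lt;br /&gt;
### Check that every sample in the count matrix has a metadata row&lt;br /&gt;
samples_match &amp;amp;lt;- all(colnames(toy_counts) %in% toy_meta$sample_id)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;With the real data, the toy objects would be replaced by the count and metadata objects listed by &amp;#039;ls()&amp;#039;.&amp;lt;/p&amp;gt;&lt;br /&gt;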
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;step-5-run-differential-analysis&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Step 5: Run differential analysis&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Next you will use the &amp;#039;pairedGSEA&amp;#039; package, which&lt;br /&gt;
requires a bit of self-study. &amp;lt;strong&amp;gt;Importantly&amp;lt;/strong&amp;gt;, you&lt;br /&gt;
should only run this analysis once per group - otherwise we don’t have&lt;br /&gt;
enough computational power. You can download the&lt;br /&gt;
&amp;#039;pairedGSEA&amp;#039; vignette (a short document showing how to use the package)&lt;br /&gt;
&amp;lt;a href=&amp;quot;https://www.dropbox.com/s/oalth29pxulffec/pairedGSEA.html?dl=1&amp;quot;&amp;gt;here&amp;lt;/a&amp;gt;.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Hints:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ol style=&amp;quot;list-style-type: decimal&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;After reading the introduction you can skip to the&lt;br /&gt;
&amp;#039;3.3 Running the analysis&amp;#039; section.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;For now you only need to use &amp;#039;paired_diff()&amp;#039; as that&lt;br /&gt;
makes both differential analyses (both DGE and DGU).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;There is no need to use the “store_results” option&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Question&amp;lt;/strong&amp;gt;: This will take a while to run (~10 min).&lt;br /&gt;
In the meantime take a closer look at the Liu &amp;lt;em&amp;gt;et al.&amp;lt;/em&amp;gt; paper&lt;br /&gt;
(see above) and summarise the differences between the CIN and GS&lt;br /&gt;
COAD subtypes.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Answer&amp;lt;/strong&amp;gt;:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
gi_diff_results &amp;amp;lt;- paired_diff(&lt;br /&gt;
    object = cinCountsSubset,&lt;br /&gt;
    metadata = cinMeta, # Use with count matrix or if you want to change it in&lt;br /&gt;
                        # the input object&lt;br /&gt;
    group_col = &amp;amp;#39;condition&amp;amp;#39;,&lt;br /&gt;
    sample_col = &amp;amp;#39;sample_id&amp;amp;#39;,&lt;br /&gt;
    baseline = &amp;amp;#39;Control&amp;amp;#39;,&lt;br /&gt;
    case = &amp;amp;#39;COAD_genome_instable&amp;amp;#39;,&lt;br /&gt;
    store_results = FALSE&lt;br /&gt;
)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;step-6-inspect-diffrential-result&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Step 6: Inspect differential result&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Question&amp;lt;/strong&amp;gt;: Look at the first 10 lines of the result.&lt;br /&gt;
Which gene is most significant (smallest p-value) in the DGE and&lt;br /&gt;
DGU analyses (DESeq2 and DEXSeq, respectively)?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Answer&amp;lt;/strong&amp;gt;:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;DESeq2 (DGE): AAR2&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;DEXSeq (DGU): A1BG&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
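&amp;lt;p&amp;gt;Instead of reading the table by eye, the gene with the smallest value can also be picked out programmatically. The following is a minimal sketch on invented toy data: &amp;#039;toy_results&amp;#039; is a hypothetical stand-in for the result of &amp;#039;paired_diff()&amp;#039;, the &amp;#039;gene&amp;#039; column name is an assumption, and the adjusted p-value columns used elsewhere in this step are used for the comparison.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### Toy stand-in for the differential results (values invented for illustration)&lt;br /&gt;
toy_results &amp;amp;lt;- data.frame(&lt;br /&gt;
    gene = c(&amp;amp;quot;geneA&amp;amp;quot;, &amp;amp;quot;geneB&amp;amp;quot;, &amp;amp;quot;geneC&amp;amp;quot;),  # assumed column name&lt;br /&gt;
    padj_deseq = c(0.2, 0.001, NA),&lt;br /&gt;
    padj_dexseq = c(0.04, NA, 0.5)&lt;br /&gt;
)&lt;br /&gt;
&lt;br /&gt;
### Restrict to the first 10 lines, as in the question&lt;br /&gt;
top_rows &amp;amp;lt;- head(toy_results, 10)&lt;br /&gt;
&lt;br /&gt;
### which.min() ignores NA values and returns the index of the smallest value&lt;br /&gt;
top_dge &amp;amp;lt;- top_rows$gene[which.min(top_rows$padj_deseq)]&lt;br /&gt;
top_dgu &amp;amp;lt;- top_rows$gene[which.min(top_rows$padj_dexseq)]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;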
&amp;lt;p&amp;gt;&amp;lt;br&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;The following code &amp;lt;em&amp;gt;example&amp;lt;/em&amp;gt; counts how many significantly&lt;br /&gt;
differentially expressed genes are found:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sum( gi_diff_results$padj_deseq &amp;amp;lt; 0.05, na.rm = TRUE )&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Question&amp;lt;/strong&amp;gt;: Modify the R code above to count how many&lt;br /&gt;
genes are significant for DGE and how many for DGU.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Answer&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sum( gi_diff_results$padj_deseq &amp;amp;lt; 0.05, na.rm = TRUE )&lt;br /&gt;
# 4860&lt;br /&gt;
sum( gi_diff_results$padj_dexseq &amp;amp;lt; 0.05, na.rm = TRUE )&lt;br /&gt;
# 2117&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Question&amp;lt;/strong&amp;gt;: Use the &amp;#039;nrow()&amp;#039; function to&lt;br /&gt;
calculate the fraction of genes that are significant for DGE and for DGU.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Answer&amp;lt;/strong&amp;gt;:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sum( gi_diff_results$padj_deseq &amp;amp;lt; 0.05, na.rm = TRUE ) / nrow(gi_diff_results)&lt;br /&gt;
# 0.66&lt;br /&gt;
sum( gi_diff_results$padj_dexseq &amp;amp;lt; 0.05, na.rm = TRUE ) / nrow(gi_diff_results)&lt;br /&gt;
# 0.29&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
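&amp;lt;p&amp;gt;The counts above treat DGE and DGU separately. If you also want to know how many genes are significant in both analyses at once, the two tests can be combined. This is a minimal sketch on invented toy data (&amp;#039;toy_results&amp;#039; is a hypothetical stand-in for the real result object); genes with a missing adjusted p-value are counted as not significant.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### Toy stand-in for the differential results (values invented for illustration)&lt;br /&gt;
toy_results &amp;amp;lt;- data.frame(&lt;br /&gt;
    padj_deseq  = c(0.01, 0.20, 0.03, NA),&lt;br /&gt;
    padj_dexseq = c(0.04, 0.01, 0.50, 0.02)&lt;br /&gt;
)&lt;br /&gt;
&lt;br /&gt;
### TRUE only where the value is present and below the threshold&lt;br /&gt;
sig_dge &amp;amp;lt;- !is.na(toy_results$padj_deseq)  &amp;amp;amp; toy_results$padj_deseq  &amp;amp;lt; 0.05&lt;br /&gt;
sig_dgu &amp;amp;lt;- !is.na(toy_results$padj_dexseq) &amp;amp;amp; toy_results$padj_dexseq &amp;amp;lt; 0.05&lt;br /&gt;
&lt;br /&gt;
### Genes significant in both analyses, as a count and as a fraction&lt;br /&gt;
n_both &amp;amp;lt;- sum(sig_dge &amp;amp;amp; sig_dgu)&lt;br /&gt;
frac_both &amp;amp;lt;- n_both / nrow(toy_results)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;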
&lt;br /&gt;
&amp;lt;p&amp;gt;Now we are ready to do the gene-set enrichment analysis.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;step-7-run-gene-set-enrichment-analysis&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Step 7: Run Gene-Set Enrichment Analysis&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Use the vignette to help you use &amp;#039;pairedGSEA&amp;#039; to run GSEA on both DGE and DGU results (see the vignette section 4: “Over-Representation Analysis”). You should use the &amp;#039;gene_set_list&amp;#039; object you have already loaded into R instead of using the &amp;#039;prepare_msigdb()&amp;#039; function.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;Note: There is (again) no need to store the intermediary results.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Answer&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
    gi_paired_ora &amp;lt;- paired_ora(&lt;br /&gt;
    paired_diff_result = gi_diff_results,&lt;br /&gt;
    gene_sets = gene_set_list,&lt;br /&gt;
    experiment_title = NULL&lt;br /&gt;
)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;step-8-inspect-ora-result&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Step 8: Inspect ORA result&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;What you have been analyzing so far is a subset of the entire dataset&lt;br /&gt;
(since the runtime would otherwise have been 3-4x longer). To enable a more&lt;br /&gt;
realistic last step use &amp;lt;strong&amp;gt;one&amp;lt;/strong&amp;gt; of these commands to load&lt;br /&gt;
the full results corresponding to what you have been working with.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### for CIN subtype:&lt;br /&gt;
load(&amp;amp;#39;/home/projects/22126_NGS/exercises/transcriptomics/03_coad_cin_ora.Rdata&amp;amp;#39;)&lt;br /&gt;
# loads the &amp;amp;quot;cin_ora&amp;amp;quot; object&lt;br /&gt;
&lt;br /&gt;
### For GS subtype:&lt;br /&gt;
load(&amp;amp;#39;/home/projects/22126_NGS/exercises/transcriptomics/03_coad_gs_ora.Rdata&amp;amp;#39;)&lt;br /&gt;
# loads the &amp;amp;quot;gs_ora&amp;amp;quot; object&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;The following code &amp;lt;em&amp;gt;example&amp;lt;/em&amp;gt; extracts the ORA results for&lt;br /&gt;
either DGE or DGU and sorts them so the most significant gene-sets are at&lt;br /&gt;
the top. To work with the full results, replace &amp;#039;gi_paired_ora&amp;#039; with the object you just loaded.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### DGE:&lt;br /&gt;
dge_ora_sorted &amp;amp;lt;- gi_paired_ora[&lt;br /&gt;
    sort.list(gi_paired_ora$pval_deseq),                 # sort part&lt;br /&gt;
    c(&amp;amp;#39;pathway&amp;amp;#39;,&amp;amp;#39;pval_deseq&amp;amp;#39;,&amp;amp;#39;enrichment_score_deseq&amp;amp;#39;)   # select part&lt;br /&gt;
]&lt;br /&gt;
&lt;br /&gt;
### DGU ORA:&lt;br /&gt;
dgu_ora_sorted &amp;amp;lt;- gi_paired_ora[&lt;br /&gt;
    sort.list(gi_paired_ora$pval_dexseq),                # sort part&lt;br /&gt;
    c(&amp;amp;#39;pathway&amp;amp;#39;,&amp;amp;#39;pval_dexseq&amp;amp;#39;,&amp;amp;#39;enrichment_score_dexseq&amp;amp;#39;) # select part&lt;br /&gt;
]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Question&amp;lt;/strong&amp;gt;: Look at the 10-15 most significant gene&lt;br /&gt;
sets from both analyses. What are the similarities and differences?&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Answer&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### DGE:&lt;br /&gt;
dge_ora_sorted &amp;amp;lt;- cin_ora[&lt;br /&gt;
    sort.list(cin_ora$pval_deseq),                 # sort part&lt;br /&gt;
    c(&amp;amp;#39;pathway&amp;amp;#39;,&amp;amp;#39;pval_deseq&amp;amp;#39;,&amp;amp;#39;enrichment_score_deseq&amp;amp;#39;)   # select part&lt;br /&gt;
]&lt;br /&gt;
&lt;br /&gt;
head(dge_ora_sorted, 15)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
##                                         pathway   pval_deseq&lt;br /&gt;
## 3823                   REACTOME_RRNA_PROCESSING 3.694775e-19&lt;br /&gt;
## 4433  GOBP_RIBONUCLEOPROTEIN_COMPLEX_BIOGENESIS 4.453501e-17&lt;br /&gt;
## 3879                   GOBP_RIBOSOME_BIOGENESIS 5.320229e-16&lt;br /&gt;
## 3785                              KEGG_RIBOSOME 2.192710e-14&lt;br /&gt;
## 1061            GOBP_MITOTIC_CELL_CYCLE_PROCESS 1.962214e-13&lt;br /&gt;
## 4700                       HALLMARK_E2F_TARGETS 2.038376e-13&lt;br /&gt;
## 977                         REACTOME_CELL_CYCLE 2.567524e-13&lt;br /&gt;
## 3759 REACTOME_EUKARYOTIC_TRANSLATION_ELONGATION 3.350223e-13&lt;br /&gt;
## 3828       REACTOME_SELENOAMINO_ACID_METABOLISM 4.766866e-13&lt;br /&gt;
## 4598                    HALLMARK_G2M_CHECKPOINT 7.966641e-13&lt;br /&gt;
## 3923 REACTOME_EUKARYOTIC_TRANSLATION_INITIATION 3.734833e-12&lt;br /&gt;
## 864                              GOCC_NUCLEOLUS 6.125940e-12&lt;br /&gt;
## 747                 REACTOME_INFECTIOUS_DISEASE 7.879724e-12&lt;br /&gt;
## 425    GOBP_RESPONSE_TO_ORGANIC_CYCLIC_COMPOUND 8.207220e-12&lt;br /&gt;
## 2449                    GOCC_ANCHORING_JUNCTION 9.689453e-12&lt;br /&gt;
##      enrichment_score_deseq&lt;br /&gt;
## 3823              0.6239502&lt;br /&gt;
## 4433              0.4724997&lt;br /&gt;
## 3879              0.5166409&lt;br /&gt;
## 3785              0.7224417&lt;br /&gt;
## 1061              0.3588459&lt;br /&gt;
## 4700              0.5451123&lt;br /&gt;
## 977               0.3721868&lt;br /&gt;
## 3759              0.6923103&lt;br /&gt;
## 3828              0.6601218&lt;br /&gt;
## 4598              0.5398054&lt;br /&gt;
## 3923              0.6205530&lt;br /&gt;
## 864               0.3070925&lt;br /&gt;
## 747               0.3249413&lt;br /&gt;
## 425               0.3294544&lt;br /&gt;
## 2449              0.3259827&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
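&amp;lt;li&amp;gt;DGE: dominated by gene sets about ribosomes and translation (RIBOSOME, RRNA_PROCESSING, TRANSLATION) and about the cell cycle (CELL_CYCLE, E2F_TARGETS, G2M_CHECKPOINT)&amp;lt;/li&amp;gt;&lt;br /&gt;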
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;&lt;br /&gt;
### DGU ORA:&lt;br /&gt;
dgu_ora_sorted &amp;lt;- cin_ora[&lt;br /&gt;
    sort.list(cin_ora$pval_dexseq),                # sort part&lt;br /&gt;
    c(&amp;amp;#39;pathway&amp;amp;#39;,&amp;amp;#39;pval_dexseq&amp;amp;#39;,&amp;amp;#39;enrichment_score_dexseq&amp;amp;#39;) # select part&lt;br /&gt;
]&lt;br /&gt;
head(dgu_ora_sorted, 15)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
##                                                                  pathway&lt;br /&gt;
## 2757                                   GOBP_ACTIN_FILAMENT_BASED_PROCESS&lt;br /&gt;
## 2449                                             GOCC_ANCHORING_JUNCTION&lt;br /&gt;
## 2787          REACTOME_SIGNALING_BY_RHO_GTPASES_MIRO_GTPASES_AND_RHOBTB3&lt;br /&gt;
## 3180                   GOMF_NUCLEOSIDE_TRIPHOSPHATASE_REGULATOR_ACTIVITY&lt;br /&gt;
## 2259                                      GOMF_ENZYME_REGULATOR_ACTIVITY&lt;br /&gt;
## 2345                                   GOMF_CYTOSKELETAL_PROTEIN_BINDING&lt;br /&gt;
## 2682 GOMF_TRANSFERASE_ACTIVITY_TRANSFERRING_PHOSPHORUS_CONTAINING_GROUPS&lt;br /&gt;
## 3363        GOBP_REGULATION_OF_SMALL_GTPASE_MEDIATED_SIGNAL_TRANSDUCTION&lt;br /&gt;
## 2045                                         GOCC_SUPRAMOLECULAR_COMPLEX&lt;br /&gt;
## 2806                                GOMF_PROTEIN_DOMAIN_SPECIFIC_BINDING&lt;br /&gt;
## 2869                      GOBP_SMALL_GTPASE_MEDIATED_SIGNAL_TRANSDUCTION&lt;br /&gt;
## 1781                      GOBP_POSITIVE_REGULATION_OF_CATALYTIC_ACTIVITY&lt;br /&gt;
## 3047                                    WP_VEGFAVEGFR2_SIGNALING_PATHWAY&lt;br /&gt;
## 2377                              GOBP_ORGANOPHOSPHATE_METABOLIC_PROCESS&lt;br /&gt;
## 2072                                             GOBP_CELL_MORPHOGENESIS&lt;br /&gt;
##       pval_dexseq enrichment_score_dexseq&lt;br /&gt;
## 2757 3.504255e-16               0.7528919&lt;br /&gt;
## 2449 7.263291e-15               0.7065107&lt;br /&gt;
## 2787 1.728464e-14               0.7489251&lt;br /&gt;
## 3180 1.837683e-14               0.8509545&lt;br /&gt;
## 2259 2.393632e-14               0.6135585&lt;br /&gt;
## 2345 4.224209e-14               0.6584802&lt;br /&gt;
## 2682 1.400344e-13               0.6628826&lt;br /&gt;
## 3363 3.661677e-13               0.9806811&lt;br /&gt;
## 2045 4.692953e-13               0.5934650&lt;br /&gt;
## 2806 2.857812e-12               0.7084235&lt;br /&gt;
## 2869 3.593536e-12               0.7767232&lt;br /&gt;
## 1781 5.678506e-12               0.5736414&lt;br /&gt;
## 3047 5.845276e-12               0.8127997&lt;br /&gt;
## 2377 6.168410e-12               0.6300335&lt;br /&gt;
## 2072 1.251178e-11               0.6002377&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
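&amp;lt;li&amp;gt;DGU: dominated by gene sets about actin and the cytoskeleton (ACTIN, CYTOSKELETAL), cell junctions (JUNCTION) and small GTPase SIGNALING. GOCC_ANCHORING_JUNCTION is the only gene set found in the top 15 of both analyses.&amp;lt;/li&amp;gt;&lt;br /&gt;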
&amp;lt;/ul&amp;gt;&lt;br /&gt;
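&amp;lt;p&amp;gt;To check a keyword beyond the top 15, you can count how many matching gene sets are significant in each analysis. The following is a minimal sketch on invented toy data: &amp;#039;toy_ora&amp;#039; is a hypothetical stand-in for the ORA result, the p-values are made up, and the 0.05 threshold is an assumption chosen for illustration.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### Toy stand-in for the ORA result (p-values invented for illustration)&lt;br /&gt;
toy_ora &amp;amp;lt;- data.frame(&lt;br /&gt;
    pathway = c(&amp;amp;quot;KEGG_RIBOSOME&amp;amp;quot;, &amp;amp;quot;GOBP_ACTIN_FILAMENT_BASED_PROCESS&amp;amp;quot;,&lt;br /&gt;
                &amp;amp;quot;REACTOME_CELL_CYCLE&amp;amp;quot;, &amp;amp;quot;GOBP_RIBOSOME_BIOGENESIS&amp;amp;quot;),&lt;br /&gt;
    pval_deseq  = c(1e-14, 1e-3, 1e-13, 1e-16),&lt;br /&gt;
    pval_dexseq = c(0.2, 1e-16, 1e-6, 0.4)&lt;br /&gt;
)&lt;br /&gt;
&lt;br /&gt;
### Keep the gene sets whose name contains the keyword (literal match)&lt;br /&gt;
keyword &amp;amp;lt;- &amp;amp;quot;RIBOSOME&amp;amp;quot;&lt;br /&gt;
hits &amp;amp;lt;- toy_ora[grepl(keyword, toy_ora$pathway, fixed = TRUE), ]&lt;br /&gt;
&lt;br /&gt;
### Count the matching gene sets that are significant in each analysis&lt;br /&gt;
n_sig_dge &amp;amp;lt;- sum(hits$pval_deseq &amp;amp;lt; 0.05)&lt;br /&gt;
n_sig_dgu &amp;amp;lt;- sum(hits$pval_dexseq &amp;amp;lt; 0.05)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;In this toy example both ribosome gene sets are significant for DGE and none for DGU, which is the kind of one-sided pattern worth following up on.&amp;lt;/p&amp;gt;&lt;br /&gt;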
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;step-9-visual-inspection-of-ora-result&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h2&amp;gt;Step 9: Visual inspection of ORA result&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Question&amp;lt;/strong&amp;gt;: Based on your insights from step 8, use the &amp;#039;plot_ora()&amp;#039; function to test whether these are just isolated examples or generalize to all the significant results. For example: if, among the 10-15 top gene-sets, only DGU had gene-sets covering “telomere” function, you would use the &amp;#039;plot_ora()&amp;#039; function to test this.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Answer&amp;lt;/strong&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;&lt;br /&gt;
plot_ora(&lt;br /&gt;
    ora=cin_ora,&lt;br /&gt;
    plotly = FALSE,&lt;br /&gt;
    pattern = &amp;amp;quot;CELL_CYCLE&amp;amp;quot;, # Identify all gene sets about the cell cycle&lt;br /&gt;
    cutoff = 0.1, # Only include significant gene sets&lt;br /&gt;
    lines = TRUE, # Guide lines&lt;br /&gt;
    colors = c(&amp;amp;#39;red&amp;amp;#39;,&amp;amp;#39;blue&amp;amp;#39;,&amp;amp;#39;black&amp;amp;#39;)&lt;br /&gt;
)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Rnaseq_fig1.png]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;It looks like cell cycle changes are mediated by both DGE and DGU (the enrichment lies on the diagonal), and the majority of gene sets are significant in both analyses.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
plot_ora(&lt;br /&gt;
    ora=cin_ora,&lt;br /&gt;
    plotly = FALSE,&lt;br /&gt;
    pattern = &amp;amp;quot;RIBOSOME&amp;amp;quot;, # Identify all gene sets about ribosomes&lt;br /&gt;
    cutoff = 0.33, # Only include significant gene sets&lt;br /&gt;
    lines = TRUE, # Guide lines&lt;br /&gt;
    colors = c(&amp;amp;#39;red&amp;amp;#39;,&amp;amp;#39;blue&amp;amp;#39;,&amp;amp;#39;black&amp;amp;#39;)&lt;br /&gt;
)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;&lt;br /&gt;
[[File:Rnaseq_fig2.png]]&lt;br /&gt;
&lt;br /&gt;
Ribosome gene sets are clearly significant mainly for DGE.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
plot_ora(&lt;br /&gt;
    ora=cin_ora,&lt;br /&gt;
    plotly = FALSE,&lt;br /&gt;
    pattern = &amp;amp;quot;ACTIN&amp;amp;quot;, # Identify all gene sets about actin&lt;br /&gt;
    cutoff = 0.33, # Only include significant gene sets&lt;br /&gt;
    lines = TRUE, # Guide lines&lt;br /&gt;
    colors = c(&amp;amp;#39;red&amp;amp;#39;,&amp;amp;#39;blue&amp;amp;#39;,&amp;amp;#39;black&amp;amp;#39;)&lt;br /&gt;
)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Rnaseq_fig3.png]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;Although many actin-related pathways are significant for both DGU and DGE, more are significant for DGU. Also, the enrichment among DGU is more pronounced (points are to the right of the diagonal line).&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;br&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;Lastly, note the low correlation suggesting an overall low similarity in biological signaling mediated through DGE and DGU.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Question&amp;lt;/strong&amp;gt;: Try to make a hypothesis as to why this/these molecular functions might be important for cancer.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Answer&amp;lt;/strong&amp;gt;:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;CELL_CYCLE: One of the main hallmarks of cancer - uncontrolled cell division.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;RIBOSOME: Many ribosomes are needed when cells are dividing (as indicated by increased cell cycle).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;ACTIN: Actin is involved in cell movement and thereby cancer invasion and metastasis.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;step-10-critical-self-evaluation&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h2&amp;gt;Step 10: Critical self evaluation&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Question&amp;lt;/strong&amp;gt;: Take a moment to think about what potential problems there could be with this assignment. Are there any obvious things we have not taken into consideration?&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;&amp;lt;strong&amp;gt;Answer&amp;lt;/strong&amp;gt;: The main problems are:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ol style=&amp;quot;list-style-type: decimal&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;More QC should have been done (clustering, outliers, etc)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;This is only a subset of the data (the real dataset has ~300 cancer samples)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;We do not take covariates into account. How many of the effects are due to e.g. gender and age differences?&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ol&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;step-11-repport-result&amp;quot; class=&amp;quot;section level2&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;h2&amp;gt;Step 11: Report result&amp;lt;/h2&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Go to the blackboard and report one or more of the following:&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;A keyword that showed a similar enrichment pattern in DGU and DGE&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;A keyword that showed preferential regulation through DGU or DGE&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;hr/&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div id=&amp;quot;bonus-assignment&amp;quot; class=&amp;quot;section level1&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;h1&amp;gt;Bonus Assignment&amp;lt;/h1&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;Use &amp;#039;pairedGSEA&amp;#039; to analyze the other COAD cancer subtype (the one you did not analyze). Are the gene-sets similar or different between the subtypes and analysis types?&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
</feed>