Gabre at 13:11, 14 January 2025

2025-01-14T13:11:41Z

Gabre at 11:57, 13 January 2025

2025-01-13T11:57:41Z

WikiSysop: Created page

2024-03-19T15:45:53Z


New page

<H3>Overview</H3>
If you need to use metagenomics for your final project, we have a more thorough workflow that you will need to use [https://teaching.healthtech.dtu.dk/22136/index.php/22136:Course_plan_autumn_2020 here].

Because metagenomics data sets are often very large and require a lot of computational resources and time, we have cheated a little bit and prepared some data for you in advance!

In this exercise, we have done the assembly and counting across a cohort of hundreds of human fecal
samples in advance, and in addition we provide the gene-wise taxonomy and the BMI of the
human donors.
From this data, we shall estimate the species richness and diversity and look at the effect of
downsizing. Furthermore, we shall see if we can identify any differences between the
microbiomes of lean and obese individuals.

<H3>Becoming a pirate</H3>
This exercise uses R, either locally (install RStudio on your own machine) or on the server, where you start it by typing:
<pre>R</pre>
First, if you are running RStudio locally, you will need to install a package called "vegan":
<pre>install.packages("vegan")</pre>
Now, let’s load the "vegan" package and thereafter load the read count data from a series of stool samples.
<pre>library("vegan")
load(url("https://teaching.healthtech.dtu.dk/material/22126/Counts_NGS.RData"))
head(Counts)
str(Counts)</pre>
'''Q1. How many samples do we have and how many genes?'''

The different samples may have been sequenced to different depths. Try to count the reads per sample:
<pre>
sampleDepth<-(colSums(Counts))
hist(sampleDepth, breaks=100, ylab="Number of samples", xlab="Number of reads", main="Sample depth")
range(sampleDepth)
</pre>

'''Q2. What is the sample depth range?'''
<H3>Species</H3>
Let's get the genes associated with species. Here is the gene-wise species taxonomy:
<pre>load(url("https://teaching.healthtech.dtu.dk/material/22126//taxonomy_species.RData"))
head(taxonomy_species)</pre>
We then combine (by summing) the read counts per gene into read counts per species.
<pre>taxCounts<-apply(Counts, 2, tapply, INDEX=taxonomy_species, sum)</pre>
Try looking at the taxCounts matrix
<pre>str(taxCounts)
head(taxCounts)</pre>
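
To see what the apply/tapply call does, here is a minimal sketch on toy data. The names toyCounts and toyTaxonomy are made up for illustration, and we assume the taxonomy is a vector with one species label per gene (row), which is how taxonomy_species appears to be used above.

```r
# Toy data (hypothetical): 4 genes (rows) x 2 samples (columns)
toyCounts <- matrix(c(1, 2, 3, 4,
                      10, 20, 30, 40),
                    nrow = 4,
                    dimnames = list(paste0("gene", 1:4), c("S1", "S2")))

# Hypothetical gene-wise taxonomy: one species label per gene
toyTaxonomy <- c("A", "B", "A", "B")

# Same pattern as in the exercise: for each sample (column),
# sum the gene counts within each species
toyTaxCounts <- apply(toyCounts, 2, tapply, INDEX = toyTaxonomy, sum)
toyTaxCounts

# rowsum() computes the same per-group sums in one call
toyTaxCounts2 <- rowsum(toyCounts, group = toyTaxonomy)
toyTaxCounts2
```

In the toy result, species A in sample S1 gets 1 + 3 = 4 reads, because gene1 and gene3 both belong to A.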
'''Q3. How many species are there in total?'''
<H3>Richness and Diversity</H3>
What are the species richness and diversity (Shannon) for the different samples?

'''Q4. What does a high Shannon diversity index mean?'''

OK, let's see it for our samples:

<pre>
species_richness<-(colSums(taxCounts>0))
names(species_richness)<-NULL
require(vegan)
speciesDiversity<-diversity(t(taxCounts), index = "shannon")
names(speciesDiversity)<-NULL
par(mfrow=c(1,1))
barplot(sort(species_richness), las=3, main="Species richness", xlab="Samples", ylab="Richness")
barplot(sort(speciesDiversity), xlab="Samples", las=3, main="Diversity (Shannon)")
plot(species_richness,speciesDiversity,xlab="Richness", ylab="Shannon diversity index")
</pre>
[[File:raw_richness.png]][[File:raw_shannon.png]][[File:raw_richnessVSshannonZoom.png]]

The first two plots show the richness and the diversity of each sample (person). The third plot shows richness against diversity, with one dot per sample.
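
As a rough illustration of what the two numbers measure, the sketch below computes them by hand for one made-up sample (toySample is hypothetical). It uses the natural logarithm, which is also the default in vegan's diversity().

```r
# Hypothetical species counts for a single sample
toySample <- c(sp1 = 10, sp2 = 10, sp3 = 10, sp4 = 10, sp5 = 0)

# Richness: number of species with at least one read
toyRichness <- sum(toySample > 0)

# Shannon index: H = -sum(p * log(p)) over the species that are present,
# where p is the proportion of reads assigned to each species
p <- toySample[toySample > 0] / sum(toySample)
toyShannon <- -sum(p * log(p))

toyRichness
toyShannon
```

Try changing the counts (for example, making one species dominate) and see how the two numbers respond.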
<H3>Downsizing or rarefying</H3>
But this was on the raw count data, where the sampling depth (number of counts) differs between samples. We should downsize so that we get fair comparisons.

First, suggest the number of reads to draw per sample for the downsizing [target]. If we choose a low target, we will lose abundance resolution and detection sensitivity. If we choose a higher target, we will lose samples.
<pre>> plot(sampleDepth, pch=20, log="y", xlab="Samples", ylab="Number of reads")</pre>
[[File:raw_sampledepth.png]]

There is no right answer (but there are less good suggestions). Insert the number you want to downsize to below and plot it again - the samples above the horizontal line we will keep and the samples below the line we will throw out.

<pre>
> downsizeTarget <- INSERT NUMBER
> plot(sampleDepth, pch=20, log="y", xlab="Samples", ylab="Number of reads"); abline(h=downsizeTarget)
</pre>
[[File:downsized_sampledepth.png]]

'''Q5. Which threshold did you choose and why? How many samples did you lose?'''

OK, let's downsize:
<pre>
> dz_Counts<-round(t(t(Counts)*downsizeTarget/sampleDepth))
> weak_samples<-sampleDepth<downsizeTarget
> dz_Counts[,weak_samples]<-NA # samples that did not make the cut
</pre>

This is a quick and dirty downsizing (ideally, one would resample the reads to a given depth, but that would take days).
Count the species again, now on the downsized data.

<pre>
dz_taxCounts<-apply(dz_Counts, 2, tapply, INDEX=taxonomy_species, sum); gc()
</pre>
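
For comparison with the quick scaling used above, here is a sketch of what resampling the reads of a single sample to a fixed depth could look like. The data and the names toyReads and toyTarget are made up for illustration. For whole matrices, vegan provides rrarefy(), which expects samples as rows.

```r
set.seed(1)  # make the random draw reproducible

# Hypothetical read counts per gene for one sample (depth = 100 reads)
toyReads <- c(geneA = 50, geneB = 30, geneC = 15, geneD = 5)
toyTarget <- 40  # hypothetical downsizing target

# Expand to one entry per read, draw 'toyTarget' reads without replacement,
# then count how many of the drawn reads fall on each gene
readLabels <- rep(seq_along(toyReads), times = toyReads)
drawn <- sample(readLabels, size = toyTarget, replace = FALSE)
rarefied <- tabulate(drawn, nbins = length(toyReads))
names(rarefied) <- names(toyReads)
rarefied

# The quick scaling used in the exercise, for comparison
scaled <- round(toyReads * toyTarget / sum(toyReads))
scaled
```

The resampled counts vary from draw to draw, while the scaled counts are deterministic. Expanding every read as done here is only practical for small examples.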

And the richness and diversity again, now on downsized data

<pre>
> dz_species_richness<-(colSums(dz_taxCounts>0))
> names(dz_species_richness)<-NULL
> require(vegan)
> dz_speciesDiversity<-diversity(t(dz_taxCounts), index = "shannon")
> dz_speciesDiversity[weak_samples]<-NA
> names(dz_speciesDiversity)<-NULL
</pre>

Now plot the richness and diversity with downsized data

<pre>
> par(mfrow=c(1,1), pch=1)
> barplot(sort(dz_species_richness), las=3, main="Species richness (Downsized)", xlab="Samples", ylab="Richness")
</pre>
[[File:downsized_richness.png]]
<pre>
barplot(sort(dz_speciesDiversity), las=3,main="Shannon's diversity index (downsized)", xlab="Samples", ylab="Shannon diversity")
</pre>
[[File:downsized_shannon.png]]

And compare to the raw data

<pre>
> plot(dz_species_richness,species_richness, xlab="downsized richness", ylab="raw richness", main="Richness")
</pre>
[[File:Comparing_richness.png]]
<pre>
> plot(dz_speciesDiversity,speciesDiversity,xlab="downsized species diversity", ylab="raw species diversity",main="Diversity (Shannon)")
</pre>
[[File:Comparing_shannon.png]]

'''Q6. What is the effect of the downsizing on richness?'''

'''Q7. What is the effect of the downsizing on diversity (Shannon)?'''

Let's plot the abundance of each species in a sample with low diversity and in a sample with high diversity. You should be able to see a clear difference between the two samples!

<pre>
> par(mfrow=c(1,2))
> barplot(taxCounts[,4], main="Person 4, SD > 3", xaxt="n", xlab="Species", ylab="Normalized abundance")
> barplot(taxCounts[,240], main="Person 240, SD < 0.5", xaxt="n", xlab="Species", ylab="Normalized abundance")
> par(mfrow=c(1,1))
</pre>

[[File:comparing_species_abundance.png]]

<H3>Comparisons</H3>

Now let's see if there is a difference between the microbiomes of lean and obese humans. But first, load some more sample information: BMI and class.

<pre>
> load(url("https://teaching.healthtech.dtu.dk/material/22126/BMI.RData"))
> boxplot(BMI$BMI.kg.m2 ~ BMI$Class, col=c("red", "gray","blue"), ylab="BMI")
</pre>
[[File:bmi_class.png]]

The classes are: le = lean; ow = overweight; ob = obese.

First, let us see if the abundance of E. coli differs between obese and lean individuals using a Wilcoxon rank sum test (look for the p-value in the output). Let's also get the mean abundance of E. coli in the three groups:

<pre>
> wilcox.test(x=dz_taxCounts["Escherichia coli",BMI$Classification=="ob"], y=dz_taxCounts["Escherichia coli",BMI$Classification=="le"] )
> tapply(dz_taxCounts["Escherichia coli",], BMI$Classification, mean, na.rm=TRUE)
</pre>
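
If the Wilcoxon rank sum test is new to you, the following sketch runs it on two small made-up groups (toyObese and toyLean are hypothetical values, not taken from the data). It also includes an NA, like the samples that did not make the downsizing cut, to show that wilcox.test() drops missing values before testing.

```r
# Hypothetical abundances in two groups; the NA mimics a sample that was thrown out
toyObese <- c(12, 15, 20, 22, NA, 30)
toyLean  <- c(1, 3, 4, 6, 8)

# Two-sided Wilcoxon rank sum test; NAs are removed internally
toyTest <- wilcox.test(x = toyObese, y = toyLean)
toyTest$p.value

# Group means need na.rm = TRUE, otherwise the NA propagates
mean(toyObese, na.rm = TRUE)
mean(toyLean)
```

Here every value in one group is larger than every value in the other, so the p-value is as small as it can be for two groups of five observations.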

'''Q8. Is there any significant difference in abundance of E. coli between the different BMI groups?'''

Let's test all species, correcting for multiple testing using Benjamini-Hochberg (false discovery rate) since we are testing 120 species, and plot the adjusted p-values:

<pre>
> pval<-apply(dz_taxCounts, 1, function(V){wilcox.test(x=V[BMI$Classification=="ob"],y=V[BMI$Classification=="le"])$p.value})
> Abundance_ratio<-log2(apply(dz_taxCounts, 1,function(V){mean(x=V[BMI$Classification=="ob"], na.rm=TRUE)/mean(V[BMI$Classification=="le"], na.rm=TRUE)}))
> pval.adjust = p.adjust(pval, method="BH")
> plot(sort(pval.adjust), log="y", pch=16, xlab="Species", ylab="p-values")
> abline(h=0.05, col="grey", lty=2)
</pre>
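
To make the Benjamini-Hochberg correction less of a black box, here is a small sketch with made-up p-values (toyP is hypothetical). It compares p.adjust() with a by-hand version of the same procedure.

```r
# Hypothetical raw p-values from five tests
toyP <- c(0.001, 0.01, 0.03, 0.04, 0.5)

# Built-in Benjamini-Hochberg adjustment
toyBH <- p.adjust(toyP, method = "BH")

# The same by hand: multiply each p-value by n / rank, then, going from the
# largest p-value down, take the running minimum so that the adjusted values
# never increase as the raw p-values decrease
n <- length(toyP)
toyOrder <- order(toyP, decreasing = TRUE)
manualBH <- numeric(n)
manualBH[toyOrder] <- pmin(1, cummin(toyP[toyOrder] * n / (n:1)))

toyBH
manualBH
```

Note that the third raw p-value (0.03) is below 0.05 but its adjusted value is not, which is exactly why the number of significant species can shrink after correction.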

'''Q9. How many species are significant with a false discovery rate < 0.05?'''

Let us look at the abundance of the 10 most significant species.

<pre>
> o<-order(pval)
> BMIstat<-data.frame(pval,pval.adjust, Abundance_ratio)[o,]
> BMIstat[1:10,]
> par(mar=c(5,18,5,5))
> barplot(BMIstat[1:10,3], names.arg=rownames(BMIstat)[1:10], las=1,xlab="log2 fold difference (obese / lean)", horiz=TRUE)
</pre>

[[File:log_fold_diff_sign.png]]

'''Q10. Can you see any differences in the abundances - which species have large differences, what are their p-values?'''

'''Q11. What type of bacteria is the most significant one? [try google]'''

<H3>Beta-diversity and PCA</H3>

Plot the Bray-Curtis distance between samples as a heatmap.

<pre>
library(RColorBrewer)
library(gplots)
vdist = as.matrix(vegdist(t(taxCounts)))
rownames(vdist) = colnames(vdist)
hmcol = colorRampPalette(brewer.pal(9, "GnBu"))(100)
heatmap.2(vdist, trace='none', col=rev(hmcol))
</pre>
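
The Bray-Curtis dissimilarity (the default method in vegdist()) between two samples is the sum of the absolute differences in abundance divided by the total abundance of both samples. A minimal sketch with two made-up samples (toyA and toyB are hypothetical):

```r
# Hypothetical species counts for two samples
toyA <- c(sp1 = 10, sp2 = 0, sp3 = 5)
toyB <- c(sp1 = 5,  sp2 = 5, sp3 = 5)

# Bray-Curtis: 0 means identical composition, 1 means no shared species
toyBray <- sum(abs(toyA - toyB)) / sum(toyA + toyB)
toyBray
```

In this toy case the absolute differences sum to 10 and the total abundance is 30, so the dissimilarity is 10/30.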

'''Q12. Can you see some clusters of samples?'''

Finally for the PCA:

<pre>
> my.rda <- rda(t(taxCounts))
> biplot(my.rda, display = c("sites", "species"), type = c("text", "points"))
</pre>

'''Q13. Can you see which species seem to be driving the differences between the samples?'''

<H3>Statistically modelling the variance using DESeq2</H3>

Now, we will see the power of statistically modelling the variance instead of downsizing.

<pre>
> if (!requireNamespace("BiocManager", quietly = TRUE))
> install.packages("BiocManager")
> BiocManager::install("DESeq2")
> library(DESeq2)
> cts <- taxCounts
> coldata = BMI[,1]
> coldata = matrix(NA, nrow=nrow(BMI), ncol=1)
> coldata[,1] = as.vector(BMI[,1])
> rownames(coldata) = rownames(BMI)
> colnames(coldata) = "BMI"
</pre>

Take a look at coldata

<pre>
coldata
</pre>

Make sure that the individuals in coldata (the sample information) match the individuals in the count data and are in the same order. The following should return TRUE:

<pre>
all(rownames(coldata) == colnames(cts))
</pre>

Load data into DESeq format, perform statistical analysis and get results

<pre>
> dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ BMI)
> dds <- DESeq(dds)
> res <- results(dds)
> res
</pre>

Order the results by p-value and show the most significant:

<pre>
> resOrdered <- res[order(res$pvalue),]
> head(resOrdered)
</pre>

'''Q14. Which are the most significant species (google)? Is there an overlap between these and the ones found using downsizing + Wilcoxon test (what you did above)?'''

Please find answers [[QuantitativeMetagenomicsSolution|here]]

QuantitativeMetagenomics - Revision history
