<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://teaching.healthtech.dtu.dk/22115/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=WikiSysop</id>
	<title>22115 - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://teaching.healthtech.dtu.dk/22115/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=WikiSysop"/>
	<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php/Special:Contributions/WikiSysop"/>
	<updated>2026-05-02T18:40:16Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=22115_-_Computational_Molecular_Evolution&amp;diff=42</id>
		<title>22115 - Computational Molecular Evolution</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=22115_-_Computational_Molecular_Evolution&amp;diff=42"/>
		<updated>2024-03-19T14:00:11Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;; Overview  [[File:Darwin logo2 medium.png |right|border|550px]]&lt;br /&gt;
: This page contains links to video lectures, computer exercises, and other material for the course [https://kurser.dtu.dk/course/22115 22115 - Computational Molecular Evolution], which is part of the [https://www.dtu.dk/english/education/msc/programmes/systems_biology MSc in Bioinformatics and Systems Biology] at the [https://www.dtu.dk/english Technical University of Denmark]. The course is taught by Professor Anders Gorm Pedersen, [https://www.healthtech.dtu.dk/english/Research/Research-Sections/Section-Bioinformatics Section for Bioinformatics], [https://www.healthtech.dtu.dk/english Department of Health Technology].&lt;br /&gt;
&lt;br /&gt;
: The main goal of this course is to give an introduction to theory and algorithms in the field of computational molecular evolution. We will cover basic evolutionary theory (common descent, natural selection, genetic drift, models of growth and selection), and the main types of algorithms used for constructing and analyzing phylogenetic trees (parsimony, distance-based methods, maximum likelihood methods, and Bayesian inference). We will also discuss the role of statistical modeling in science more generally.&lt;br /&gt;
&lt;br /&gt;
:The course will consist of lectures, computer exercises, and mini-projects. The student will acquire practical experience in the use of a range of computational methods by analyzing sequences from the scientific literature.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;Computer setup&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===Linux===&lt;br /&gt;
:* [[Linux software installation]]&lt;br /&gt;
&amp;lt;!-- :* [[Notes on using Linux for exercises]] &#039;&#039;&#039;UNDER CONSTRUCTION&#039;&#039;&#039; --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Windows===&lt;br /&gt;
:* [[Windows software installation]]&lt;br /&gt;
&amp;lt;!--:* [[Notes on using Windows for exercises]] &#039;&#039;&#039;UNDER CONSTRUCTION&#039;&#039;&#039; --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===MacOS===&lt;br /&gt;
:* [[MacOS software installation]]&lt;br /&gt;
&amp;lt;!-- :* [[Notes on using MacOS for exercises]] &#039;&#039;&#039;UNDER CONSTRUCTION&#039;&#039;&#039; --&amp;gt;&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
===VirtualBox===&lt;br /&gt;
:* Use this only if you can&#039;t install natively on MacOS, Windows, or Linux. Runs a virtual Linux on top of your own OS.&lt;br /&gt;
:* [[VirtualBox installation]]&lt;br /&gt;
:* [[Notes on using VirtualBox for exercises]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== &#039;&#039;&#039;Lecture Schedule&#039;&#039;&#039; ==&lt;br /&gt;
&lt;br /&gt;
:([[27615 Previous course programs|Course programs, previous years]])&lt;br /&gt;
&lt;br /&gt;
===Week 1 (January 31): Introduction to evolutionary theory and population genetics. Models of growth, selection and mutation===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/okjVaLA5S38 Common descent (11:52)]&lt;br /&gt;
:* [https://youtu.be/VkkIu1ZtaIE Natural selection (14:57)]&lt;br /&gt;
:* [https://youtu.be/wqa6W3_WW7s Evidence for evolution (part 1) (9:34)] &lt;br /&gt;
:* [http://y2u.be/_-a-F8egAis Evidence for evolution (part 2) (20:54)]&lt;br /&gt;
:* [http://y2u.be/AUGbSMWPILE Population growth and selection (18:13)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [https://github.com/agormp/evolintro/blob/main/evolintro.pdf Lecture notes on evolutionary theory and population genetics]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/slides_week1.pdf Slides, week 1]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Population Growth, Fitness, and Selection]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 2 (February 7): Neutral mutations and genetic drift. Tree reconstruction by parsimony===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/cQVjL50kK0k Neutral Theory of Molecular Evolution (11:28)]&lt;br /&gt;
:* [https://youtu.be/J8LDUFm4ttA Genetic Drift (9:35)]&lt;br /&gt;
:* [https://youtu.be/AZkHWdl2oAw Trees: Terminology and Representation (9:41)]&lt;br /&gt;
:* [https://youtu.be/zCj1s9fmaKs Homology and Homoplasy (9:07)]&lt;br /&gt;
:* [https://youtu.be/gXb_WuLCD8g Maximum Parsimony (7:48)]&lt;br /&gt;
:* [https://youtu.be/Q7ZpdPCx0uQ The Fitch Algorithm (10:31)]&lt;br /&gt;
:* [https://youtu.be/deywW9wJXmw Searching Tree Space (14:01)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/slides_week2.pdf Slides, week 2]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/Paup_Doc_31.pdf PAUP 3.1 manual (note: for older version - contains explanations of parsimony and tree moves)]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/PAUP4-manual.pdf PAUP 4beta command reference]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Phylogenetic Analysis using Parsimony]]&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 3 (February 14): Consensus trees. Distance matrix methods===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://www.youtube.com/watch?v=YXZZyu9OAcg Consensus Trees (16:27)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=MhjSSxcGjaY Distance Matrix Methods, part 1 (6:07)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=PNoUcQTCxiM Distance Matrix Methods, part 2 (22:28)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=Dj24mCLQYUE Neighbour Joining (15:28)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/Consensus.pdf Handout exercise: Consensus Trees]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/Distance_handout.pdf Handout exercise: Distance Matrix Methods]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/Slides_week3.pdf Slides, week 3]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercises&lt;br /&gt;
:* [[Consensus Trees]] &lt;br /&gt;
:* [[Distance Matrix Methods]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 4+5 (February 21 + 28): Mini project 1===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
Project description: [https://teaching.healthtech.dtu.dk/material/22115/Miniproject1_whales.pdf Building a tree from scratch: What are the closest relatives of whales?]&lt;br /&gt;
&lt;br /&gt;
The mini project should be submitted and assessed via a peer assessment module that will become available on the course DTU Learn page.&lt;br /&gt;
&lt;br /&gt;
Take this tree quiz to test your ability to accurately interpret evolutionary trees: &lt;br /&gt;
* [https://teaching.healthtech.dtu.dk/material/22115/Treequiz1.pdf Tree quiz]&lt;br /&gt;
Check your replies here:  &lt;br /&gt;
* [https://teaching.healthtech.dtu.dk/material/22115/Treequiz1_answers.pdf Tree quiz with answers] &lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 6 (March 6): Models of sequence evolution. Likelihood methods===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/ro2MFmVZypQ Models of evolution (28:48)]&lt;br /&gt;
:* [https://youtu.be/xDKUIegYpWM Maximum likelihood (22:06)]&lt;br /&gt;
:* [https://youtu.be/Siau2o_egGI Ancestral reconstruction (10:45)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/Handout_real_exp_change.pdf Handout exercise: Real, Observed, and Expected Change]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/Handout_likelihood.pdf Handout exercise: Computation of Likelihood]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/Slides_week4.pdf Slides, week 6]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/substitutionmodels.pdf Lecture notes: Substitution models]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/main.pdf Optional lecture notes: Matrix exponentials for Markov chains]&lt;br /&gt;
:; Computer exercises&lt;br /&gt;
:* [[Models of Evolution]]&lt;br /&gt;
:* [[Maximum Likelihood]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 7 (March 13): Bayesian inference of phylogeny===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://www.youtube.com/watch?v=DI3TIx78NqM&amp;amp;t=12s Bayesian Inference (23:48)]&lt;br /&gt;
:* [https://youtu.be/uyG5DVigEyM?list=PLXwjzs_mabFrlRF7uALEomfGGckG0sG5y Markov chain Monte Carlo (19:54)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/Handout.class08.pdf Handout exercise: Bayesian estimation of model parameter value]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/Slides_week5.pdf Slides, week 7]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/MTN122.pdf An Introduction to Bayesian Statistics Without Using Equations]&lt;br /&gt;
:* [http://www.nature.com/nbt/journal/v22/n9/pdf/nbt0904-1177.pdf Background reading: &amp;quot;What is Bayesian statistics?&amp;quot;]&lt;br /&gt;
:* [http://rsta.royalsocietypublishing.org/content/roypta/361/1813/2681.full.pdf Background reading: &amp;quot;Bayesian computation: a statistical revolution&amp;quot;]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Bayesian Phylogeny]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 8+9 (March 20 + April 3): Mini project 2===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&#039;&#039;&#039;Maximum likelihood and R-based phylogenetics - The origin of the Lassa and SARS-CoV-2 virus epidemics&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Project description and data sets&#039;&#039;&#039;: See DTU Learn page &lt;br /&gt;
&lt;br /&gt;
The mini project should be submitted and assessed via the peergrade interface at DTU Learn.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 10 (April 10): Model Selection===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/sJB2LmppZj8?list=PLXwjzs_mabFrlRF7uALEomfGGckG0sG5y Model selection, part 1 (15:19)]&lt;br /&gt;
:* [https://youtu.be/qSoDZ_33GbM Model selection, part 2 (17:20)]&lt;br /&gt;
:* [https://youtu.be/YYoo1vUO4ME Introduction to computer exercise: detection of selection (15:24)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/Slides_week6.pdf Slides, week 10]&lt;br /&gt;
:* [https://github.com/ddarriba/jmodeltest2/files/157130/manual.pdf jmodeltest manual]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Model selection]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 11 (April 17): Bayesian Phylogenetics, Part 2 ===&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [https://www.researchgate.net/publication/319965471_A_biologist%27s_guide_to_Bayesian_phylogenetic_analysis A biologist’s guide to Bayesian phylogenetic analysis]&lt;br /&gt;
:* [https://beast.community/analysing_beast_output Analysing BEAST output using Tracer]&lt;br /&gt;
:* [https://beast.community/tracer_convergence Identifying convergence problems using Tracer]&lt;br /&gt;
:* [https://taming-the-beast.org/tutorials/Troubleshooting/ Post-processing and improving performance]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Bayesian phylogenetics: checking convergence]] &lt;br /&gt;
:* [[Bayesian phylogenetics: clock models]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 12 + 13 (April 24 + May 1): Mini project 3: Final exam===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Details will follow&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
----&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Bayesian_phylogenetics:_clock_models&amp;diff=41</id>
		<title>Bayesian phylogenetics: clock models</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Bayesian_phylogenetics:_clock_models&amp;diff=41"/>
		<updated>2024-03-19T13:48:12Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;This exercise is part of the course  Computational Molecular Evolution (22115).  == Overview ==  In this exercise we will explore how to use the software tool BEAST2 to construct phylogenies based on molecular-clock models. In previous exercises we have worked with phylogenies where we did not have information about how fast sequences were evolving, and we therefore used the number of substitutions as branch lengths. When the...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This exercise is part of the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]].&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
In this exercise we will explore how to use the software tool BEAST2 to construct phylogenies based on molecular-clock models. In previous exercises we have worked with phylogenies where we did not have information about how fast sequences were evolving, and we therefore used the number of substitutions as branch lengths. When there &#039;&#039;is&#039;&#039; temporal information (e.g., fossils that can be used to date an internal node, or information about sampling time for rapidly evolving sequences) we can instead use clock-based models. These models assume that sequences are evolving at a more or less constant rate, branch lengths are expressed in terms of time, and we can estimate times for internal nodes. Apart from being useful when the focus is on dating evolutionary events, time trees are also useful in that the clock model itself can lead to better inference of the phylogeny (essentially because it adds prior information to the problem, such that we don&#039;t have to infer all branch lengths only from limited amounts of sequence variation).&lt;br /&gt;
&lt;br /&gt;
The main purpose of this exercise is to make you acquainted with BEAST2 and to learn how to fit clock models using either fossil data (by setting a prior on the date for internal nodes) or using so-called heterochronous data, i.e., sequences where the individual leaves have been sampled at different, known times, and where evolution is sufficiently rapid that we can estimate the parameters in a clock model by seeing how much change has happened over time.&lt;br /&gt;
&lt;br /&gt;
For these tutorials only minimal reporting is required: make a small report with a handful of uncommented screen dumps showing your progress through the exercise. The important thing is that you become somewhat familiar with the program, so that you can use it in the mini project later.&lt;br /&gt;
&lt;br /&gt;
:* In the exercises below, you should simply follow the instructions on the tutorial pages. &lt;br /&gt;
:* In the virtual box you should start programs from the command line, by simply writing the name of the executable. The names of the executables that you will need for this exercise are:&lt;br /&gt;
:** beauti&lt;br /&gt;
:** beast&lt;br /&gt;
:** tracer&lt;br /&gt;
:** treeannotator&lt;br /&gt;
:** figtree&lt;br /&gt;
&lt;br /&gt;
== Introduction to BEAST2 ==&lt;br /&gt;
&lt;br /&gt;
:* Create a new directory for storing the results of this exercise:&lt;br /&gt;
 cd /path/to/molevol&lt;br /&gt;
 mkdir bayes2&lt;br /&gt;
 cd bayes2&lt;br /&gt;
:* Open this link in a new tab: [https://taming-the-beast.org/tutorials/Introduction-to-BEAST2/ Introduction to BEAST2]&lt;br /&gt;
:* Follow instructions down to the optional part.&lt;br /&gt;
:* &#039;&#039;&#039;Note:&#039;&#039;&#039; To get the graphical interface for BEAST2 shown in figure 11 in the tutorial, you should start the program as follows:&lt;br /&gt;
 beast -options&lt;br /&gt;
&lt;br /&gt;
== Prior selection and clock calibration using Influenza A data ==&lt;br /&gt;
&lt;br /&gt;
:* Open this link in a new tab: [https://taming-the-beast.org/tutorials/Prior-selection/ Prior selection and clock calibration using Influenza A data]&lt;br /&gt;
:* &#039;&#039;&#039;NOTE:&#039;&#039;&#039; Only do the part about &#039;&#039;&#039;heterochronous&#039;&#039;&#039; data (not the homochronous part, although you can if you want to)&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Tracer_ugly_caterpillar.png&amp;diff=40</id>
		<title>File:Tracer ugly caterpillar.png</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Tracer_ugly_caterpillar.png&amp;diff=40"/>
		<updated>2024-03-19T13:47:04Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Tracer_hairy_caterpillar.png&amp;diff=39</id>
		<title>File:Tracer hairy caterpillar.png</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Tracer_hairy_caterpillar.png&amp;diff=39"/>
		<updated>2024-03-19T13:46:35Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Tracer_lousy_convergence.png&amp;diff=38</id>
		<title>File:Tracer lousy convergence.png</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Tracer_lousy_convergence.png&amp;diff=38"/>
		<updated>2024-03-19T13:45:56Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Tracer_marginals_overlap.png&amp;diff=37</id>
		<title>File:Tracer marginals overlap.png</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Tracer_marginals_overlap.png&amp;diff=37"/>
		<updated>2024-03-19T13:45:29Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Tracer_fileload2.png&amp;diff=36</id>
		<title>File:Tracer fileload2.png</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Tracer_fileload2.png&amp;diff=36"/>
		<updated>2024-03-19T13:44:58Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Bayesian_phylogenetics:_checking_convergence&amp;diff=35</id>
		<title>Bayesian phylogenetics: checking convergence</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Bayesian_phylogenetics:_checking_convergence&amp;diff=35"/>
		<updated>2024-03-19T13:44:30Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;This exercise is part of the course  Computational Molecular Evolution (22115).  == Check convergence using Tracer ==  In this exercise you will be briefly introduced to how to check if an MCMC run has converged using the program Tracer from the BEAST2 package. You will do this by re-examining the output from the Bayesian analysis you did in the week 9 exercise.  ----  &amp;#039;&amp;#039;&amp;#039;Question 1&amp;#039;&amp;#039;&amp;#039;  : Issue this command to start the Trace...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This exercise is part of the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]].&lt;br /&gt;
&lt;br /&gt;
== Check convergence using Tracer ==&lt;br /&gt;
&lt;br /&gt;
In this exercise you will be briefly introduced to how to check if an MCMC run has converged using the program Tracer from the BEAST2 package. You will do this by re-examining the output from the Bayesian analysis you did in the week 9 exercise.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 1&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
: Issue this command to start the Tracer program:&lt;br /&gt;
 tracer&lt;br /&gt;
&lt;br /&gt;
: Now import the two MCMC sample files from the MrBayes run you did in week 9 for the hcvsmall data set:&lt;br /&gt;
:* File -&amp;gt; Import Trace File (or use the + under the trace file pane)&lt;br /&gt;
:* In the import dialog: find the &amp;quot;bayes&amp;quot; directory and select &amp;quot;All files&amp;quot; under &amp;quot;files of type&amp;quot;. This should give you a list of the output files from the MrBayes run&lt;br /&gt;
:* Select the file &amp;quot;hcvsmall.nexus.run1.p&amp;quot; and open it.&lt;br /&gt;
:* Repeat process for second log file (suffix &amp;quot;.run2.p&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
:::::[[File:Tracer fileload2.png |800px]]&lt;br /&gt;
&lt;br /&gt;
: You can now use Tracer to explore the results of the Bayesian analysis. The first thing you want to check is that the two independent runs have resulted in similar posteriors for the different parameters. This is investigated as follows:&lt;br /&gt;
:* Select both trace files by shift-clicking on their names in the &amp;quot;Trace files&amp;quot; pane (upper left of the Tracer window)&lt;br /&gt;
:* Select the &amp;quot;Marginal Density&amp;quot; tab in the window on the right.&lt;br /&gt;
:* Check different parameters by choosing them in the &amp;quot;Traces&amp;quot; pane on the left (while making sure you still have both trace files selected). This will show the two posteriors for the chosen parameter (see example below). If a run has converged then the two posteriors should mostly be placed right on top of each other. &lt;br /&gt;
:* Note that Tracer by default uses a burn-in of 10% of the total number of generations. You can change that by double-clicking in the Burn-in field of the trace file pane (you need to change it separately for each file). Typically we would use a burn-in of 25% or 50%.&lt;br /&gt;
&lt;br /&gt;
:::::[[File:Tracer marginals overlap.png|800px]]&lt;br /&gt;
&lt;br /&gt;
:* The plot below shows an example where convergence has not occurred yet:&lt;br /&gt;
&lt;br /&gt;
:::::[[File:Tracer lousy convergence.png|800px]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Take screen dumps of the marginal posterior plots for the following parameters and include them in your report: m{1} and piA{all}&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 2&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
:* Another thing to check is how the trace looks as a function of the iteration number: Optimally you would want a trace that looks like a &amp;quot;hairy caterpillar&amp;quot;, with random jumps up and down on a mostly constant level (see example below). &lt;br /&gt;
:* Select the &amp;quot;Trace&amp;quot; tab in the window on the right to see trace plots (still with one or both trace files selected in the Trace File pane).&lt;br /&gt;
:* Related to this: The ESS column gives the &amp;quot;Effective Sample Size&amp;quot; for each parameter. As a rule of thumb we want this to be at least 200 (and Tracer flags smaller values by colouring the ESS values). &lt;br /&gt;
:** Briefly, the problem here is that consecutive samples from MCMC are correlated (they are not independent). This is due to the use of a Markov chain for sampling: the new position in parameter space depends on the previous location (and the proposal distribution). &lt;br /&gt;
:** The degree of non-independence can be quantified by the autocorrelation for different lags: The autocorrelation for lag k is found by computing the Pearson correlation between all samples and the samples k generations later. &lt;br /&gt;
:** Based on computation of auto-correlation at different lags (&amp;lt;math&amp;gt;k = [1, 2, 3, ...]&amp;lt;/math&amp;gt;) Tracer determines the Auto-Correlation Time (ACT), which is the number of generations in the MCMC chain that two samples have to be separated by for them to be uncorrelated. The ACT for a parameter can be seen in the Estimates tab in Tracer.&lt;br /&gt;
:** Tracer also estimates the Effective Sample Size (ESS), which is the number of independent samples that the trace is equivalent to. This is essentially the chain length (excluding the burn-in) divided by the ACT.&lt;br /&gt;
:* Note how the highlighted parameter corresponding to the hairy caterpillar trace also has a high ESS in the example below.&lt;br /&gt;
&lt;br /&gt;
:::::[[File:Tracer hairy caterpillar.png|800px]]&lt;br /&gt;
&lt;br /&gt;
:* Trace plots where there are clearly visible dips and rises (see example below) indicate that there is autocorrelation among the samples we have included: the samples are not independent of each other (and therefore provide less information about the posterior). This is referred to as &amp;quot;poor mixing&amp;quot;. One solution to such a problem is to increase the number of iterations (and perhaps write samples less frequently). It might also be an indication that the model fits poorly, and that you could get better convergence by changing the substitution model, or setting more informative priors.&lt;br /&gt;
:* Note how the poorly mixing parameter in the example below also has a low ESS.&lt;br /&gt;
&lt;br /&gt;
:::::[[File:Tracer ugly caterpillar.png|800px]]&lt;br /&gt;
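&lt;br /&gt;
: The relationship between autocorrelation, ACT, and ESS described above can be illustrated with a small Python sketch. This is a simplified estimator written for illustration only and is not necessarily the algorithm Tracer uses: the function names are hypothetical, the ACT is measured in units of samples rather than generations, and the sum over lags is simply cut off at the first non-positive autocorrelation. The two example traces are simulated, not taken from the MrBayes run.&lt;br /&gt;

```python
import random


def autocorrelation(samples, lag):
    """Sample autocorrelation of a trace at the given lag."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples)
    cov = sum((samples[i] - mean) * (samples[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


def effective_sample_size(samples):
    """ESS = number of samples / ACT, with ACT = 1 + 2 * sum of autocorrelations.

    Assumption: the sum is truncated at the first non-positive autocorrelation.
    """
    n = len(samples)
    act = 1.0
    for lag in range(1, n // 2):
        rho = autocorrelation(samples, lag)
        if 0 >= rho:
            break
        act += 2.0 * rho
    return n / act


# Simulated traces (hypothetical data, fixed seed for reproducibility)
rng = random.Random(1)
n = 2000

# "Hairy caterpillar": every sample is independent of the previous one
independent = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Poor mixing: each sample depends strongly on the previous one
correlated = [0.0]
for _ in range(n - 1):
    correlated.append(0.9 * correlated[-1] + rng.gauss(0.0, 1.0))

ess_independent = effective_sample_size(independent)
ess_correlated = effective_sample_size(correlated)
print(f"ESS, independent samples: {ess_independent:.0f}")
print(f"ESS, correlated samples:  {ess_correlated:.0f}")
```

: Both traces contain the same number of samples, but the correlated trace should get a much lower ESS, which is the same pattern that Tracer flags for poorly mixing parameters.&lt;br /&gt;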
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Take screen dumps of the trace plots for the following parameters and include them in your report: m{1} and piA{all}. What is the ESS for these parameters?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Model_selection&amp;diff=34</id>
		<title>Model selection</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Model_selection&amp;diff=34"/>
		<updated>2024-03-19T13:43:35Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Analysis of viral data set: alignment of coding DNA */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This exercise is part of the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
: In this exercise you are going to investigate features of HIV-1 evolution. You will do this by analyzing a large set of env-genes from HIV-1, subtype B. Specifically, the DNA sequences analyzed here correspond to a region surrounding the hypervariable V3 region of the gp120 protein.&lt;br /&gt;
&lt;br /&gt;
: Like other retroviruses, particles of HIV are made up of 2 copies of a single-stranded RNA genome packaged inside a protein core, or capsid. The core particle also contains viral proteins that are essential for the early steps of the virus life cycle, such as reverse transcription and integration. A lipid envelope, derived from the infected cell, surrounds the core particle. Embedded in this envelope are the surface glycoproteins of HIV: gp120 and gp41. The gp120 protein is crucial for binding of the virus particle to target cells, while gp41 is important for the subsequent fusion event. It is the specific affinity of gp120 for the CD4 protein that targets HIV to those cells of the immune system that express CD4 on their surface (e.g., T-helper lymphocytes, monocytes, and macrophages).&lt;br /&gt;
&lt;br /&gt;
: The role gp120 plays in infection, and the fact that it is situated on the surface of the HIV particle, mean that it is an obvious target for the immune response. There may therefore be a considerable selective pressure on gp120 for creating immune-escape mutants, where amino acids in the gp120 epitopes have been substituted. In this exercise you will construct a maximum likelihood tree that you will subsequently use to investigate whether such a selective pressure can be detected on parts of gp120, again using maximum likelihood methods.&lt;br /&gt;
&lt;br /&gt;
: One major goal of the exercise is to introduce you to statistically based methods for assessing the strength of evidence for a set of alternative hypotheses about some biological system of interest. The model selection method we will use is AIC (Akaike Information Criterion), based on which you will compute model probabilities. A second goal is to make you aware that phylogenetic analysis is not only about constructing trees, but that it is also a useful framework for analyzing biological questions more generally.&lt;br /&gt;
&lt;br /&gt;
: Specifically, you will&lt;br /&gt;
&lt;br /&gt;
:# perform a multiple alignment of gp120 DNA sequences taking protein-level information into account (using revtrans).&lt;br /&gt;
:# select a suitable nucleotide substitution model (using jmodeltest2)&lt;br /&gt;
:# construct a phylogenetic tree (using PAUP).&lt;br /&gt;
:# try to detect positively selected sites in gp120 (using PAML).&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Recipe for computing AIC values and model probabilities ==&lt;br /&gt;
&lt;br /&gt;
: Later in today&#039;s exercise you will be asked to compute AIC values and model probabilities. Return to this section and follow the instructions when you need to do so.&lt;br /&gt;
&lt;br /&gt;
:# Fit a set of models to your data, note the maximized log likelihoods (lnL) and the number of free parameters (K) for each model in the investigated set. The models you fit should represent a plausible and comprehensive set of hypotheses about your data.&lt;br /&gt;
:# Compute AIC for each of the models: &#039;&#039;&#039;AIC = -2 x lnL + 2K&#039;&#039;&#039;.  &amp;lt;br&amp;gt;For example: a model with lnL = -2010 and K = 5 will have AIC = -2 x -2010 + 2 x 5 = 4030.&lt;br /&gt;
:# Identify the model with the smallest AIC (this is the best model in the set). We will call the AIC for this model &#039;&#039;&#039;&amp;quot;AICmin&amp;quot;&#039;&#039;&#039;.&lt;br /&gt;
:# Compute the &amp;quot;ΔAIC&amp;quot; values for each model: &#039;&#039;&#039;ΔAIC = AIC - AICmin&#039;&#039;&#039; &amp;lt;br&amp;gt;For each model subtract the minimum AIC value. The best model will have a ΔAIC of zero. The rest of the models will have positive ΔAICs.&lt;br /&gt;
:# For each model compute the following quantity: &#039;&#039;&#039;numerator = exp(-0.5 x ΔAIC)&#039;&#039;&#039; &amp;lt;br&amp;gt;For example, a model with ΔAIC=4.2 will have numerator = exp(-0.5 x 4.2) = exp(-2.1) = 0.1225. Also compute the &#039;&#039;&#039;sum&#039;&#039;&#039; of the numerator values for all models.&lt;br /&gt;
:# Finally, the model probabilities for each model are found as: &#039;&#039;&#039;P(model) = numerator / sum&#039;&#039;&#039; &amp;lt;br&amp;gt;For example, if sum = 3.75 and a model has numerator = 0.75, then it has P(model) = 0.75 / 3.75 = 0.20. (Note that a numerator can never exceed 1, since ΔAIC is never negative.)&lt;br /&gt;
&lt;br /&gt;
: You may want to keep track of the computations by constructing a table along the following lines:&lt;br /&gt;
[[File:Molevol-Downloads-aictable.png|700px]]&lt;br /&gt;
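&lt;br /&gt;
: The recipe above can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name aic_table and the two example models (A and B, with made-up lnL and K values) are hypothetical and are not results from the gp120 data.&lt;br /&gt;

```python
import math

def aic_table(models):
    """models maps a model name to (lnL, K); returns name -> AIC, dAIC and P(model)."""
    # Step 2: AIC = -2 x lnL + 2K
    aic = {name: -2.0 * lnl + 2.0 * k for name, (lnl, k) in models.items()}
    # Steps 3 and 4: subtract the smallest AIC from every AIC
    aic_min = min(aic.values())
    delta = {name: value - aic_min for name, value in aic.items()}
    # Step 5: numerator = exp(-0.5 x dAIC), and the sum over all models
    numerator = {name: math.exp(-0.5 * d) for name, d in delta.items()}
    total = sum(numerator.values())
    # Step 6: P(model) = numerator / sum
    return {
        name: {"AIC": aic[name], "dAIC": delta[name], "P": numerator[name] / total}
        for name in models
    }

# Made-up illustrative values (hypothetical models A and B)
example = {"A": (-2010.0, 5), "B": (-2005.9, 7)}
for name, row in aic_table(example).items():
    print(f"{name}: AIC={row['AIC']:.1f}  dAIC={row['dAIC']:.1f}  P={row['P']:.4f}")
```

: With these made-up numbers model A gets AIC = 4030 and ΔAIC = 4.2, matching the worked examples in steps 2 and 5 of the recipe.&lt;br /&gt;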
&lt;br /&gt;
: Note that model probabilities can also be computed using Bayesian methods. One advantage of Bayesian methods over AIC is that instead of relying on a point estimate, uncertainty about parameter values is accounted for by integrating over all possible values (typically using MCMC).&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Create working directory, copy files&#039;&#039;&#039;&lt;br /&gt;
: In the command below: Instead of /path/to/molevol enter the path to the directory where you have placed your course files (for instance cd /Users/bob/Documents/molevol, or cd /home/student/molevol).&lt;br /&gt;
 cd /path/to/molevol&lt;br /&gt;
 mkdir modelselect&lt;br /&gt;
 cd modelselect&lt;br /&gt;
 cp ../data/gp120.fasta ./gp120.fasta&lt;br /&gt;
 cp ../data/codeml.ctl ./codeml.ctl&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Have a look at the DNA data file:&#039;&#039;&#039;&lt;br /&gt;
 nedit gp120.fasta &amp;amp;&lt;br /&gt;
: The file contains several DNA sequences from HIV-1, subtype B. The sequences are approximately 500 bp long, and correspond to a region surrounding the hypervariable V3 region in the gene encoding gp120. Close the nedit window when you&#039;ve had a look.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Analysis of viral data set: alignment of coding DNA ==&lt;br /&gt;
&lt;br /&gt;
: DNA sequences are a lot less informative than protein sequences and for this reason it is always preferable to align coding DNA in translated form. The simple fact that proteins are built from 20 amino acids while DNA only contains four different bases, means that the &#039;signal-to-noise ratio&#039; in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit from the information that is implicit in empirical substitution matrices such as BLOSUM-62. Taken together with the generally higher rate of synonymous substitutions over non-synonymous ones, this means that the phylogenetic signal disappears much more rapidly from DNA sequences than from the encoded proteins. It is therefore preferable to align coding DNA at the amino acid level.&lt;br /&gt;
&lt;br /&gt;
: However, in the context of molecular evolution, DNA alignments retain a lot of useful information regarding silent mutations. Especially the ratio between silent and non-silent substitutions is informative. We would therefore like to construct a multiple alignment at the DNA level, but using information at the protein level, and the RevTrans server does exactly that.&lt;br /&gt;
&lt;br /&gt;
: &#039;&#039;&#039;RevTrans&#039;&#039;&#039; takes as input an unaligned set of DNA sequences, automatically translates them to the equivalent amino acid sequences, constructs a multiple alignment of the protein sequences, and finally uses the protein alignment as a template for constructing a multiple DNA alignment that is in accordance with the protein alignment. This also means that gaps are always inserted in groups of three so reading frames are kept in order. That is important if you want to analyze selection, as we will in this exercise.&lt;br /&gt;
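&lt;br /&gt;
: The last step, using the protein alignment as a template for the DNA alignment, can be illustrated with a minimal Python sketch. This is not the RevTrans implementation; the function name thread_dna is hypothetical, and the sketch assumes that the DNA length is a multiple of three and that each codon corresponds to exactly one residue in the aligned protein.&lt;br /&gt;

```python
def thread_dna(aligned_protein, dna, gap="-"):
    """Map unaligned coding DNA onto an aligned protein sequence, codon by codon."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    out = []
    pos = 0
    for residue in aligned_protein:
        if residue == gap:
            out.append(gap * 3)      # one protein gap becomes three DNA gap characters
        else:
            out.append(codons[pos])  # copy the codon that encodes this residue
            pos += 1
    return "".join(out)

print(thread_dna("MK-L", "ATGAAACTG"))   # ATGAAA---CTG
```

: Because every gap in the protein alignment is expanded to exactly three gap characters, the reading frame of the DNA alignment is preserved.&lt;br /&gt;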
&lt;br /&gt;
&#039;&#039;&#039;Construct RevTrans alignment&#039;&#039;&#039;&lt;br /&gt;
:* Open RevTrans server page: https://services.healthtech.dtu.dk/services/RevTrans-2.0/&lt;br /&gt;
:* On the RevTrans page: Choose the file gp120.fasta as input (or copy and paste the sequence into the sequence window)&lt;br /&gt;
:* Click the &amp;quot;Submit query&amp;quot; button&lt;br /&gt;
:* When the alignment is done you may have to click the link named &amp;quot;here&amp;quot; to go to the results page&lt;br /&gt;
:* Download DNA alignment, by right-clicking the link for &amp;quot;Download alignment in FASTA format&amp;quot;, and choosing &amp;quot;Save link as...&amp;quot; (save file under the name gp120align.fasta and make sure to save the file in the directory modelselect).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Convert alignment to NEXUS format&#039;&#039;&#039;&lt;br /&gt;
: Convert the fasta file to NEXUS format and save file in the modelselect directory under the name gp120.nexus&lt;br /&gt;
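&lt;br /&gt;
: Any conversion tool can be used for this step. As one possible approach, the sketch below writes a simple NEXUS data block using only the Python standard library. The function names read_fasta and write_nexus are hypothetical, and the sketch assumes that sequence names contain no whitespace or punctuation that would require quoting in NEXUS.&lt;br /&gt;

```python
def read_fasta(path):
    """Return a list of (name, sequence) pairs from a FASTA file."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Keep only the first word of the header as the taxon name
                records.append([line[1:].split()[0], []])
            else:
                records[-1][1].append(line)
    return [(name, "".join(parts)) for name, parts in records]

def write_nexus(records, path):
    """Write aligned DNA records as a NEXUS data block."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    width = max(len(name) for name, _ in records) + 2
    with open(path, "w") as out:
        out.write("#NEXUS\n")
        out.write("begin data;\n")
        out.write(f"  dimensions ntax={len(records)} nchar={lengths.pop()};\n")
        out.write("  format datatype=dna missing=? gap=-;\n")
        out.write("  matrix\n")
        for name, seq in records:
            out.write(f"  {name.ljust(width)}{seq}\n")
        out.write("  ;\nend;\n")

# Usage: write_nexus(read_fasta("gp120align.fasta"), "gp120.nexus")
```

: The check on sequence lengths guards against accidentally converting the unaligned file gp120.fasta instead of gp120align.fasta.&lt;br /&gt;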
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Selection of substitution model using jmodeltest2 ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 1&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
: As part of the present analysis we are going to build a phylogenetic tree based on the DNA alignment constructed above. We will construct the tree using maximum likelihood, but to do that we first have to decide which substitution model we want to use. Specifically, we are interested in using the model that best describes our data without having more parameters than strictly necessary (thus avoiding overfitting). We will investigate this issue by fitting a set of 56 different models to our data and then selecting one with a reasonable balance between model complexity and data fit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start jmodeltest2&#039;&#039;&#039;&lt;br /&gt;
 jmodeltest&lt;br /&gt;
&#039;&#039;&#039;Load data set&#039;&#039;&#039;&lt;br /&gt;
:* File -&amp;gt; Load DNA alignment -&amp;gt; File Format -&amp;gt; Select &amp;quot;All files&amp;quot;&lt;br /&gt;
:* Navigate to gp120.nexus and load it&lt;br /&gt;
&#039;&#039;&#039;Fit 56 models&#039;&#039;&#039;&lt;br /&gt;
:* Analysis -&amp;gt; Compute likelihood scores&lt;br /&gt;
:* Select &amp;quot;7&amp;quot; under &amp;quot;Number of substitution schemes&amp;quot;&lt;br /&gt;
:* Select &amp;quot;Fixed BIONJ-JC&amp;quot; under &amp;quot;Base tree for likelihood calculations&amp;quot;&lt;br /&gt;
:* Click &amp;quot;Compute likelihoods&amp;quot;&lt;br /&gt;
&lt;br /&gt;
: This causes jmodeltest2 to perform the following actions: first a neighbor joining tree is constructed using the Jukes and Cantor model. Then the tree is fixed and used as the basis for fitting a set of 56 different models to the data. For each model, the estimated model parameters and the negative log-likelihood are recorded. In addition to varying sets of substitution rate parameters (JC, K2P, ...), some of these models also include extra parameters that take into account the presence of different rates between sites. This is done in two ways: (1) by fitting a gamma distribution of rates (&amp;quot;+G&amp;quot;), and (2) by allowing for a proportion of constant (&amp;quot;invariable&amp;quot;) sites (&amp;quot;+I&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
: Wait until jmodeltest2 is done fitting all 56 models (this will take a little while depending on your computer).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect result, manually check model probabilities for three models&#039;&#039;&#039;&lt;br /&gt;
:* Results -&amp;gt; Show results table&lt;br /&gt;
: For each model this table lists the negative log-likelihood (&amp;quot;-lnL&amp;quot;), the number of parameters (&amp;quot;p&amp;quot;), and estimates of all model parameters (excluding branch lengths).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Manually compute model probabilities for three substitution models&#039;&#039;&#039;&lt;br /&gt;
: Use AIC-based model probabilities to investigate which of the following three substitution models are best at describing how the sequences have evolved:&lt;br /&gt;
:* Jukes and Cantor with fraction of invariant sites (JC+I)&lt;br /&gt;
:* Jukes and Cantor with gamma-distributed rates over sites (JC+G)&lt;br /&gt;
:* Jukes and Cantor with invariant sites and gamma-distributed rates (JC+I+G)&lt;br /&gt;
: Before you can do the computation you need to know the log likelihood and the number of parameters for each model. Locate these values in the table for the JC+I, JC+G, and JC+I+G models, and write them down. Close the window with the result table when you are done.&lt;br /&gt;
&lt;br /&gt;
:Make sure to get the signs right: the values reported in the table are -lnL values, so you will need to reverse the sign to get the lnL (the lnL values you write down should be negative). &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Use the recipe above to compute AIC values and model probabilities. Report the results in a table similar to the one shown above&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 2&#039;&#039;&#039; Based on the model probabilities: which model has more support?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 3&#039;&#039;&#039;&lt;br /&gt;
&#039;&#039;&#039;Use modeltest program to select best model&#039;&#039;&#039;&lt;br /&gt;
: What you just did manually for JC+I, JC+G and JC+I+G, jmodeltest2 can do automatically for the full set of 56 fitted models. Specifically, it uses the list of negative log likelihoods and parameter counts in the table to compute AIC and model probabilities, and uses this to select the model that best fits the sequence data:&lt;br /&gt;
:* Analysis -&amp;gt; Do AIC calculations -&amp;gt; &lt;br /&gt;
:* Select &amp;quot;Write PAUP* block&amp;quot;&lt;br /&gt;
:* click &amp;quot;Do AIC calculations&amp;quot;&lt;br /&gt;
:* Results -&amp;gt; Show results table&lt;br /&gt;
:* Select &amp;quot;AIC&amp;quot; tab&lt;br /&gt;
:* SHIFT+click on the header of the &amp;quot;weight&amp;quot; column. This sorts the rows according to model weight, in descending order.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What model was selected by modeltest based on the AIC values?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Construction of phylogenetic tree using PAUP ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 4&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
: Close the results table. In the main window you should now scroll up to the lines giving PAUP commands that will implement the selected model. The command is enclosed between &amp;quot;BEGIN PAUP&amp;quot; and &amp;quot;END;&amp;quot; and should look something like this:&lt;br /&gt;
 Lset Base=(0.4064, [...]&lt;br /&gt;
: You will need to copy this command to a PAUP session in the next step.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start PAUP&#039;&#039;&#039;&lt;br /&gt;
 paup&lt;br /&gt;
: Above you used jmodeltest2 to select the most suitable substitution model for the present data set. You will now use this model to construct a maximum likelihood tree. You will use PAUP for this purpose. (note: it is possible to create a maximum likelihood or a model-averaged tree directly from the jmodeltest2 program, but we will instead do it in PAUP in order to more clearly see each step that is taken).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Load alignment:&#039;&#039;&#039;&lt;br /&gt;
 execute gp120.nexus&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Set tree-building criterion to maximum likelihood&#039;&#039;&#039;&lt;br /&gt;
 set criterion=likelihood&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Set model parameters to winning estimates&#039;&#039;&#039;&lt;br /&gt;
: Above you located a set of lines in the jmodeltest2 output giving a PAUP command that sets the model parameters to the estimates that were found using the winning model. Copy and paste this lset command (without the BEGIN and END parts) into the window where PAUP is running.&lt;br /&gt;
&lt;br /&gt;
 PASTE LSET COMMAND FROM MODELTEST RUN HERE&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find best tree using selected model&#039;&#039;&#039;&lt;br /&gt;
: Still in the PAUP-window, enter the following command&lt;br /&gt;
 hsearch swap=tbr start=nj&lt;br /&gt;
: This command causes PAUP to perform a heuristic search for the best maximum likelihood tree. Once an initial tree has been constructed, the heuristic search proceeds by rearrangements of the &amp;quot;tree bisection and reconnection&amp;quot; type (TBR). We are using the model selected by modeltest, AND the parameter estimates found by modeltest on that model. You could also have chosen to simply estimate all the model parameters as part of this step (i.e., at the same time as finding the best tree), but fixing them improves speed tremendously. Finding the best tree should take a few minutes.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Save best tree to file&#039;&#039;&#039;&lt;br /&gt;
 savetrees format=newick brlens=yes file=gp120tree.phy from=1 to=1&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quit program&#039;&#039;&#039;&lt;br /&gt;
 quit&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Have a look at the tree:&#039;&#039;&#039;&lt;br /&gt;
: You have now produced an unrooted tree of the HIV sequences and saved it in the file gp120tree.phy. Note that in this exercise we will not be interested in the tree as such - our focus is instead on finding positive selection on a subset of codon positions and the tree is just something we need in order to be able to fit the different codon models to the data. If you want to see the tree, you can do so with the following command:&lt;br /&gt;
 figtree gp120tree.phy &amp;amp;&lt;br /&gt;
: There is no meaningful root placed in this tree, so you may want to choose the unrooted view (the third icon in the Layout section of the figtree window). Close the figtree window when you have had a look&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What is the negative log likelihood of the tree you just found?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Detection of positively selected sites in gp120 ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 5&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
: There is much more to phylogenetic analyses than merely reconstructing trees. One interesting result of probabilistic methods, is that the parameters of a model will have their values determined as part of the optimization procedure. This means that once such a model has been fitted to the data, it is possible to investigate these estimated parameter values to learn features about the evolutionary history of the sequences under investigation. In the present example we will focus on investigating whether we can find positively selected sites in our data set, defined as sites where the dN/dS ratio is larger than 1. We do that by using a codon substitution model where the dN/dS ratio is one of its parameters.&lt;br /&gt;
&lt;br /&gt;
: A further strength of the probabilistic approach is that you get a probabilistic measure of how well any model fits the data. In this framework one uses likelihoods (probabilities of data given model) to determine which model fits the data best. As you saw above, it is for instance possible to compute AIC values and model probabilities from the likelihood values of fitted models. Since each model essentially corresponds to a hypothesis about the evolutionary history of the data, we can thus use a stringent statistical approach to decide which hypothesis best describes our data.&lt;br /&gt;
&lt;br /&gt;
: In outline, you will now use the following steps to investigate whether there is any evidence for positively selected codons in your data set:&lt;br /&gt;
&lt;br /&gt;
:* Fit model M1, which assumes there are two classes of codons in the sequence: some with dN/dS &amp;lt; 1, some with dN/dS=1.&lt;br /&gt;
:* Fit model M2, which assumes 3 distinct classes of codons: two with dN/dS ratios as for M1, and one extra class with dN/dS &amp;gt; 1.&lt;br /&gt;
:* Assess the strength of evidence for the two models using AIC-based model probabilities&lt;br /&gt;
:* If M2 is better: identify the positively selected codons&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect the parameter file&#039;&#039;&#039;&lt;br /&gt;
 nedit codeml.ctl &amp;amp;&lt;br /&gt;
: The file &amp;quot;codeml.ctl&amp;quot; contains several settings that are relevant for running the program &#039;&#039;&#039;codeml&#039;&#039;&#039;. Find the following lines and ensure that the file contains these values:&lt;br /&gt;
 &#039;&#039;&#039;seqfile =  gp120align.fasta&#039;&#039;&#039;:  name of alignment file&lt;br /&gt;
 &#039;&#039;&#039;treefile =  gp120tree.phy&#039;&#039;&#039;: name of tree file&lt;br /&gt;
 &#039;&#039;&#039;seqtype = 1&#039;&#039;&#039;: tells the program that our data consists of coding DNA.&lt;br /&gt;
 &#039;&#039;&#039;NSsites = 1 2&#039;&#039;&#039; : tells the program to analyze models M1 and M2.&lt;br /&gt;
 &#039;&#039;&#039;cleandata = 1&#039;&#039;&#039;: tells the program to ignore positions with gaps.&lt;br /&gt;
&lt;br /&gt;
: The settings entered by us will cause codeml to analyze two hypotheses about dN/dS ratios. M1 says there are two classes of codons with different dN/dS ratios in the sequence: one class with dN/dS &amp;lt; 1 (codons under purifying or negative selection), and one class with dN/dS=1 (no selection - neutrally evolving sites). M2 says there are 3 distinct dN/dS ratios for different sites in the sequence: one class with dN/dS &amp;lt; 1, one class with dN/dS=1 (these are the same type of classes as for M1), and one class with dN/dS &amp;gt; 1 (corresponding to sites under positive selection). The value of the dN/dS ratios (for those classes that have dN/dS &amp;lt; 1 or dN/dS &amp;gt; 1), the fraction of sites belonging to each class, and the position of sites belonging to each class, are unknown at first and will be determined during the analysis.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start the analysis&#039;&#039;&#039;&lt;br /&gt;
 codeml&lt;br /&gt;
: This will start the codeml program using the settings in the file codeml.ctl. Depending on your computer, this will take some minutes to finish. (You may be able to see how the optimization procedure results in progressively better fits: the likelihood increases, meaning that negative log-likelihood decreases, as the fit improves).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect result file:&#039;&#039;&#039;&lt;br /&gt;
: Wait for the run to finish, and then look at the result file:&lt;br /&gt;
 nedit selection.results &amp;amp;&lt;br /&gt;
: This file contains a wealth of information concerning your analysis. The top part of the file gives an overview of your sequences, codon usage and nucleotide frequencies. You can ignore this information for now, and move on to the interesting part, namely the model likelihoods and parameter values:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find likelihood, and number of free parameters for model M1&#039;&#039;&#039;&lt;br /&gt;
 Search ==&amp;gt; Find... ==&amp;gt; enter &amp;quot;Model 1&amp;quot; and click Find&lt;br /&gt;
: You are now in the region of the result file where the model likelihoods and parameter estimates are noted. Now, locate a line that looks a bit like the one shown below:&lt;br /&gt;
 lnL(ntime: 72  np: 74):  -4242.470345     +0.000000&lt;br /&gt;
: Identify the number of &amp;quot;free parameters&amp;quot;, K, used in model M1: This is indicated by &amp;quot;np&amp;quot;, and is 74 in the example shown above (most of these parameters are branch lengths in the tree; specifically, the number of branch length parameters is indicated by &amp;quot;ntime&amp;quot;, and is 72 in this example). Also note the log-likelihood of the fitted model. This is the number right after the parenthesis, and is -4242.470345 in the example here.&lt;br /&gt;
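&lt;br /&gt;
: If you prefer to extract these numbers programmatically, a line of this form can be parsed with a regular expression. The sketch below is only an illustration: the function name parse_lnl is hypothetical, and the pattern assumes the line layout shown in the example above.&lt;br /&gt;

```python
import re

# Pattern for a line such as: lnL(ntime: 72  np: 74):  -4242.470345     +0.000000
LNL_LINE = re.compile(r"lnL\(ntime:\s*(\d+)\s+np:\s*(\d+)\):\s*(-?\d+\.\d+)")

def parse_lnl(line):
    """Return (ntime, K, lnL) from a codeml lnL line, or None if it does not match."""
    match = LNL_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))

print(parse_lnl("lnL(ntime: 72  np: 74):  -4242.470345     +0.000000"))
# (72, 74, -4242.470345)
```

: The second value returned (np) is the K to use in the AIC recipe, and the third is the lnL.&lt;br /&gt;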
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What are the values of K and lnL for model M1?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 6&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find dN/dS ratios and codon class proportions for model M1:&#039;&#039;&#039;&lt;br /&gt;
: Scroll down a few lines until you get to a small table similar to this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
      dN/dS for site classes (K=2)&lt;br /&gt;
      p:   0.75111  0.24889&lt;br /&gt;
      w:   0.06583  1.00000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: This gives a summary of the dN/dS ratios that were found in the data set. The line starting w: lists the two dN/dS ratios that were found (in this case 0.06583 and 1.00000 - the last one was pre-specified by us as part of the model and was therefore not a free parameter). The line starting p: gives the proportion of codon sites belonging to each of the dN/dS ratio classes (in the example above approximately 75% of all sites belong to the first class, while 25% belong to the class having dN/dS=1.00000).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What are the dN/dS values (w) and proportions (p) of sites for both classes? Report the following values: p(class1), w(class1), p(class2), w(class2)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 7&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find likelihood, and K for model M2&#039;&#039;&#039;&lt;br /&gt;
: Scroll past the M1 output until you get to the results for model M2.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What are the values of K and lnL for model M2?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 8&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find dN/dS ratios and codon class proportions for model M2:&#039;&#039;&#039;&lt;br /&gt;
: Now, scroll down a few lines until you get to a small table similar to the one you examined for M1 before. For this model there are 3 separate classes of codons.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What are the dN/dS value (w) and proportion (p) of sites for all three classes? Report these values: p(class1), w(class1), p(class2), w(class2), p(class3), w(class3)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 9&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Assess strength of evidence for models M1 and M2:&#039;&#039;&#039;&lt;br /&gt;
: M2 will always have a log-likelihood that is at least as high as that of model M1, because M1 is nested within M2 and M2 has more free parameters. A higher log-likelihood is therefore not in itself evidence for M2. You should now use the recipe given above to compute AIC values and model probabilities for M1 and M2.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Report: AIC, ΔAIC, w (model probability) for M1 and M2&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 10: &#039;&#039;&#039; Is M2 better than M1?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 11&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examine list of positively selected sites&#039;&#039;&#039;&lt;br /&gt;
: If your M2 is clearly better than M1 (I firmly believe it should be if you did things according to instructions...), then you have evidence for the existence of positively selected sites in the gp120 gene. Now, scroll down to the end of the result file and locate a list similar to the one below. Note: This is the &amp;quot;Bayes Empirical Bayes&amp;quot; table, not the &amp;quot;Naive Empirical Bayes&amp;quot; table. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Bayes Empirical Bayes (BEB) analysis&lt;br /&gt;
Positively selected sites&lt;br /&gt;
&lt;br /&gt;
         Prob(w&amp;gt;1)     mean w&lt;br /&gt;
&lt;br /&gt;
    25 A 0.959*        3.133 +- 0.769&lt;br /&gt;
    27 P 0.906         2.990 +- 0.877&lt;br /&gt;
    56 K 0.987*        3.197 +- 0.687&lt;br /&gt;
    59 V 0.915         3.032 +- 0.873&lt;br /&gt;
    78 R 0.637         2.351 +- 1.129&lt;br /&gt;
    88 K 0.573         2.148 +- 1.077&lt;br /&gt;
    95 V 0.925         3.046 +- 0.843&lt;br /&gt;
    ...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: It is not important what the distinction is in this context, but very briefly NEB ignores the fact that there is uncertainty about maximum likelihood estimates, especially for smaller data sets (for instance w for some codon is perhaps not exactly 3.046, but could be in a region around that value), while [https://pubmed.ncbi.nlm.nih.gov/15689528/ BEB accounts for that uncertainty].&lt;br /&gt;
: This gives you a list of the residues (if any) that were found to belong to the positively selected dN/dS class. Also listed are the probability that the site really is in the codon class where dN/dS &amp;gt; 1, and a weighted average of w at the site. Using only DNA sequences you have now identified likely epitopes on the gp120 protein.&lt;br /&gt;
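&lt;br /&gt;
: For long lists it can be convenient to filter the table automatically. The sketch below is a minimal illustration, assuming rows laid out exactly as in the example table above; the function name selected_sites and the pattern BEB_ROW are hypothetical.&lt;br /&gt;

```python
import re

# Row layout assumed: position, residue, probability (optionally starred), mean w, +-, SE
BEB_ROW = re.compile(r"^\s*(\d+)\s+(\S)\s+(\d\.\d+)\**\s+(\d+\.\d+)\s+\+-\s+(\d+\.\d+)")

def selected_sites(lines, cutoff=0.95):
    """Return (position, residue, probability) for rows with Prob(w>1) above cutoff."""
    hits = []
    for line in lines:
        match = BEB_ROW.match(line)
        if match and float(match.group(3)) > cutoff:
            hits.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return hits

# First four rows of the example table shown above
table = """
    25 A 0.959*        3.133 +- 0.769
    27 P 0.906         2.990 +- 0.877
    56 K 0.987*        3.197 +- 0.687
    59 V 0.915         3.032 +- 0.873
""".splitlines()
print(selected_sites(table))   # [(25, 'A', 0.959), (56, 'K', 0.987)]
```

: In the example rows only the sites marked with an asterisk pass the 95% cutoff.&lt;br /&gt;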
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;List all sites having more than 95% probability of belonging to the positively selected class&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Molevol-Downloads-aictable.png&amp;diff=33</id>
		<title>File:Molevol-Downloads-aictable.png</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Molevol-Downloads-aictable.png&amp;diff=33"/>
		<updated>2024-03-19T13:43:09Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Model_selection&amp;diff=32</id>
		<title>Model selection</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Model_selection&amp;diff=32"/>
		<updated>2024-03-19T13:42:35Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;This exercise is part of the course  Computational Molecular Evolution (22115).   == Overview ==  : In this exercise you are going to investigate features of HIV-1 evolution. You will do this by analyzing a large set of env-genes from HIV-1, subtype B. specifically, the DNA sequences analyzed here correspond to a region surrounding the hypervariable V3 region of the gp120 protein.  : Like other retroviruses, particles of HIV...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This exercise is part of the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
: In this exercise you are going to investigate features of HIV-1 evolution. You will do this by analyzing a large set of env-genes from HIV-1, subtype B. Specifically, the DNA sequences analyzed here correspond to a region surrounding the hypervariable V3 region of the gp120 protein.&lt;br /&gt;
&lt;br /&gt;
: Like other retroviruses, particles of HIV are made up of 2 copies of a single-stranded RNA genome packaged inside a protein core, or capsid. The core particle also contains viral proteins that are essential for the early steps of the virus life cycle, such as reverse transcription and integration. A lipid envelope, derived from the infected cell, surrounds the core particle. Embedded in this envelope are the surface glycoproteins of HIV: gp120 and gp41. The gp120 protein is crucial for binding of the virus particle to target cells, while gp41 is important for the subsequent fusion event. It is the specific affinity of gp120 for the CD4 protein that targets HIV to those cells of the immune system that express CD4 on their surface (e.g., T-helper lymphocytes, monocytes, and macrophages).&lt;br /&gt;
&lt;br /&gt;
: The role gp120 plays in infection and the fact that it is situated on the surface of the HIV particle, means it is an obvious target for the immune response. That means that there may be a considerable selective pressure on gp120 for creating immune-escape mutants, where amino acids in the gp120 epitopes have been substituted. In this exercise you will construct a maximum likelihood tree that we will subsequently use to investigate whether you can detect such a selective pressure on parts of gp120, again using maximum likelihood methods.&lt;br /&gt;
&lt;br /&gt;
: One major goal with the exercise is to introduce you to statistically based methods for assessing the strength of evidence for a set of alternative hypotheses about some biological system of interest. The model selection method we will use is AIC (Akaike Information Criterion), based on which you will compute model probabilities. A second goal is to make you aware that phylogenetic analysis is not only about constructing trees, but that it is also a useful framework for analyzing biological questions more generally.&lt;br /&gt;
&lt;br /&gt;
: Specifically, you will&lt;br /&gt;
&lt;br /&gt;
:# perform a multiple alignment of gp120 DNA sequences taking protein-level information into account (using revtrans).&lt;br /&gt;
:# select a suitable nucleotide substitution model (using jmodeltest2)&lt;br /&gt;
:# construct a phylogenetic tree (using PAUP).&lt;br /&gt;
:# try to detect positively selected sites in gp120 (using PAML).&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Recipe for computing AIC values and model probabilities ==&lt;br /&gt;
&lt;br /&gt;
: Later in today&#039;s exercise you will be asked to compute AIC values and model probabilities. Return to this section and follow the instructions when you need to do so.&lt;br /&gt;
&lt;br /&gt;
:# Fit a set of models to your data, note the maximized log likelihoods (lnL) and the number of free parameters (K) for each model in the investigated set. The models you fit should represent a plausible and comprehensive set of hypotheses about your data.&lt;br /&gt;
:# Compute AIC for each of the models: &#039;&#039;&#039;AIC = -2 x lnL + 2K&#039;&#039;&#039;.  &amp;lt;br&amp;gt;For example: a model with lnL = -2010 and K = 5 will have AIC = -2 x -2010 + 2 x 5 = 4030.&lt;br /&gt;
:# Identify the model with the smallest AIC (this is the best model in the set). We will call the AIC for this model &#039;&#039;&#039;&amp;quot;AICmin&amp;quot;&#039;&#039;&#039;.&lt;br /&gt;
:# Compute the &amp;quot;ΔAIC&amp;quot; values for each model: &#039;&#039;&#039;ΔAIC = AIC - AICmin&#039;&#039;&#039; &amp;lt;br&amp;gt;For each model subtract the minimum AIC value. The best model will have a ΔAIC of zero. The rest of the models will have positive ΔAICs.&lt;br /&gt;
:# For each model compute the following quantity: &#039;&#039;&#039;numerator = exp(-0.5 x ΔAIC)&#039;&#039;&#039; &amp;lt;br&amp;gt;For example, a model with ΔAIC=4.2 will have numerator = exp(-0.5 x 4.2) = exp(-2.1) = 0.1225. Also compute the &#039;&#039;&#039;sum&#039;&#039;&#039; of the numerator values for all models.&lt;br /&gt;
:# Finally, the model probabilities for each model are found as: &#039;&#039;&#039;P(model) = numerator / sum&#039;&#039;&#039; &amp;lt;br&amp;gt;For example, if sum = 3.75 and a model has numerator = 0.75, then it has P(model) = 0.75 / 3.75 = 0.20. (Note that a numerator can never exceed 1, since ΔAIC is never negative.)&lt;br /&gt;
&lt;br /&gt;
: You may want to keep track of the computations by constructing a table along the following lines:&lt;br /&gt;
[[File:Molevol-Downloads-aictable.png|700px]]&lt;br /&gt;
&lt;br /&gt;
: Note that model probabilities can also be computed using Bayesian methods. One advantage of Bayesian methods over AIC is that instead of relying on a point estimate, uncertainty about parameter values is accounted for by integrating over all possible values (typically using MCMC).&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Create working directory, copy files&#039;&#039;&#039;&lt;br /&gt;
: In the command below: Instead of /path/to/molevol enter the path to the directory where you have placed your course files (for instance cd /Users/bob/Documents/molevol, or cd /home/student/molevol).&lt;br /&gt;
 cd /path/to/molevol&lt;br /&gt;
 mkdir modelselect&lt;br /&gt;
 cd modelselect&lt;br /&gt;
 cp ../data/gp120.fasta ./gp120.fasta&lt;br /&gt;
 cp ../data/codeml.ctl ./codeml.ctl&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Have a look at the DNA data file:&#039;&#039;&#039;&lt;br /&gt;
 nedit gp120.fasta &amp;amp;&lt;br /&gt;
: The file contains several DNA sequences from HIV-1, subtype B. The sequences are approximately 500 bp long, and correspond to a region surrounding the hypervariable V3 region in the gene encoding gp120. Close the nedit window when you&#039;ve had a look.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Analysis of viral data set: alignment of coding DNA ==&lt;br /&gt;
&lt;br /&gt;
: DNA sequences are a lot less informative than protein sequences and for this reason it is always preferable to align coding DNA in translated form. The simple fact that proteins are built from 20 amino acids while DNA only contains four different bases, means that the &#039;signal-to-noise ratio&#039; in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit from the information that is implicit in empirical substitution matrices such as BLOSUM-62. Taken together with the generally higher rate of synonymous substitutions over non-synonymous ones, this means that the phylogenetic signal disappears much more rapidly from DNA sequences than from the encoded proteins. It is therefore preferable to align coding DNA at the amino acid level.&lt;br /&gt;
&lt;br /&gt;
: However, in the context of molecular evolution, DNA alignments retain a lot of useful information regarding silent mutations. Especially the ratio between silent and non-silent substitutions is informative. We would therefore like to construct a multiple alignment at the DNA level, but using information at the protein level, and the RevTrans server does exactly that.&lt;br /&gt;
&lt;br /&gt;
: &#039;&#039;&#039;RevTrans&#039;&#039;&#039; takes as input an unaligned set of DNA sequences, automatically translates them to the equivalent amino acid sequences, constructs a multiple alignment of the protein sequences, and finally uses the protein alignment as a template for constructing a multiple DNA alignment that is in accordance with the protein alignment. This also means that gaps are always inserted in groups of three so reading frames are kept in order. That is important if you want to analyze selection, as we will in this exercise.&lt;br /&gt;
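&lt;br /&gt;
: The last step, using the protein alignment as a template for the DNA alignment, can be illustrated with a small Python sketch. This is not the RevTrans implementation, only a simplified picture of the idea: the function name thread_dna is hypothetical, and the sketch assumes that the DNA is in frame, starts at the first codon, and contains exactly one codon per aligned residue (no trailing stop codon).&lt;br /&gt;
&lt;br /&gt;
```python
def thread_dna(aligned_protein, dna):
    """Map unaligned coding DNA onto an aligned protein sequence.

    Each residue consumes one codon; each protein gap becomes three DNA gaps.
    """
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if residues != len(codons):
        raise ValueError("protein and DNA lengths do not match")
    out = []
    position = 0
    for aa in aligned_protein:
        if aa == "-":
            # A gap in the protein alignment becomes a whole-codon gap
            out.append("---")
        else:
            out.append(codons[position])
            position += 1
    return "".join(out)

# Toy example: the gap after M is expanded to three gap characters
print(thread_dna("M-KV", "ATGAAAGTG"))
```
&lt;br /&gt;
: Because gaps are only ever added as whole codons, the reading frame of every sequence is preserved in the resulting DNA alignment.&lt;br /&gt;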
&lt;br /&gt;
&#039;&#039;&#039;Construct RevTrans alignment&#039;&#039;&#039;&lt;br /&gt;
:* Open RevTrans server page: https://services.healthtech.dtu.dk/service.php?RevTrans-2.0/&lt;br /&gt;
:* On the RevTrans page: Choose the file gp120.fasta as input (or copy and paste the sequence into the sequence window)&lt;br /&gt;
:* Click the &amp;quot;Submit query&amp;quot; button&lt;br /&gt;
:* When the alignment is done you may have to click the link named &amp;quot;here&amp;quot; to go to the results page&lt;br /&gt;
:* Download DNA alignment, by right-clicking the link for &amp;quot;Download alignment in FASTA format&amp;quot;, and choosing &amp;quot;Save link as...&amp;quot; (save file under the name gp120align.fasta and make sure to save the file in the directory modelselect).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Convert alignment to NEXUS format&#039;&#039;&#039;&lt;br /&gt;
: Convert the fasta file to NEXUS format and save file in the modelselect directory under the name gp120.nexus&lt;br /&gt;
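&lt;br /&gt;
: If you do not have a conversion tool at hand, a minimal conversion can be sketched in Python. The function names read_fasta and to_nexus are hypothetical, and the sketch assumes that all sequences are already aligned to the same length and that sequence names contain no whitespace or punctuation that would need quoting in NEXUS.&lt;br /&gt;
&lt;br /&gt;
```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name
            records.append([line[1:].split()[0], ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]

def to_nexus(records):
    """Format aligned DNA records as a minimal NEXUS data block."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    width = max(len(name) for name, _ in records)
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(records)} nchar={lengths.pop()};",
        "  format datatype=dna missing=? gap=-;",
        "  matrix",
    ]
    lines += [f"  {name.ljust(width)}  {seq}" for name, seq in records]
    lines += ["  ;", "end;"]
    return "\n".join(lines) + "\n"

# Toy alignment with two made-up sequences
print(to_nexus(read_fasta(">seq1\nATG---AAA\n>seq2\nATGCCCAAA\n")))
```
&lt;br /&gt;
: To convert the real alignment you would read the text of gp120align.fasta, pass it through both functions, and write the result to gp120.nexus.&lt;br /&gt;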
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Selection of substitution model using jmodeltest2 ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 1&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
: As part of the present analysis we are going to build a phylogenetic tree based on the DNA alignment constructed above. We will construct the tree using maximum likelihood, but to do that we first have to decide which substitution model we want to use. Specifically, we are interested in using the model that best describes our data without having more parameters than strictly necessary (thus avoiding overfitting). We will investigate this issue by fitting a set of 56 different models to our data and then selecting one with a reasonable balance between model complexity and data fit.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start jmodeltest2&#039;&#039;&#039;&lt;br /&gt;
 jmodeltest&lt;br /&gt;
&#039;&#039;&#039;Load data set&#039;&#039;&#039;&lt;br /&gt;
:* File -&amp;gt; Load DNA alignment -&amp;gt; File Format -&amp;gt; Select &amp;quot;All files&amp;quot;&lt;br /&gt;
:* Navigate to gp120.nexus and load it&lt;br /&gt;
&#039;&#039;&#039;Fit 56 models&#039;&#039;&#039;&lt;br /&gt;
:* Analysis -&amp;gt; Compute likelihood scores&lt;br /&gt;
:* Select &amp;quot;7&amp;quot; under &amp;quot;Number of substitution schemes&amp;quot;&lt;br /&gt;
:* Select &amp;quot;Fixed BIONJ-JC&amp;quot; under &amp;quot;Base tree for likelihood calculations&amp;quot;&lt;br /&gt;
:* Click &amp;quot;Compute likelihoods&amp;quot;&lt;br /&gt;
&lt;br /&gt;
: This causes jmodeltest2 to perform the following actions: first a neighbor joining tree is constructed using the Jukes and Cantor model. Then the tree is fixed and used as the basis for fitting a set of 56 different models to the data. For each model, the estimated model parameters and the negative log-likelihood are recorded. In addition to varying sets of substitution rate parameters (JC, K2P, ...), some of these models also include extra parameters that take into account the presence of different rates between sites. This is done in two ways: (1) by fitting a gamma distribution of rates (&amp;quot;+G&amp;quot;), and (2) by allowing for a proportion of constant (&amp;quot;invariable&amp;quot;) sites (&amp;quot;+I&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
: Wait until jmodeltest2 is done fitting all 56 models (this will take a little while depending on your computer).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect result, manually check model probabilities for three models&#039;&#039;&#039;&lt;br /&gt;
:* Results -&amp;gt; Show results table&lt;br /&gt;
: For each model this table lists the negative log-likelihood (&amp;quot;-lnL&amp;quot;), the number of parameters (&amp;quot;p&amp;quot;), and estimates of all model parameters (excluding branch lengths).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Manually compute model probabilities for three substitution models&#039;&#039;&#039;&lt;br /&gt;
: Use AIC-based model probabilities to investigate which of the following three substitution models is best at describing how the sequences have evolved:&lt;br /&gt;
:* Jukes and Cantor with fraction of invariant sites (JC+I)&lt;br /&gt;
:* Jukes and Cantor with gamma-distributed rates over sites (JC+G)&lt;br /&gt;
:* Jukes and Cantor with invariant sites and gamma-distributed rates (JC+I+G)&lt;br /&gt;
: Before you can do the computation you need to know the log likelihood and the number of parameters for each model. Locate these values in the table for the JC+I, JC+G, and JC+I+G models, and write them down. Close the window with the result table when you are done.&lt;br /&gt;
&lt;br /&gt;
:Make sure to get the signs right: the values reported in the table are -lnL values, so you will need to reverse the sign to get the lnL (the lnL values you write down should be negative). &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Use the recipe above to compute AIC values and model probabilities. Report the results in a table similar to the one shown above&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 2&#039;&#039;&#039; Based on the model probabilities: which model has more support?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 3&#039;&#039;&#039;&lt;br /&gt;
&#039;&#039;&#039;Use modeltest program to select best model&#039;&#039;&#039;&lt;br /&gt;
: What you just did manually for JC+I, JC+G and JC+I+G, jmodeltest2 can do automatically for the full set of 56 fitted models. Specifically, it uses the list of negative log likelihoods and parameter counts in the table to compute AIC and model probabilities, and uses this to select the model that best fits the sequence data:&lt;br /&gt;
:* Analysis -&amp;gt; Do AIC calculations -&amp;gt; &lt;br /&gt;
:* Select &amp;quot;Write PAUP* block&amp;quot;&lt;br /&gt;
:* click &amp;quot;Do AIC calculations&amp;quot;&lt;br /&gt;
:* Results -&amp;gt; Show results table&lt;br /&gt;
:* Select &amp;quot;AIC&amp;quot; tab&lt;br /&gt;
:* SHIFT+click on the header of the &amp;quot;weight&amp;quot; column. This sorts the rows according to model weight, in descending order.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What model was selected by modeltest based on the AIC values?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Construction of phylogenetic tree using PAUP ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 4&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
: Close the results table. In the main window you should now scroll up to the lines giving PAUP commands that will implement the selected model. The command is enclosed between &amp;quot;BEGIN PAUP&amp;quot; and &amp;quot;END;&amp;quot; and should look something like this:&lt;br /&gt;
 Lset Base=(0.4064, [...]&lt;br /&gt;
: You will need to copy this command to a PAUP session in the next step.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start PAUP&#039;&#039;&#039;&lt;br /&gt;
 paup&lt;br /&gt;
: Above you used jmodeltest2 to select the most suitable substitution model for the present data set. You will now use this model to construct a maximum likelihood tree. You will use PAUP for this purpose. (note: it is possible to create a maximum likelihood or a model-averaged tree directly from the jmodeltest2 program, but we will instead do it in PAUP in order to more clearly see each step that is taken).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Load alignment:&#039;&#039;&#039;&lt;br /&gt;
 execute gp120.nexus&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Set tree-building criterion to maximum likelihood&#039;&#039;&#039;&lt;br /&gt;
 set criterion=likelihood&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Set model parameters to winning estimates&#039;&#039;&#039;&lt;br /&gt;
: Above you located a set of lines in the jmodeltest2 output giving a PAUP command that sets the model parameters to the estimates that were found using the winning model. Copy and paste this lset command (without the BEGIN and END parts) into the window where PAUP is running.&lt;br /&gt;
&lt;br /&gt;
 PASTE LSET COMMAND FROM MODELTEST RUN HERE&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find best tree using selected model&#039;&#039;&#039;&lt;br /&gt;
: Still in the PAUP-window, enter the following command&lt;br /&gt;
 hsearch swap=tbr start=nj&lt;br /&gt;
: This command causes PAUP to perform a heuristic search for the best maximum likelihood tree. Once an initial tree has been constructed, the heuristic search proceeds by rearrangements of the &amp;quot;tree bisection and reconnection&amp;quot; type (TBR). We are using the model selected by modeltest, and also the parameter estimates found by modeltest for that model. You could also have chosen to simply estimate all the model parameters as part of this step (i.e., at the same time as finding the best tree), but fixing them improves speed tremendously. Finding the best tree should take a few minutes.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Save best tree to file&#039;&#039;&#039;&lt;br /&gt;
 savetrees format=newick brlens=yes file=gp120tree.phy from=1 to=1&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quit program&#039;&#039;&#039;&lt;br /&gt;
 quit&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Have a look at the tree:&#039;&#039;&#039;&lt;br /&gt;
: You have now produced an unrooted tree of the HIV sequences and saved it in the file gp120tree.phy. Note that in this exercise we will not be interested in the tree as such - our focus is instead on finding positive selection on a subset of codon positions and the tree is just something we need in order to be able to fit the different codon models to the data. If you want to see the tree, you can do so with the following command:&lt;br /&gt;
 figtree gp120tree.phy &amp;amp;&lt;br /&gt;
: There is no meaningful root placed in this tree, so you may want to choose the unrooted view (the third icon in the Layout section of the figtree window). Close the figtree window when you have had a look.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What is the negative log likelihood of the tree you just found?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Detection of positively selected sites in gp120 ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 5&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
: There is much more to phylogenetic analyses than merely reconstructing trees. One interesting result of probabilistic methods, is that the parameters of a model will have their values determined as part of the optimization procedure. This means that once such a model has been fitted to the data, it is possible to investigate these estimated parameter values to learn features about the evolutionary history of the sequences under investigation. In the present example we will focus on investigating whether we can find positively selected sites in our data set, defined as sites where the dN/dS ratio is larger than 1. We do that by using a codon substitution model where the dN/dS ratio is one of its parameters.&lt;br /&gt;
&lt;br /&gt;
: A further strength of the probabilistic approach is that you get a probabilistic measure of how well any model fits the data. In this framework one uses likelihoods (probabilities of data given model) to determine which model fits the data best. As you saw above, it is for instance possible to compute AIC values and model probabilities from the likelihood values of fitted models. Since each model essentially corresponds to a hypothesis about the evolutionary history of the data, we can thus use a stringent statistical approach to decide which hypothesis best describes our data.&lt;br /&gt;
&lt;br /&gt;
: In outline, you will now use the following steps to investigate whether there is any evidence for positively selected codons in your data set:&lt;br /&gt;
&lt;br /&gt;
:* Fit model M1, which assumes there are two classes of codons in the sequence: some with dN/dS &amp;lt; 1, some with dN/dS=1.&lt;br /&gt;
:* Fit model M2, which assumes 3 distinct classes of codons: two with dN/dS ratios as for M1, and one extra class with dN/dS &amp;gt; 1.&lt;br /&gt;
:* Assess the strength of evidence for the two models using AIC-based model probabilities&lt;br /&gt;
:* If M2 is better: identify the positively selected codons&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect the parameter file&#039;&#039;&#039;&lt;br /&gt;
 nedit codeml.ctl &amp;amp;&lt;br /&gt;
: The file &amp;quot;codeml.ctl&amp;quot; contains several settings that are relevant for running the program &#039;&#039;&#039;codeml&#039;&#039;&#039;. Find the following lines and ensure that the file contains these values:&lt;br /&gt;
 &#039;&#039;&#039;seqfile =  gp120align.fasta&#039;&#039;&#039;:  name of alignment file&lt;br /&gt;
 &#039;&#039;&#039;treefile =  gp120tree.phy&#039;&#039;&#039;: name of tree file&lt;br /&gt;
 &#039;&#039;&#039;seqtype = 1&#039;&#039;&#039;: tells the program that our data consists of coding DNA.&lt;br /&gt;
 &#039;&#039;&#039;NSsites = 1 2&#039;&#039;&#039; : tells the program to analyze models M1 and M2.&lt;br /&gt;
 &#039;&#039;&#039;cleandata = 1&#039;&#039;&#039;: tells the program to ignore positions with gaps.&lt;br /&gt;
&lt;br /&gt;
: The settings entered by us will cause codeml to analyze two hypotheses about dN/dS ratios. M1 says there are two classes of codons with different dN/dS ratios in the sequence: one class with dN/dS &amp;lt; 1 (codons under purifying or negative selection), and one class with dN/dS=1 (no selection - neutrally evolving sites). M2 says there are 3 distinct dN/dS ratios for different sites in the sequence: one class with dN/dS &amp;lt; 1, one class with dN/dS=1 (these are the same type of classes as for M1), and one class with dN/dS &amp;gt; 1 (corresponding to sites under positive selection). The value of the dN/dS ratios (for those classes that have dN/dS &amp;lt; 1 or dN/dS &amp;gt; 1), the fraction of sites belonging to each class, and the position of sites belonging to each class, are unknown at first and will be determined during the analysis.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start the analysis&#039;&#039;&#039;&lt;br /&gt;
 codeml&lt;br /&gt;
: This will start the codeml program using the settings in the file codeml.ctl. Depending on your computer, this will take some minutes to finish. (You may be able to see how the optimization procedure results in progressively better fits: the likelihood increases, meaning that negative log-likelihood decreases, as the fit improves).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect result file:&#039;&#039;&#039;&lt;br /&gt;
: Wait for the run to finish, and then look at the result file:&lt;br /&gt;
 nedit selection.results &amp;amp;&lt;br /&gt;
: This file contains a wealth of information concerning your analysis. The top part of the file gives an overview of your sequences, codon usage and nucleotide frequencies. You can ignore this information for now, and move on to the interesting part, namely the model likelihoods and parameter values:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find likelihood, and number of free parameters for model M1&#039;&#039;&#039;&lt;br /&gt;
 Search ==&amp;gt; Find... ==&amp;gt; enter &amp;quot;Model 1&amp;quot; and click Find&lt;br /&gt;
: You are now in the region of the result file where the model likelihoods and parameter estimates are noted. Now, locate a line that looks a bit like the one shown below:&lt;br /&gt;
 lnL(ntime: 72  np: 74):  -4242.470345     +0.000000&lt;br /&gt;
: Identify the number of &amp;quot;free parameters&amp;quot;, K, used in model M1: This is indicated by &amp;quot;np&amp;quot;, and is 74 in the example shown above (most of these parameters are branch lengths in the tree; specifically, the number of branch length parameters is indicated by &amp;quot;ntime&amp;quot;, and is 72 in this example). Also note the log-likelihood of the fitted model. This is the number right after the parenthesis, and is -4242.470345 in the example here.&lt;br /&gt;
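&lt;br /&gt;
: If you prefer to extract these numbers with a script, the line can be picked apart with a regular expression. This is a minimal Python sketch: the function name parse_lnl is hypothetical, and the pattern is written against the example line shown above, so it assumes that your codeml version prints the line in the same layout.&lt;br /&gt;
&lt;br /&gt;
```python
import re

# Captures ntime, np and the log-likelihood from a line such as
# lnL(ntime: 72  np: 74):  -4242.470345     +0.000000
LNL_LINE = re.compile(r"lnL\(ntime:\s*(\d+)\s+np:\s*(\d+)\):\s*(-?\d+\.\d+)")

def parse_lnl(line):
    """Return (ntime, K, lnL) from a codeml lnL line, or None if no match."""
    match = LNL_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), float(match.group(3))

example = "lnL(ntime: 72  np: 74):  -4242.470345     +0.000000"
ntime, k, lnl = parse_lnl(example)
# K and lnL are exactly the two numbers needed for AIC = -2 x lnL + 2K
print("ntime:", ntime, "K:", k, "lnL:", lnl, "AIC:", -2 * lnl + 2 * k)
```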
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What are the values of K and lnL for model M1?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 6&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find dN/dS ratios and codon class proportions for model M1:&#039;&#039;&#039;&lt;br /&gt;
: Scroll down a few lines until you get to a small table similar to this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
      dN/dS for site classes (K=2)&lt;br /&gt;
      p:   0.75111  0.24889&lt;br /&gt;
      w:   0.06583  1.00000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: This gives a summary of the dN/dS ratios that were found in the data set. The line starting w: lists the two dN/dS ratios that were found (in this case 0.06583 and 1.00000 - the last one was pre-specified by us as part of the model and was therefore not a free parameter). The line starting p: gives the proportion of codon sites belonging to each of the dN/dS ratio classes (in the example above approximately 75% belong to the first class, while 25% of all sites belong to the class having dN/dS=1.00000).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What are the dN/dS value (w) and proportion (p) of sites for both classes? Report the following values: p(class1), w(class1), p(class2), w(class2)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 7&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find likelihood, and K for model M2&#039;&#039;&#039;&lt;br /&gt;
: Scroll past the M1 output until you get to the results for model M2.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What are the values of K and lnL for model M2?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 8&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find dN/dS ratios and codon class proportions for model M2:&#039;&#039;&#039;&lt;br /&gt;
: Now, scroll down a few lines until you get to a small table similar to the one you examined for M1 before. For this model there are 3 separate classes of codons.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What are the dN/dS value (w) and proportion (p) of sites for all three classes? Report these values: p(class1), w(class1), p(class2), w(class2), p(class3), w(class3)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 9&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Assess strength of evidence for models M1 and M2:&#039;&#039;&#039;&lt;br /&gt;
: M2 will always have a log-likelihood that is at least as high as that of model M1, because M1 is nested within M2 and M2 has more free parameters. A higher log-likelihood alone is therefore not evidence that M2 is the better model. You should now use the recipe given above to compute AIC values and model probabilities for M1 and M2.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Report: AIC, ΔAIC, w (model probability) for M1 and M2&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 10: &#039;&#039;&#039; Is M2 better than M1?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 11&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examine list of positively selected sites&#039;&#039;&#039;&lt;br /&gt;
: If your M2 is clearly better than M1 (I firmly believe it should be if you did things according to instructions...), then you have evidence for the existence of positively selected sites in the gp120 gene. Now, scroll down to the end of the result file and locate a list similar to the one below. Note: This is the &amp;quot;Bayes Empirical Bayes&amp;quot; table, not the &amp;quot;Naive Empirical Bayes&amp;quot; table. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Bayes Empirical Bayes (BEB) analysis&lt;br /&gt;
Positively selected sites&lt;br /&gt;
&lt;br /&gt;
         Prob(w&amp;gt;1)     mean w&lt;br /&gt;
&lt;br /&gt;
    25 A 0.959*        3.133 +- 0.769&lt;br /&gt;
    27 P 0.906         2.990 +- 0.877&lt;br /&gt;
    56 K 0.987*        3.197 +- 0.687&lt;br /&gt;
    59 V 0.915         3.032 +- 0.873&lt;br /&gt;
    78 R 0.637         2.351 +- 1.129&lt;br /&gt;
    88 K 0.573         2.148 +- 1.077&lt;br /&gt;
    95 V 0.925         3.046 +- 0.843&lt;br /&gt;
    ...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
: The distinction between the two tables is not important in this context, but very briefly NEB ignores the fact that there is uncertainty about maximum likelihood estimates, especially for smaller data sets (for instance w for some codon is perhaps not exactly 3.046, but could be in a region around that value), while [https://pubmed.ncbi.nlm.nih.gov/15689528/ BEB accounts for that uncertainty].&lt;br /&gt;
: The table gives you a list of the residues (if any) that were found to belong to the positively selected dN/dS-class. Also listed is the probability that the site really is in the codon class where dN/dS &amp;gt; 1, and a weighted average of the w at the site. Using only DNA sequences you have now identified likely epitopes on the gp120 protein.&lt;br /&gt;
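&lt;br /&gt;
: For long tables it can be convenient to filter the rows with a script. The following is a minimal Python sketch: the function name selected_sites is hypothetical, and the parsing assumes that each data row has the same six whitespace-separated fields as in the example table above.&lt;br /&gt;
&lt;br /&gt;
```python
def selected_sites(table_text, cutoff=0.95):
    """Return (position, residue, probability) for rows above the cutoff."""
    sites = []
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows look like: 25 A 0.959* 3.133 +- 0.769
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        # Asterisks only flag high probabilities and are not part of the number
        probability = float(fields[2].rstrip("*"))
        if probability > cutoff:
            sites.append((int(fields[0]), fields[1], probability))
    return sites

# Rows copied from the example table above
example = """
Positively selected sites

    25 A 0.959*        3.133 +- 0.769
    27 P 0.906         2.990 +- 0.877
    56 K 0.987*        3.197 +- 0.687
    59 V 0.915         3.032 +- 0.873
    78 R 0.637         2.351 +- 1.129
    88 K 0.573         2.148 +- 1.077
    95 V 0.925         3.046 +- 0.843
"""
print(selected_sites(example))
```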
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;List all sites having more than 95% probability of belonging to the positively selected class&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Bayesian_Phylogeny&amp;diff=31</id>
		<title>Bayesian Phylogeny</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Bayesian_Phylogeny&amp;diff=31"/>
		<updated>2024-03-19T13:41:15Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;This exercise is part of the course  Computational Molecular Evolution (22115).   == Overview ==  Today&amp;#039;s exercise will focus on phylogenetic analysis using Bayesian methods.  As was the case for likelihood methods, Bayesian analysis is founded on having a probabilistic model of how the observed data is produced. (This means that, for a given set of parameter values, you can compute the probability of any possible data set)....&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This exercise is part of the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
Today&#039;s exercise will focus on phylogenetic analysis using Bayesian methods.&lt;br /&gt;
&lt;br /&gt;
As was the case for likelihood methods, Bayesian analysis is founded on having a probabilistic model of how the observed data is produced. (This means that, for a given set of parameter values, you can compute the probability of any possible data set). You will recall from the lecture that in Bayesian statistics the goal is to obtain a full probability distribution over all possible parameter values. To find this so-called posterior probability distribution requires combining the likelihood and the prior probability distribution.&lt;br /&gt;
&lt;br /&gt;
The prior probability distribution shows your beliefs about the parameters before seeing any data, while the likelihood shows what the data is telling about the parameters. Specifically, the likelihood of a parameter value is the probability of the observed data given that parameter value. (This is the measure we have previously used to find the maximum likelihood estimate). If the prior probability distribution is flat (i.e., if all possible parameter values have the same prior probability) then the posterior distribution is simply proportional to the likelihood distribution, and the parameter value with the maximum likelihood then also has the maximum posterior probability. However, even in this case, using a Bayesian approach still allows one to interpret the posterior as a probability distribution. If the prior is NOT flat, then it may have a substantial impact on the posterior although this effect will diminish with increasing amounts of data. A prior may be derived from the results of previous experiments. For instance one can use the posterior of one analysis as the prior in a new, independent analysis.&lt;br /&gt;
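&lt;br /&gt;
The relationship between prior, likelihood and posterior can be illustrated numerically on a grid of parameter values. The following Python sketch uses a made-up toy problem (7 successes in 10 trials, with a single parameter theta) and not a phylogenetic model; the function name grid_posterior and both priors are assumptions chosen for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def grid_posterior(prior, likelihood):
    """Combine prior and likelihood on a grid and normalize to sum to 1."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Toy data (hypothetical): 7 successes in 10 trials, theta on a grid
grid = [i / 100 for i in range(1, 100)]
likelihood = [math.comb(10, 7) * t**7 * (1 - t)**3 for t in grid]

flat_prior = [1.0 for _ in grid]
informative_prior = [t**2 * (1 - t)**8 for t in grid]  # favours small theta

for label, prior in (("flat", flat_prior), ("informative", informative_prior)):
    posterior = grid_posterior(prior, likelihood)
    mode = grid[posterior.index(max(posterior))]
    print(label, "prior: posterior mode at theta =", mode)
```
&lt;br /&gt;
With the flat prior the posterior mode coincides with the maximum likelihood estimate (theta = 0.7), while the informative prior pulls the mode towards smaller values.&lt;br /&gt;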
&lt;br /&gt;
In Bayesian phylogeny the parameters are of the same kind as in maximum likelihood phylogeny. Thus, typical parameters include tree topology, branch lengths, nucleotide frequencies, and substitution model parameters such as for instance the transition/transversion ratio or the gamma shape parameter. The difference is that while we want to find the best point estimates of parameter values in maximum likelihood, the goal in Bayesian phylogeny is instead to find a full probability distribution over all possible parameter values. The observed data is again usually taken to be the alignment, although it would of course be more reasonable to say that the sequences are what have been observed (and the alignment should then be inferred along with the phylogeny).&lt;br /&gt;
&lt;br /&gt;
In this exercise we will explore how one can determine and use posterior probability distributions over trees, over clades, and over substitution parameters. We will also touch upon the difference between marginal and joint probability distributions.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
&lt;br /&gt;
: In the command below: Instead of /path/to/molevol enter the path to the directory where you have placed your course files (for instance cd /Users/bob/Documents/molevol, or cd /home/student/molevol).&lt;br /&gt;
 cd /path/to/molevol&lt;br /&gt;
 mkdir bayes&lt;br /&gt;
 cd bayes&lt;br /&gt;
 cp ../data/primatemitDNA.nexus ./primatemitDNA.nexus&lt;br /&gt;
 cp ../data/neanderthal.nexus ./neanderthal.nexus&lt;br /&gt;
 cp ../data/hcvsmall.nexus ./hcvsmall.nexus&lt;br /&gt;
&lt;br /&gt;
: You have analyzed (versions of) all these data files previously in this course. We will now use Bayesian phylogenetic analysis to complement what we learned in those analyses.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Load R libraries&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
: In RStudio: set the working directory to the bayes directory. Then issue these commands:&lt;br /&gt;
 library(magrittr)&lt;br /&gt;
 library(tidyverse)&lt;br /&gt;
 library(bayesplot)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Posterior probability of trees ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 1&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
: In today&#039;s exercise we will be using the program &amp;quot;MrBayes&amp;quot; to perform Bayesian phylogenetic analysis. MrBayes is a program that, like PAUP*, can be controlled by giving commands at a command line prompt. In fact, there is a substantial overlap between the commands used to control MrBayes and the PAUP command language. This should be a help when you are trying to understand how to use the program.&lt;br /&gt;
&lt;br /&gt;
: Note that the command &amp;quot;help&amp;quot; will give you a list of all available commands. Issuing &amp;quot;help &#039;&#039;command&#039;&#039;&amp;quot; will give you a more detailed description of the specified command along with current option values. This is similar to how &amp;quot;help &#039;&#039;command&#039;&#039;&amp;quot; works in PAUP.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start program&#039;&#039;&#039;&lt;br /&gt;
 mb&lt;br /&gt;
: This starts the program, giving you a prompt (&amp;quot;MrBayes&amp;gt; &amp;quot;) where you can enter commands.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Get a quick overview of available commands&#039;&#039;&#039;&lt;br /&gt;
 help&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Load your sequences&#039;&#039;&#039;&lt;br /&gt;
 execute primatemitDNA.nexus&lt;br /&gt;
: This file contains mitochondrial DNA sequences from 5 different primates. Note that MrBayes accepts input in nexus format, and that this is the same command that was used to load sequences in PAUP*. In general, you can use many of the PAUP commands in MrBayes also.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect data set&#039;&#039;&#039;&lt;br /&gt;
 showmatrix&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Define outgroup&#039;&#039;&#039;&lt;br /&gt;
 outgroup Gibbon&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Specify your model of sequence evolution&#039;&#039;&#039;&lt;br /&gt;
 lset nst=2 rates=gamma&lt;br /&gt;
: This command is again very much like the corresponding one in PAUP. You are specifying that you want to use a model with two substitution types (nst=2), and this is automatically taken to mean that you want to distinguish between transitions and transversions. Furthermore, rates=gamma means that you want the model to use a gamma distribution to account for different rates at different sites in the sequence.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start Markov chain Monte Carlo sampling&#039;&#039;&#039;&lt;br /&gt;
: Make sure to make the shell window as wide as possible and then issue the following command to start the run:&lt;br /&gt;
 mcmc ngen=1000000 samplefreq=1000 nchains=3 diagnfreq=5000&lt;br /&gt;
: What you are doing here is to use the method known as MCMCMC (&amp;quot;Metropolis-coupled Markov chain Monte Carlo&amp;quot;) to empirically determine the posterior probability distribution of trees, branch lengths and substitution parameters. Recall that in the Bayesian framework this is how we learn about parameter values: instead of finding the best point estimates, we typically want to quantify the probability of the entire range of possible values. An estimate of the time left is shown in the last column of output.&lt;br /&gt;
&lt;br /&gt;
: Let us examine the command in detail. First, ngen=1000000 samplefreq=1000 lets the search run for 1,000,000 steps (&amp;quot;generations&amp;quot;) and saves parameter values once every 1000 rounds (meaning that a total of 1000 sets of parameter values will be saved). The option nchains=3 means that the MCMCMC sampling uses 3 parallel chains (but see below): one &amp;quot;cold&amp;quot; from which sampling takes place, and two &amp;quot;heated&amp;quot; that move around in the parameter space more quickly to find additional peaks in the probability distribution.&lt;br /&gt;
&lt;br /&gt;
: The option diagnfreq=5000 has to do with testing whether the MrBayes run is successful. Briefly, MrBayes will start two entirely independent runs starting from different random trees. In the early phases of the run, the two runs will sample very different trees, but when they have reached convergence (when they produce a good sample from the posterior probability distribution), the two tree samples should be very similar. Every diagnfreq generations, the program will compute a measure of how similar the tree samples are. Specifically, the measure is the average standard deviation of split frequencies, where a &amp;quot;split&amp;quot; is the same as a bipartition, i.e., a division of all leaves in the tree into two groups by cutting an internal branch. As a rule of thumb, you may want to run until this value is less than 0.05 (the smaller the better).&lt;br /&gt;
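&lt;br /&gt;
: As a rough illustration of the convergence measure described above, the following Python sketch computes an average standard deviation of split frequencies for two runs. The function name, the example frequencies, the use of the sample standard deviation, and the 0.10 minimum-frequency cutoff are assumptions made for illustration; they are not taken from the MrBayes source code.&lt;br /&gt;

```python
from statistics import mean, stdev


def average_sd_of_split_frequencies(freqs_run1, freqs_run2, min_freq=0.10):
    """Average, over splits, of the standard deviation of split frequencies.

    Each argument maps a split label to its frequency in one run.
    A split that is missing from a run counts as frequency 0.0 in that run.
    """
    all_splits = set(freqs_run1) | set(freqs_run2)
    deviations = []
    for split in all_splits:
        freqs = [freqs_run1.get(split, 0.0), freqs_run2.get(split, 0.0)]
        # Ignore splits that are rare in every run (assumed cutoff).
        if max(freqs) >= min_freq:
            deviations.append(stdev(freqs))
    return mean(deviations)


# Hypothetical split frequencies from two independent runs.
run1 = {"AB|CDE": 0.90, "ABC|DE": 0.50}
run2 = {"AB|CDE": 0.70, "ABC|DE": 0.50}
print(average_sd_of_split_frequencies(run1, run2))
```

: When the two runs sample the same splits at the same frequencies, every per-split standard deviation is zero, so the average approaches zero as the runs converge.&lt;br /&gt;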
&lt;br /&gt;
: During the run you will see reports about the progress of the two sets of three chains. Each line of output lists the generation number and the log likelihoods of the current tree/parameter combination for each of the two groups of three chains (a column of asterisks separates the results for the independent runs). The cold chains are the ones enclosed in brackets [...], while the heated chains are enclosed in parentheses (...). Occasionally the chains will swap so one of the heated chains now becomes cold (and sampling then takes place from this chain).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Continue run until parallel runs converge on same solution&#039;&#039;&#039;&lt;br /&gt;
:At the end of the run, MrBayes will print the average standard deviation of split frequencies (which is a measure of how similar the tree samples of the two independent runs are). We recommend that you continue with the analysis until the value gets below 0.01 (if the value is larger than 0.01 then you should answer &amp;quot;yes&amp;quot; when the program asks &amp;quot;Continue the analysis? (yes/no)&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Once you have reached convergence (and answered &amp;quot;no&amp;quot; to continue the analysis): How many generations did you have to run?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 2&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Have a look at the resulting sample files&#039;&#039;&#039;&lt;br /&gt;
: Open a new Terminal and cd to the bayes directory. Open one of the parameter sampling files in an nedit window:&lt;br /&gt;
 nedit primatemitDNA.nexus.run1.p &amp;amp;&lt;br /&gt;
: This file contains one line for each sampled point (you may want to turn off line-wrapping in nedit under the preferences menu). Each row corresponds to a certain sample time (or generation). Each column contains the sampled values of one specific parameter. The first line contains headings telling what the different columns are: &amp;quot;lnL&amp;quot; is the log likelihood of the current parameter estimates, &amp;quot;TL&amp;quot; is the tree length (sum of all branch lengths), &amp;quot;kappa&amp;quot; is the transition/transversion rate ratio, &amp;quot;pi(A)&amp;quot; is the frequency of A (etc.), and &amp;quot;alpha&amp;quot; is the shape parameter for the gamma distribution. (Column headings may be shifted relative to their corresponding columns). Note how the values of most parameters change a lot during the initial &amp;quot;burnin&amp;quot; period, before they settle near their most probable values. Now, close the nedit window and have a look at the file containing sampled trees:&lt;br /&gt;
 nedit primatemitDNA.nexus.run1.t &amp;amp;&lt;br /&gt;
: Tree topology is also a parameter in our model, and exactly like for the other parameters we also get samples from tree-space. One tree is printed per line in the parenthetical format used by most phylogeny software. There are 5 taxa in the present data set, meaning that the tree-space consists of only 15 different possible trees. Since we have taken more than 15 sample points, there must be several lines containing the same tree topology. Close the nedit window when you are done.&lt;br /&gt;
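&lt;br /&gt;
: The count of 15 trees mentioned above follows from the standard formula for the number of unrooted, fully resolved trees with n leaves: the product of the odd numbers 3, 5, ..., 2n-5. A minimal Python sketch (the function name is chosen here for illustration):&lt;br /&gt;

```python
def num_unrooted_trees(n_taxa):
    """Number of unrooted binary tree topologies for n_taxa labeled leaves."""
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2*n_taxa - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (4, 5, 10):
    print(n, num_unrooted_trees(n))
```

: The number grows very quickly with the number of taxa, which is why listing the probability of every tree is only practical for very small data sets.&lt;br /&gt;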
&lt;br /&gt;
&#039;&#039;&#039;Examine MCMC trajectory for nucleotide frequency&#039;&#039;&#039;&lt;br /&gt;
: Recall that the idea in MCMCMC sampling is to move around in parameter space in such a way that the points will be visited according to their posterior probability (i.e., a region with very high posterior probability will be visited frequently). Now, in RStudio plot the sampled values for the frequency of A for one of the run files:&lt;br /&gt;
 df = read_tsv(&amp;quot;primatemitDNA.nexus.run1.p&amp;quot;, skip=1)&lt;br /&gt;
 mcmc_trace(df, pars=&amp;quot;pi(A)&amp;quot;)&lt;br /&gt;
: mcmc_trace is one of several plotting commands available in the bayesplot package. These commands produce a plot of f_A (or &amp;quot;pi(A)&amp;quot;) from the sample file for the first of the two parallel runs. Note how the Markov chain starts at the arbitrary value of 0.25, rapidly moves to a value that fits with the observed data, and then moves around in parameter space, sampling different possible values of f_A. You can experiment with plotting other columns as well.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Investigate posterior probability distribution over trees&#039;&#039;&#039;&lt;br /&gt;
: MrBayes provides the sumt command to summarize the sampled trees. Before using it, we need to decide on the burn-in: The burn-in is the initial set of samples that are typically discarded, because we want to ensure that the MCMC has moved away from the random starting values, and has found the peaks of the probability landscape. Since the convergence diagnostic we used previously to determine when to stop the analysis discarded the first 25% of the samples, it makes sense to also discard 25% of the samples obtained during the analysis.&lt;br /&gt;
&lt;br /&gt;
: Return to the shell window where you have MrBayes running. In the command below, relburnin=yes and burninfrac=0.25 tell MrBayes to discard 25% of the samples as burnin (you could also have explicitly given the number of samples to discard; help sumt will give you details about the command and the current option settings).&lt;br /&gt;
 sumt contype=halfcompat conformat=simple relburnin=yes burninfrac=0.25 showtreeprobs=yes&lt;br /&gt;
: (Scroll back so you can see the top of the output when the command is done). This command gives you a summary of the trees that are in the file you examined manually above. The option contype=halfcompat requests that a majority rule consensus tree is calculated from the set of trees that are left after discarding the burnin. This consensus is the first tree plotted to the screen. Below the consensus cladogram, a consensus phylogram is plotted. The branch lengths in this have been averaged over the trees in which that branch was present (a particular branch corresponds to a bi-partition of the data, and will typically not be present in every sampled tree). The cladogram also has &amp;quot;clade credibility&amp;quot; values. We will return to the meaning of these later in today&#039;s exercise.&lt;br /&gt;
&lt;br /&gt;
: What most interests us right now is the list of trees that is printed after the phylogram. These trees are labeled &amp;quot;Tree 1&amp;quot;, &amp;quot;Tree 2&amp;quot;, etc., and are sorted according to their posterior probability, which is indicated by a lower-case p after the tree number. (The upper-case P gives the cumulative probability of the trees shown so far, and is useful for constructing a credible set). This list highlights how Bayesian phylogenetic analysis is different from maximum likelihood: Instead of finding the best tree(s), we now get a full list of how probable any possible tree is.&lt;br /&gt;
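&lt;br /&gt;
: The idea behind the p and P columns can be sketched in a few lines of Python: the posterior probability of a topology is estimated as the fraction of post-burnin samples that have that topology, and P is the running sum. The function name and the topology labels below are hypothetical, and this is not how MrBayes itself is implemented.&lt;br /&gt;

```python
from collections import Counter


def tree_probabilities(sampled_topologies):
    """Return (topology, p, cumulative P) rows sorted by decreasing p."""
    counts = Counter(sampled_topologies)
    total = len(sampled_topologies)
    rows = []
    cumulative = 0.0
    for topology, n in counts.most_common():
        p = n / total          # estimated posterior probability
        cumulative += p        # running sum, useful for a credible set
        rows.append((topology, p, cumulative))
    return rows


# Hypothetical post-burnin sample of ten topologies.
samples = ["T1"] * 6 + ["T2"] * 3 + ["T3"]
for row in tree_probabilities(samples):
    print(row)
```

: A 95% credible set would then consist of the trees listed up to the first row where the cumulative value reaches 0.95.&lt;br /&gt;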
&lt;br /&gt;
: The list of trees and probabilities was printed because of the option showtreeprobs=yes. Note that you probably do not want to issue that command if you have much more than 5 taxa! In that case you could instead inspect the file named primatemitDNA.nexus.trprobs which is now present in the same directory as your other files (this file is automatically produced by the sumt command).&lt;br /&gt;
&lt;br /&gt;
: &#039;&#039;&#039;NOTE&#039;&#039;&#039;: Annoyingly, there is a bug in the version of MrBayes we are using here, which means leaf names are not printed on the list of trees with probabilities. However, the most probable tree here is in fact identical to the consensus tree printed above it.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What is the posterior probability of the most probable tree?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Analysis of Neanderthal data (posterior probability of clades) ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 3&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
The predominant theory in the 1950s and 60s (although it varied greatly from scholar to scholar) was that our earliest hominid ancestors (specifically Homo erectus) evolved in Africa and then radiated out into the world. This so-called [https://www.thoughtco.com/multiregional-hypothesis-167235 Multiregional Hypothesis] says that after H. erectus arrived in the various regions in the world hundreds of thousands of years ago, they slowly evolved into modern humans. The hypothesis thus posits that there were nearly independent origins of modern humans within the various regions of the world.&lt;br /&gt;
&lt;br /&gt;
In the 1970s, paleontologist W.W. Howells proposed an alternate theory: the first Recent African Origin model. Howells argued that H. sapiens evolved solely in Africa. By the 1980s, growing data from human genetics led Stringer and Andrews to develop a model that said that the very earliest anatomically modern humans arose in Africa about 100,000 years ago and archaic populations found throughout Eurasia (including Neanderthals) might be descendants of H. erectus and later archaic types but they were not related to modern humans.&lt;br /&gt;
&lt;br /&gt;
We will use the present data set to consider this issue.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Load Neanderthal data set&#039;&#039;&#039;&lt;br /&gt;
: In the Terminal where you have MrBayes running:&lt;br /&gt;
 execute neanderthal.nexus&lt;br /&gt;
 delete 5-40&lt;br /&gt;
: As we did for the maximum likelihood analysis, we will discard some of the human sequences in order to speed up the analysis. The command delete 5-40 removes sequence number 5 to sequence number 40 from the active data set.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Investigate data&#039;&#039;&#039;&lt;br /&gt;
 showmatrix&lt;br /&gt;
: This data set consists of an alignment of mitochondrial DNA from human (17 sequences), chimpanzee (1 sequence), and Neanderthal (1 sequence). The Neanderthal DNA was extracted from archaeological material, specifically bones found at Vindija in Croatia.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start analysis&#039;&#039;&#039;&lt;br /&gt;
 outgroup Pan_troglodytes&lt;br /&gt;
 lset nst=mixed rates=gamma&lt;br /&gt;
 mcmc ngen=500000 nchains=3 diagnfreq=10000&lt;br /&gt;
&lt;br /&gt;
: Here we use the option nst=mixed, which allows MrBayes to automatically explore all possible substitution models. Essentially, MrBayes now considers the substitution model as one more parameter, and uses MCMC to sample from the possible versions (with nst ranging from 1 to 6). This will often be the best choice when using MrBayes. (Below, I use nst=6 for pedagogical purposes, because it makes it simpler to analyse the output files).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find posterior probability of clades&#039;&#039;&#039;&lt;br /&gt;
 sumt contype=halfcompat showtreeprobs=no relburnin=yes burninfrac=0.25&lt;br /&gt;
: Examine the consensus tree that is plotted to screen: On the branches that are resolved, you will notice that numbers have been plotted. These are clade-credibility values, and are in fact the posterior probability that the clade is real (based on the present data set). These numbers are different from bootstrap values: unlike bootstrap values (which have no clear statistical meaning) these are actual probabilities. Furthermore, they have been found using a full probabilistic model, instead of neighbor joining, and the analysis has still finished in a reasonable amount of time. These features make Bayesian phylogeny very useful for assessing hypotheses about monophyly.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What is the clade probability for Homo sapiens being a monophyletic group excluding the Neanderthal?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Probability distributions over other parameters ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 4&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
: As the last thing, we will now turn away from the tree topology, and instead examine the other parameters that also form part of the probabilistic model. We will do this using a reduced version of the Hepatitis C virus data set that we have examined previously. Stay in the shell window where you just performed the analysis of Neanderthal sequences.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Load data set&#039;&#039;&#039;&lt;br /&gt;
 execute hcvsmall.nexus&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Define site partition&#039;&#039;&#039;&lt;br /&gt;
 charset 1stpos=1-.\3&lt;br /&gt;
 charset 2ndpos=2-.\3&lt;br /&gt;
 charset 3rdpos=3-.\3&lt;br /&gt;
 partition bycodon = 3:1stpos,2ndpos,3rdpos&lt;br /&gt;
 set partition=bycodon&lt;br /&gt;
 prset ratepr=variable&lt;br /&gt;
: This is an alternative way of specifying that different sites have different rates. Instead of using a gamma distribution and learning which sites have what rates from the data, we are instead using our prior knowledge about the structure of the genetic code to specify that all 1st codon positions have the same rate, all 2nd codon positions have the same rate, and all 3rd codon positions have the same rate. Specifically, charset 1stpos=1-.\3 means that we define a character set named &amp;quot;1stpos&amp;quot; which includes site 1 in the alignment followed by every third site (&amp;quot;\3&amp;quot;, meaning it includes sites 1, 4, 7, 10, ...) until the end of the alignment (here denoted &amp;quot;.&amp;quot;).&lt;br /&gt;
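&lt;br /&gt;
: To make the meaning of the start-.\3 notation concrete, the following Python sketch lists the 1-based alignment sites that such a character set would contain. The function name and the alignment length of 12 are assumptions chosen for illustration.&lt;br /&gt;

```python
def codon_position_sites(start, alignment_length):
    """1-based sites from start to the end of the alignment, in steps of 3."""
    return list(range(start, alignment_length + 1, 3))


# Hypothetical alignment of 12 sites (four codons).
for name, start in (("1stpos", 1), ("2ndpos", 2), ("3rdpos", 3)):
    print(name, codon_position_sites(start, 12))
```

: Together the three sets cover every site exactly once, which is what the partition command requires.&lt;br /&gt;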
&lt;br /&gt;
&#039;&#039;&#039;Specify model&#039;&#039;&#039;&lt;br /&gt;
 lset nst=6&lt;br /&gt;
: This specifies that we want to use a model of the General Time Reversible (GTR) type, where all 6 substitution types have separate rate parameters.&lt;br /&gt;
&lt;br /&gt;
: When the lset command was discussed previously, a few issues were glossed over. Importantly, and unlike PAUP, the lset command in MrBayes gives no information about whether nucleotide frequencies are equal or not, and whether they should be estimated from the data or not. In MrBayes this is instead controlled by defining the prior probability of the nucleotide frequencies (the command prset can be used to set priors). For instance, a model with equal nucleotide frequencies corresponds to having prior probability 1 (one) for the frequency vector (A=0.25, C=0.25, G=0.25, T=0.25), and zero prior probability for the infinitely many other possible vectors. As you will see below, the default prior is not this limited, and the program will therefore estimate the frequencies from the data.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect model details&#039;&#039;&#039;&lt;br /&gt;
 showmodel&lt;br /&gt;
: This command gives you a summary of the current model settings. You will also get a summary of how the prior probabilities of all model parameters are set. You will for instance notice that the nucleotide frequencies (parameter labeled &amp;quot;Statefreq&amp;quot;) have a &amp;quot;Dirichlet&amp;quot; prior. We will not go into the grisly details of what exactly the Dirichlet distribution looks like, but merely note that it is a distribution over many variables, and that depending on the exact parameters the distribution can be more or less flat. The Dirichlet distribution is a handy way of specifying the prior probability distribution of nucleotide (or amino acid) frequency vectors. The default statefreq prior in MrBayes is the flat or un-informative prior dirichlet(1,1,1,1).&lt;br /&gt;
&lt;br /&gt;
: We will not go into the priors for the remaining parameters in any detail, but you may notice that by default all topologies are taken to be equally likely (a flat prior on trees).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start MCMC sampling&#039;&#039;&#039;&lt;br /&gt;
 mcmc ngen=500000 samplefreq=100 diagnfreq=10000 nchains=3&lt;br /&gt;
: The run will take a while to finish (you may want to ensure that the average standard deviation of split frequencies is less than 0.01 before ending the analysis).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compute summary of parameter values&#039;&#039;&#039;&lt;br /&gt;
 sump relburnin=yes burninfrac=0.25&lt;br /&gt;
: The sump command works much like the sumt command, but summarizes the non-tree parameters. Again, we are using 25% of the total number of samples as burnin.&lt;br /&gt;
&lt;br /&gt;
: First, you get a plot of the lnL as a function of generation number. Values from the two independent runs are labeled &amp;quot;1&amp;quot; and &amp;quot;2&amp;quot; respectively. If the burnin is suitable, then the points should be randomly scattered over a narrow lnL interval.&lt;br /&gt;
&lt;br /&gt;
: Secondly, the posterior probability distribution of each parameter is summarized by giving the mean, variance, median, and 95% credible interval.&lt;br /&gt;
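&lt;br /&gt;
: The kind of summary described above can be sketched in Python using only the standard library. Note the assumptions: the function name and the sample values are made up, and the interval shown is an equal-tailed interval based on sample quantiles, which need not be identical to the interval that MrBayes reports.&lt;br /&gt;

```python
from statistics import mean, median, quantiles, variance


def summarize_posterior(samples):
    """Mean, variance, median, and an equal-tailed 95% interval of samples."""
    # With n=40 the cut points fall at 2.5%, 5%, ..., 97.5%.
    cuts = quantiles(samples, n=40)
    return {
        "mean": mean(samples),
        "variance": variance(samples),
        "median": median(samples),
        "lower": cuts[0],
        "upper": cuts[-1],
    }


# Hypothetical post-burnin samples of one parameter.
values = [i / 100 for i in range(1, 101)]
print(summarize_posterior(values))
```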
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Report the mean of the relative substitution rate parameters r(AC) and r(CG).&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 5: &#039;&#039;&#039; Based on the reported posterior means, does it seem that r(CG) is different from r(AC)?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 6&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Marginal vs. joint distributions&#039;&#039;&#039;&lt;br /&gt;
: Strictly speaking the comparison above was not entirely appropriate. We first found the overall distribution of the r(CG) parameter and then compared its mean to the mean of the overall distribution of the r(AC) parameter. By doing things this way, we are ignoring the possibility that the two parameters might be associated in some way. For instance, one parameter might always be larger than the other in any individual sample, even though the total distributions overlap. We should instead be looking at the distribution over both parameters simultaneously. A probability distribution over several parameters simultaneously is called a &amp;quot;joint distribution&amp;quot; over the parameters.&lt;br /&gt;
&lt;br /&gt;
: By looking at one parameter at a time, we are summing its probability over all values of the other parameters. This is called the marginal distribution.&lt;br /&gt;
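&lt;br /&gt;
: A small Python sketch may help show why the joint distribution can carry information that the marginals hide. The sample values and the function name are invented for illustration: the two sets of values overlap almost completely when viewed separately, yet within each individual sample the first parameter is larger than the second.&lt;br /&gt;

```python
def prob_first_greater(a_samples, b_samples):
    """Fraction of paired samples in which a is greater than b."""
    pairs = list(zip(a_samples, b_samples))
    return sum(1 for a, b in pairs if a > b) / len(pairs)


# Hypothetical paired samples: one (ac, cg) pair per sampled generation.
ac = [0.10, 0.12, 0.14, 0.16]
cg = [0.09, 0.11, 0.13, 0.15]

# The marginal ranges overlap almost entirely ...
print(min(ac), max(ac), min(cg), max(cg))
# ... but the joint samples show that ac exceeds cg in every pair.
print(prob_first_greater(ac, cg))
```

: Pairing the values by sample is the essential step; comparing the two columns after sorting or shuffling them independently would throw away the association.&lt;br /&gt;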
&lt;br /&gt;
&#039;&#039;&#039;Examine marginal distributions&#039;&#039;&#039;&lt;br /&gt;
: In RStudio, use the following commands to read and plot the marginal distributions of r(AC) and r(CG). Note that we are discarding the first 25% of the samples as burnin.&lt;br /&gt;
 df = read_tsv(&amp;quot;hcvsmall.nexus.run1.p&amp;quot;, skip=1)&lt;br /&gt;
 burnin = df$Gen %&amp;gt;% &lt;br /&gt;
     max() %&amp;gt;% &lt;br /&gt;
     multiply_by(0.25) %&amp;gt;% &lt;br /&gt;
     floor()&lt;br /&gt;
 df2 = df %&amp;gt;% &lt;br /&gt;
     filter(Gen &amp;gt; burnin) %&amp;gt;%&lt;br /&gt;
     select(CG = `r(C&amp;lt;-&amp;gt;G){all}`,&lt;br /&gt;
            AC = `r(A&amp;lt;-&amp;gt;C){all}`&lt;br /&gt;
            )&lt;br /&gt;
 mcmc_intervals(df2, prob_outer = 1)&lt;br /&gt;
 mcmc_areas(df2, prob_outer = 1)&lt;br /&gt;
: The functions mcmc_intervals and mcmc_areas plot different views of the same posterior distributions. &lt;br /&gt;
&lt;br /&gt;
: You can also simply plot the data using ggplot:&lt;br /&gt;
 df2long = pivot_longer(df2, cols = c(&amp;quot;CG&amp;quot;, &amp;quot;AC&amp;quot;))&lt;br /&gt;
 &lt;br /&gt;
 ggplot(df2long) + &lt;br /&gt;
     geom_density(mapping=aes(x=value, fill=name), alpha=0.3) + &lt;br /&gt;
     labs(x=&amp;quot;Substitution rate&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question&#039;&#039;&#039;: Which of the following statements best describes how the marginal distributions behave?&lt;br /&gt;
&lt;br /&gt;
:* The two marginal distributions have a small overlap. The r(CG) distribution has the highest peak.&lt;br /&gt;
:* The two marginal distributions have a large overlap. The r(CG) distribution has the highest peak.&lt;br /&gt;
:* The two marginal distributions have no overlap. The r(CG) distribution has the highest peak.&lt;br /&gt;
:* The two marginal distributions have a large overlap. The r(AC) distribution has the highest peak.&lt;br /&gt;
:* The two marginal distributions have a small overlap. The r(AC) distribution has the highest peak.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 7&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examine joint distributions&#039;&#039;&#039;&lt;br /&gt;
: These plots and results explore the relationship between the A&amp;lt;-&amp;gt;C and C&amp;lt;-&amp;gt;G rates.&lt;br /&gt;
 ggplot(df2, aes(x=CG, y=AC)) + &lt;br /&gt;
     geom_point(col=&amp;quot;blue&amp;quot;) + &lt;br /&gt;
     geom_abline(intercept=0, slope=1, lty=2, col=&amp;quot;red&amp;quot;) +&lt;br /&gt;
     xlim(0,0.25) + &lt;br /&gt;
     ylim(0,0.25) + &lt;br /&gt;
     labs(x=&amp;quot;CG rate&amp;quot;, y =&amp;quot;AC rate&amp;quot;) &lt;br /&gt;
 &lt;br /&gt;
 ggplot(df2, aes(x=CG, y= AC)) + &lt;br /&gt;
     geom_hex(col=&amp;quot;blue&amp;quot;) + &lt;br /&gt;
     geom_abline(intercept=0, slope=1, lty=2, col=&amp;quot;red&amp;quot;) +&lt;br /&gt;
     xlim(0,0.25) + &lt;br /&gt;
     ylim(0, 0.25) + &lt;br /&gt;
     labs(x=&amp;quot;CG rate&amp;quot;, y =&amp;quot;AC rate&amp;quot;) &lt;br /&gt;
 &lt;br /&gt;
 df2 %&amp;gt;%&lt;br /&gt;
     nrow()&lt;br /&gt;
 df2 %&amp;gt;%&lt;br /&gt;
     filter(AC&amp;gt;CG) %&amp;gt;%&lt;br /&gt;
     nrow()&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Based on the above plots and results: What is the joint probability that rAC &amp;gt; rCG?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 8&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
: Note how examining the joint distribution provides you with information that you could not get from simply comparing the marginal distributions. This very simple procedure can be used to answer many different questions.&lt;br /&gt;
&lt;br /&gt;
: Now, plot the relative substitution rates at the first, second, and third codon positions:&lt;br /&gt;
 df3 = df %&amp;gt;% &lt;br /&gt;
     filter(Gen &amp;gt; burnin) %&amp;gt;%&lt;br /&gt;
     select(Codon_1st = `m{1}`,&lt;br /&gt;
            Codon_2nd = `m{2}`,&lt;br /&gt;
            Codon_3rd = `m{3}` ) %&amp;gt;%&lt;br /&gt;
     pivot_longer(cols=c(&amp;quot;Codon_1st&amp;quot;, &amp;quot;Codon_2nd&amp;quot;, &amp;quot;Codon_3rd&amp;quot;))&lt;br /&gt;
 &lt;br /&gt;
 ggplot(df3) + &lt;br /&gt;
     geom_density(mapping=aes(x=value, fill=name), alpha=0.3) + &lt;br /&gt;
     labs(x=&amp;quot;Relative substitution rate&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Since random mutations presumably hit all three codon positions with the same frequency, any differences must be caused by subsequent selection. How does the result fit with your knowledge of the structure of the genetic code? Which of the following statements are correct? (More than one answer may be correct)&lt;br /&gt;
&lt;br /&gt;
:* Codon position 2 is the most degenerate of the codon positions.&lt;br /&gt;
:* Codon position 1 is the most degenerate of the codon positions.&lt;br /&gt;
:* Codon position 1 is the most conserved codon position.&lt;br /&gt;
:* Codon position 3 is the most conserved codon position.&lt;br /&gt;
:* Codon position 3 is the most degenerate of the codon positions.&lt;br /&gt;
:* Codon position 2 is the most conserved codon position.&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Maximum_Likelihood&amp;diff=30</id>
		<title>Maximum Likelihood</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Maximum_Likelihood&amp;diff=30"/>
		<updated>2024-03-19T13:39:51Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;This exercise is part of the course  Computational Molecular Evolution (22115).   == Overview ==  The data set you will work with here consists of an alignment of full length mitochondrial DNA from human (53 sequences), chimpanzee (1 sequence), bonobo (1 sequence), and Neanderthal (1 sequence). The Neanderthal DNA was extracted from archaeological material, specifically 38,000 year old bones found at Vindija in Croatia (all s...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This exercise is part of the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
The data set you will work with here consists of an alignment of full length mitochondrial DNA from human (53 sequences), chimpanzee (1 sequence), bonobo (1 sequence), and Neanderthal (1 sequence). The Neanderthal DNA was extracted from archaeological material, specifically 38,000 year old bones found at Vindija in Croatia (all sequence data was taken from this paper: Green et al., Cell, 2008).&lt;br /&gt;
&lt;br /&gt;
The view emerging from most anatomical, archaeological, and DNA-based studies places Neanderthals as a different species from Homo sapiens. This is in agreement with the &amp;quot;Out-of-Africa hypothesis&amp;quot;, which states that Neanderthals coexisted with modern humans who originated in Africa somewhere between 100,000 and 200,000 years ago. There is, however, also some anatomical and paleontological research which supports the so-called &amp;quot;multi-regional hypothesis&amp;quot;, which propounds that some populations of archaic Homo evolved into modern human populations in many geographical regions. Under this hypothesis, Neanderthals would be a sub-clade within humans. We will use the present data set to consider this issue.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Construct working directory, copy files&#039;&#039;&#039;&lt;br /&gt;
In a terminal window enter:&lt;br /&gt;
 cd ~student&lt;br /&gt;
 mkdir likelihood&lt;br /&gt;
 cd likelihood&lt;br /&gt;
 cp ~/data/neanderthal.nexus ./neanderthal.nexus&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 1&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Start PAUP and load data set:&lt;br /&gt;
 paup neanderthal.nexus&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Remove subset of sequences to reduce computational burden:&#039;&#039;&#039;&lt;br /&gt;
 delete 5-40&lt;br /&gt;
: This command removes 36 human sequences (sequence number 5 to sequence number 40) from the data set. We do this in order to reduce the time needed to finish the analysis. In the remaining data set we now have 17 human sequences, one chimpanzee, one bonobo, and one Neanderthal.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Specify substitution model&#039;&#039;&#039;&lt;br /&gt;
In the analysis performed here, we have reason to believe that the Kimura 2 parameter model is a fair description of how the sequences evolve (i.e., transitions and transversions have separate rates). We furthermore have evidence that different sites evolve at quite different rates, and we want to model this using a gamma distribution. Moreover, we will request that the transition/transversion rate ratio and the gamma shape parameter are estimated from data. (Although we will not discuss the issue further at this point, it is important to realize that there are techniques for stringent selection of the best model, and that one should never just randomly select one. We will return to such techniques later in the course when we discuss model selection. For now, however, you should just accept that K2P + gamma is an adequate model for the present data set). To specify the substitution model, enter the following at the PAUP prompt:&lt;br /&gt;
 set criterion=likelihood&lt;br /&gt;
 lset nst=2 tratio=estimate basefreq=equal rates=gamma shape=estimate&lt;br /&gt;
: In order to search for a maximum likelihood tree, we must first give a detailed description of the assumed substitution model. Since this is the first time we do this, I will give a rather thorough description of each part of the command.&lt;br /&gt;
&lt;br /&gt;
: First lset (&amp;quot;likelihood settings&amp;quot;) is the command used in PAUP to specify likelihood models, just as dset was used to specify settings for the distance criterion.&lt;br /&gt;
&lt;br /&gt;
: Secondly, we specify that we want a model with two different types of substitution rates (nst=2) and where the frequency of each base is 25% (basefreq=equal). You will recognize this as the K2P model. Note that, by default, PAUP assumes that nst=2 means that we want to make a distinction between transitions and transversions. It is also possible to specify models with two types of substitutions that are NOT transitions and transversions respectively. One example would be: lset nst=6 rmatrix=estimate rclass=(a a a b b b). I will not explain this example in detail at this point.&lt;br /&gt;
&lt;br /&gt;
: Third, we request that the transition/transversion ratio should be estimated from the data (tratio=estimate).&lt;br /&gt;
&lt;br /&gt;
: Finally, we specify that we want to use a model where substitution rates at different sites follow a gamma distribution (rates=gamma), and that we want the shape of this distribution to also be estimated from the data (shape=estimate).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Specify outgroup and rooting options&#039;&#039;&#039;&lt;br /&gt;
 outgroup Pan_troglodytes Pan_paniscus&lt;br /&gt;
 set root=outgroup outroot=monophyl&lt;br /&gt;
: The chimpanzee and the bonobo form the outgroup.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start heuristic search of tree space using nearest neighbor interchange (NNI)&#039;&#039;&#039;&lt;br /&gt;
 hsearch swap=nni&lt;br /&gt;
: This step may take a little while to finish (depending on your computer). For large datasets you sometimes have to wait hours or even days for a maximum likelihood analysis to finish. The score that PAUP lists for maximum likelihood analysis is the &#039;&#039;negative&#039;&#039; log of the likelihood, -ln(L). Recall that since likelihoods are numbers between 0 and 1, log-likelihoods are negative numbers, and negative log-likelihoods are therefore positive numbers. (It is perhaps a bit confusing that PAUP does not simply list ln(L).) As the likelihood increases, -ln(L) decreases, so the best tree is the one with the lowest score.&lt;br /&gt;
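&lt;br /&gt;
: As a small illustration of this relationship (a sketch in R, using made-up likelihood values rather than output from PAUP), note how the tree with the higher likelihood gets the lower score:&lt;br /&gt;
 # Hypothetical likelihoods for two trees (not from the exercise data)&lt;br /&gt;
 lik = c(tree1 = 1e-300, tree2 = 1e-200)&lt;br /&gt;
 # Negative log-likelihood: tree2 has the higher likelihood and the lower -ln(L)&lt;br /&gt;
 print(-log(lik))&lt;br /&gt;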
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What is the negative log-likelihood, -ln(L), for the best tree found using NNI?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 2&#039;&#039;&#039; Is the Neanderthal sequence placed inside or outside the clade of human sequences?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Models_of_Evolution&amp;diff=29</id>
		<title>Models of Evolution</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Models_of_Evolution&amp;diff=29"/>
		<updated>2024-03-19T13:35:13Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;This exercise is part of the course  Computational Molecular Evolution (22115).   == Overview ==  In this exercise we will explore a number of different, but closely related, models of evolution. Using such models it is possible to estimate the number of unseen mutational events and thereby obtain genetic distances that have been corrected for superimposed substitutions. It is, however, important to realize that these correct...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This exercise is part of the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
In this exercise we will explore a number of different, but closely related, models of evolution. Using such models it is possible to estimate the number of unseen mutational events and thereby obtain genetic distances that have been corrected for superimposed substitutions. It is, however, important to realize that these corrections are based on the assumption that we observe approximately the expected amount of change: if, for instance, 20 mutational events end up leading to no observable changes, then it is impossible to guess the actual amount of change regardless of which correction scheme we employ. Using more and longer sequences helps ensure that the observed change is closer to the expected change, so the correction is more likely to be accurate with adequate amounts of data. The same models also play an important role in phylogenetic reconstruction based on maximum likelihood and Bayesian techniques.&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1: Start Terminal window&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2: Construct working directory&#039;&#039;&#039;&lt;br /&gt;
: In the command below: Instead of /path/to/molevol enter the path to the directory where you have placed your course files (for instance cd /Users/bob/Documents/molevol, or cd /home/student/molevol).&lt;br /&gt;
 cd /path/to/molevol&lt;br /&gt;
 mkdir models&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3: Change to working directory&#039;&#039;&#039;&lt;br /&gt;
 cd models&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;4: Copy files for exercise&#039;&#039;&#039;&lt;br /&gt;
 cp ../data/primatemitDNA.nexus ./primatemitDNA.nexus&lt;br /&gt;
 cp ../data/titv.data ./titv.data&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;5: Inspect sequence file&#039;&#039;&#039;&lt;br /&gt;
 nedit primatemitDNA.nexus &amp;amp;&lt;br /&gt;
: This file contains an aligned set of mitochondrial DNA sequences from man, chimpanzee, gorilla, orangutan and gibbon. Mitochondria are cellular organelles that are bounded by a lipid membrane and contain their own genome. Mitochondrial DNA is related to certain bacterial genomes, and it is believed that the original mitochondrion was a primitive bacterial cell that was engulfed by an early ancestor of eukaryotic cells, and that the pair subsequently went on to form a permanent symbiotic relationship.&lt;br /&gt;
&lt;br /&gt;
: Mitochondrial DNA has a higher rate of substitution than nuclear DNA. This makes it useful for investigating phylogenetic relationships between closely related species, such as the five primates included in the present data set. Close the nedit window when you are done.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;6: Inspect additional data file&#039;&#039;&#039;&lt;br /&gt;
 nedit titv.data &amp;amp;&lt;br /&gt;
: This file contains a single header line and one column of numbers giving estimated times of divergence between man and chimpanzee, man and gorilla, man and orangutan, and man and gibbon. (Divergence times are in millions of years). This file will be used later in the exercise when we investigate how various distance measures increase over time. Note: If the nedit window is too narrow, then the column headings will wrap over two lines. Make sure to make the window as wide as possible in order to understand the structure of this file. Close the nedit window when you are done.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== The Jukes and Cantor model ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 1&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
The Jukes and Cantor model of evolution has the following rate matrix:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  | A   C   G   T |&lt;br /&gt;
-------------------&lt;br /&gt;
A | -   a   a   a |&lt;br /&gt;
  |               |&lt;br /&gt;
C | a   -   a   a |&lt;br /&gt;
  |               |&lt;br /&gt;
G | a   a   -   a |&lt;br /&gt;
  |               |&lt;br /&gt;
T | a   a   a   - |&lt;br /&gt;
-------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start RStudio&#039;&#039;&#039;&lt;br /&gt;
: We will use RStudio to explore some features of evolution occurring according to this model. Start by loading the tidyverse libraries:&lt;br /&gt;
 library(tidyverse)&lt;br /&gt;
&lt;br /&gt;
For the Jukes and Cantor model the following equation gives the probability, D, that a given site will display observable change, expressed as a function of branch length, d:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;D=\frac{3}{4} \left( 1 - \exp\left(-\frac{4}{3}d\right) \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here, d is measured in substitutions per site. D is also the expected fraction of sites showing observable change along a branch of length d: if any single site has probability D of changing, then on average D * L sites will have changed in a sequence of length L. We will now explore how the expected amount of observable change depends on the branch length.&lt;br /&gt;
&lt;br /&gt;
In RStudio enter the following (note: you may want to enter this in the script window, in the upper left of RStudio, so you can re-use the code later on):&lt;br /&gt;
 df = tibble(&lt;br /&gt;
     d = seq(0,10,0.1),&lt;br /&gt;
     observed = 0.75*(1-exp(-4/3*d)),&lt;br /&gt;
     max = 0.75&lt;br /&gt;
 )&lt;br /&gt;
 &lt;br /&gt;
 dflong = pivot_longer(df, cols=-d)&lt;br /&gt;
 ggplot(dflong, aes(x=d, y=value, col=name)) + &lt;br /&gt;
     geom_line() + &lt;br /&gt;
     geom_abline(slope=1, lty=2) +&lt;br /&gt;
     labs(title=&amp;quot;Exp. observable differences&amp;quot;, &lt;br /&gt;
          x = &amp;quot;Actual distance (branch length)&amp;quot;,&lt;br /&gt;
          y = &amp;quot;Observed distance&amp;quot;) +&lt;br /&gt;
     ylim(0,1)&lt;br /&gt;
&lt;br /&gt;
: In this expression d is the branch length (the actual amount of change that has occurred). The curve we have plotted thus gives the expected observed difference as a function of the actual amount of change.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Which of the following statements are true?&lt;br /&gt;
:# Sequences can become no more than 75 % different according to the Jukes and Cantor model&lt;br /&gt;
:# The graph of the observed differences plateaus at 3/4&lt;br /&gt;
:# The Jukes and Cantor correction will have a limited effect when the branch length is large&lt;br /&gt;
:# For small branch lengths, the expected observable difference rises almost linearly and is very close to the real distance&lt;br /&gt;
:# The graph of the observed differences plateaus at 1&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 2&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Jukes and Cantor model: Examine estimated branch length as a function of observed difference&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Above we examined how the (expected) observed distance depended on the real distance. We will now examine how the real distance can be estimated from the observed distance. This is done by solving the above equation for d, giving us an expression that allows us to estimate the real amount of change as a function of the observed change:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;d=-\frac{3}{4}\ln\left( 1 - \frac{4}{3}D  \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that this correction will only work if the observed difference is approximately as expected. Consider this: In the dice-rolling simulation we found that if there have been 0.67 changes per site then the expected observed difference is 0.44. However, as you saw in the simulation, the actual observed difference can be different from the expected 0.44 (say, 0.33 or 0.58). If the observed difference is not the same as the expected observed difference, then we will obviously also get the wrong estimate of the real distance after correcting for multiple substitutions.&lt;br /&gt;
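&lt;br /&gt;
The effect of this sampling variation can be sketched in R as follows. This is only an illustration under simple assumptions that are not part of the exercise: a hypothetical sequence length of 100 sites, each of which independently shows a difference with the expected probability (the variable names are made up for the example):&lt;br /&gt;
 set.seed(1)                            # make the random draws reproducible&lt;br /&gt;
 d_true = 0.67                          # real distance (substitutions per site)&lt;br /&gt;
 D_exp = 0.75*(1 - exp(-4/3*d_true))    # expected observed difference, about 0.44&lt;br /&gt;
 n_sites = 100                          # hypothetical sequence length&lt;br /&gt;
 # Five simulated observed differences for sequences of this length&lt;br /&gt;
 D_obs = rbinom(5, size=n_sites, prob=D_exp) / n_sites&lt;br /&gt;
 # JC-corrected estimates of the real distance&lt;br /&gt;
 d_est = -0.75*log(1 - 4/3*D_obs)&lt;br /&gt;
 print(D_obs)&lt;br /&gt;
 print(d_est)&lt;br /&gt;
Each simulated observed difference leads to a different corrected estimate, and these scatter around the real distance of 0.67.&lt;br /&gt;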
&lt;br /&gt;
We will now plot (estimated) real change as a function of observed difference (this is the inverse of what you did before). In RStudio enter:&lt;br /&gt;
 df = tibble(&lt;br /&gt;
     D = seq(0,0.749,0.01),&lt;br /&gt;
     real = -0.75*log(1-4/3*D)&lt;br /&gt;
 )&lt;br /&gt;
 &lt;br /&gt;
 ggplot(df, aes(x=D, y=real)) + &lt;br /&gt;
     geom_line(col=&amp;quot;blue&amp;quot;) + &lt;br /&gt;
     geom_abline(slope=1, lty=2) +&lt;br /&gt;
     labs(title=&amp;quot;Estimated real distance&amp;quot;, &lt;br /&gt;
          x = &amp;quot;Observed difference&amp;quot;,&lt;br /&gt;
          y = &amp;quot;Real distance&amp;quot;) + &lt;br /&gt;
     xlim(0,0.8)&lt;br /&gt;
&lt;br /&gt;
: The function &amp;quot;log&amp;quot; means the natural logarithm in R. Note how the correction becomes increasingly important as the observed distance increases. Also note that the correction is only defined for observed distances below 0.75, although larger values may arise in real data: at or above 75% difference the corrected distance is undefined. When using JC-corrected distances for phylogenetic reconstruction, you should therefore beware of this situation.&lt;br /&gt;
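&lt;br /&gt;
: One way to guard against this in R is sketched below. The function name jc_correct is made up for this example, and returning NA for undefined cases is just one possible convention:&lt;br /&gt;
 jc_correct = function(D) {&lt;br /&gt;
     arg = 1 - 4/3*D                  # argument of the logarithm&lt;br /&gt;
     d = rep(NA_real_, length(D))     # NA where the correction is undefined&lt;br /&gt;
     ok = which(arg &amp;gt; 0)           # positions where the logarithm is defined&lt;br /&gt;
     d[ok] = -0.75*log(arg[ok])&lt;br /&gt;
     d&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 jc_correct(c(0.2, 0.74, 0.8))        # the last value is above 0.75 and gives NA&lt;br /&gt;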
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Use the equation above to estimate the actual distance if the observed distance is 0.1, 0.4, and 0.6 respectively&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== The Kimura 2 parameter model ==&lt;br /&gt;
&lt;br /&gt;
The Kimura 2 parameter model of evolution has the following rate matrix:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  | A   C   G   T |&lt;br /&gt;
-------------------&lt;br /&gt;
A | -   b   a   b |&lt;br /&gt;
  |               |&lt;br /&gt;
C | b   -   b   a |&lt;br /&gt;
  |               |&lt;br /&gt;
G | a   b   -   b |&lt;br /&gt;
  |               |&lt;br /&gt;
T | b   a   b   - |&lt;br /&gt;
-------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note how transitions (A/G and C/T) have a different rate than transversions (A/C, A/T, C/G, and G/T). Based on this matrix, the expected ratio of transitions to transversions is: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;R = \frac{a}{2b}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
meaning that if transitions and transversions had the same rate (Jukes and Cantor), then we would expect: &amp;lt;math&amp;gt;R = 0.5&amp;lt;/math&amp;gt;. Empirically, this is typically not the case. In fact one often sees &amp;lt;math&amp;gt;R \geq 2&amp;lt;/math&amp;gt; and for mitochondrial DNA a typical value is &amp;lt;math&amp;gt;R=10&amp;lt;/math&amp;gt; (meaning that a is 20 times higher than b)! We will now use RStudio to explore some features of evolution occurring according to this model.&lt;br /&gt;
&lt;br /&gt;
It can be shown that, under the K2P model, the chance of observing a transition and a transversion respectively depends on R and t in the following way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
P_\textrm{transition} = 0.25 - 0.5 \exp(At) + 0.25 \exp(Bt)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
P_\textrm{transversion} = 0.5 - 0.5 \exp(Bt)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
A = \frac{-2R-1}{R+1}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
B = \frac{-2}{R+1}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in these equations we have chosen to measure time in suitable units such that the overall rate of substitution (&amp;lt;math&amp;gt;\mu=a+2b&amp;lt;/math&amp;gt;) has the value 1 substitution per site per unit time. (An example: if &amp;lt;math&amp;gt;\mu=10^{-9}&amp;lt;/math&amp;gt; substitutions per site per year, then we would choose to measure time in billions of years instead of in years. The substitution rate would then be 1 substitution per site per billion years.) This means that the amount of change accumulated during t time units (the branch length) is simply: &amp;lt;math&amp;gt;d = 1 \cdot t = t&amp;lt;/math&amp;gt;&lt;br /&gt;
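&lt;br /&gt;
As a check (this short derivation is an addition based on the standard K2P result, not part of the original exercise text), the constants A and B follow directly from this scaling. Combining &amp;lt;math&amp;gt;R = \frac{a}{2b}&amp;lt;/math&amp;gt; with &amp;lt;math&amp;gt;a+2b=1&amp;lt;/math&amp;gt; gives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
a = \frac{R}{R+1}, \qquad b = \frac{1}{2(R+1)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
so that&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
A = -2(a+b) = \frac{-2R-1}{R+1}, \qquad B = -4b = \frac{-2}{R+1}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, R = 10 gives a = 10/11 and b = 1/22, so a is indeed 20 times higher than b.&lt;br /&gt;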
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 3&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examine expected amount of change as a function of branch length&#039;&#039;&#039;&lt;br /&gt;
We will now examine how the expected amounts of transitions and transversions change with time when R=10. In the RStudio window enter the following:&lt;br /&gt;
 R = 10&lt;br /&gt;
 A = (-2*R-1.0)/(R+1.0)&lt;br /&gt;
 B = (-2)/(R+1.0)&lt;br /&gt;
You can check the computed values of A and B by using the print command:&lt;br /&gt;
 print(A)&lt;br /&gt;
 print(B)&lt;br /&gt;
You should have obtained values of approximately A = -1.909 and B = -0.1818. You can now plot the curves showing how the expected amounts of transitions and transversions change as a function of the branch length (the actual amount of change):&lt;br /&gt;
 df = tibble(&lt;br /&gt;
     t = seq(0, 40, 0.1),&lt;br /&gt;
     Transitions = 0.25-0.5*exp(A*t)+0.25*exp(B*t),&lt;br /&gt;
     Transversions = 0.5 - 0.5 * exp(B*t),&lt;br /&gt;
     Total_dist = Transitions + Transversions&lt;br /&gt;
 )&lt;br /&gt;
 &lt;br /&gt;
 dflong = df %&amp;gt;% pivot_longer(-t)&lt;br /&gt;
 &lt;br /&gt;
 ggplot(dflong, aes(x=t, y=value, col=name)) + &lt;br /&gt;
     geom_line() + &lt;br /&gt;
     labs(title=&amp;quot;Exp. observable differences&amp;quot;, &lt;br /&gt;
          x = &amp;quot;Real distance&amp;quot;,&lt;br /&gt;
          y = &amp;quot;Observed difference&amp;quot;) +&lt;br /&gt;
     geom_hline(yintercept = 0.25, col=&amp;quot;blue&amp;quot;, lty=2) + &lt;br /&gt;
     geom_hline(yintercept = 0.5, col=&amp;quot;blue&amp;quot;, lty=2) + &lt;br /&gt;
     geom_hline(yintercept = 0.75, col=&amp;quot;blue&amp;quot;, lty=2) + &lt;br /&gt;
     ylim(0,1)&lt;br /&gt;
&lt;br /&gt;
:Several interesting things are going on in this plot. First of all, note that I have added a third curve showing the total observed difference. This is simply the sum of the observed transitions and transversions.&lt;br /&gt;
&lt;br /&gt;
: Second, as was the case for the Jukes and Cantor model, the total observed difference increases to a maximum value of 0.75 (corresponding to 25% similarity).&lt;br /&gt;
&lt;br /&gt;
: Third, note that the expected amount of transitional differences first rises rapidly and then declines slowly to an equilibrium value of 0.25. Transversional differences rise slowly to an equilibrium value of 0.5. The equilibrium values are determined by the fact that when sufficient time has passed, sequence similarities will essentially be random; since there are twice as many possible transversions as transitions, these will in the end make up two thirds of all observed changes. Early on, before this stage is reached, the much higher rate of transitions will cause them to make up the vast majority of all observed changes, and only after considerable time has elapsed will the transversions catch up.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;From the plot, estimate the real distance (x-axis) at which the transition and transversion lines cross.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 4&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Experiment with other transition/transversion rate ratios&#039;&#039;&#039;&lt;br /&gt;
The exact behaviour of the relationship between the two types of change depends on the relative rates of transition and transversion. You should now repeat the above analysis with:&lt;br /&gt;
:# R=2&lt;br /&gt;
:# R=0.5&lt;br /&gt;
Remember to recompute A and B after entering the new value of R. Recall that R=0.5 means that transitions and transversions occur with the same rate, a=b. For each of these two cases rerun the plot command and consider the changes.&lt;br /&gt;
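&lt;br /&gt;
One convenient way of doing this is to wrap the computation in a function, so that A and B are always recomputed from R. The sketch below is only a suggestion; the function name k2p_expected is made up for this example:&lt;br /&gt;
 library(tidyverse)&lt;br /&gt;
 &lt;br /&gt;
 k2p_expected = function(R, t) {&lt;br /&gt;
     A = (-2*R - 1)/(R + 1)           # recomputed for every value of R&lt;br /&gt;
     B = -2/(R + 1)&lt;br /&gt;
     tibble(&lt;br /&gt;
         t = t,&lt;br /&gt;
         Transitions = 0.25 - 0.5*exp(A*t) + 0.25*exp(B*t),&lt;br /&gt;
         Transversions = 0.5 - 0.5*exp(B*t)&lt;br /&gt;
     )&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 df = k2p_expected(R = 2, t = seq(0, 40, 0.1))   # repeat with R = 0.5&lt;br /&gt;
The resulting data frame can then be reshaped with pivot_longer and plotted with ggplot in the same way as for R=10.&lt;br /&gt;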
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Based on the two plots which of the following statements are true?&lt;br /&gt;
:# For R = 2, the transition and transversion lines cross at around 1.5 substitutions per site.&lt;br /&gt;
:# For R = 0.5, the transition and transversion lines never cross each other.&lt;br /&gt;
:# For both R=2 and R = 0.5, the transition and transversion lines never cross each other.&lt;br /&gt;
:# For R = 2, the transition and transversion lines cross at around 5 substitutions per site.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 5&#039;&#039;&#039; When R=0.5, the Kimura 2 parameter model is in fact equivalent to another model - which one?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 6&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examine how apparent transition/transversion ratio changes with branch length&#039;&#039;&#039;&lt;br /&gt;
The apparent transition/transversion ratio is simply the observed number of transitions divided by the observed number of transversions. The following plot command shows this number as a function of branch length for the case R=10 (I have simply taken the expression for observed transitions and divided it by the expression for observed transversions):&lt;br /&gt;
 df = tibble(&lt;br /&gt;
     t= seq(0.01, 4, 0.01),&lt;br /&gt;
     obsrat = (0.25-0.5*exp(-1.909*t)+0.25*exp(-0.1818*t))/ (0.5 - 0.5 * exp(-0.1818*t))&lt;br /&gt;
 )&lt;br /&gt;
 &lt;br /&gt;
 ggplot(df, aes(x=t, y=obsrat)) + &lt;br /&gt;
     geom_line(col=&amp;quot;blue&amp;quot;) + &lt;br /&gt;
     labs(x = &amp;quot;Real distance&amp;quot;,&lt;br /&gt;
          y = &amp;quot;Observed transition/transversion ratio&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
Note how the apparent ratio is close to the real ratio, R=10, when not much change has occurred (i.e., for small x).&lt;br /&gt;
&lt;br /&gt;
The model of evolution that we have explored here is not a particularly complicated one: in fact it only has two free parameters. Nevertheless, you will by now appreciate that it is capable of displaying fairly unintuitive behaviour. Stating our hypothesis about this biological system in explicit mathematical terms is what allowed us to explore this thoroughly.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What value does the apparent transition/transversion ratio approach asymptotically? (You will need to construct a plot with a wider x-range to see this).&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Analysis of mitochondrial data set ==&lt;br /&gt;
&lt;br /&gt;
In this part of the exercise we will explore a real mitochondrial data set containing sequences from man, chimpanzee, gorilla, orangutan, and gibbon. We will investigate how the use of different models of evolution affects the estimated distance matrix. Since mitochondrial DNA is known to have very different transition and transversion rates, we will pay special attention to this aspect.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 7 &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Prepare editor window&#039;&#039;&#039;&lt;br /&gt;
In a terminal window, enter:&lt;br /&gt;
 nedit titv.data &amp;amp;&lt;br /&gt;
(Make sure to make the nedit window as wide as possible - otherwise the header line will be wrapped over two lines). This file contains a header line and a column listing the estimated divergence time between man and each of the other four primates (in millions of years). These estimates are associated with a fair amount of uncertainty, but the implied branching order is almost certainly correct. You will be using this file for entering various measures that you compute from the data file.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start PAUP*, load data file&#039;&#039;&#039;&lt;br /&gt;
 paup primatemitDNA.nexus&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Define outgroup:&#039;&#039;&#039;&lt;br /&gt;
 outgroup gibbon&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Activate outgroup rooting and select how tree will be printed:&#039;&#039;&#039;&lt;br /&gt;
 set root=outgroup outroot=monophyl&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select distance-based tree-reconstruction:&#039;&#039;&#039; &lt;br /&gt;
 set criterion=distance&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select uncorrected distances under the least squares criterion:&#039;&#039;&#039;&lt;br /&gt;
 dset distance=p objective=lsfit&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Short digression on PAUP* online help system:&#039;&#039;&#039;&lt;br /&gt;
: We interrupt this exercise for a brief announcement: By now you should be familiar with many of the commands used in PAUP, but you probably do not have an overview of the long list of possible options that can be specified. Fortunately, PAUP has a command that is useful in this context:&lt;br /&gt;
 dset ?&lt;br /&gt;
: Here I have used dset as an example, but typing any command followed by a question mark (&amp;quot;?&amp;quot;) will give you a list of all the possible options for that command, along with a list of the current values. This is very useful if you want to experiment with different settings in an analysis. When you want to learn more about the individual settings, you can also check the command reference and manual, which are linked on the course wiki.&lt;br /&gt;
&lt;br /&gt;
: One final thing that may be good to know: PAUP* accepts abbreviated commands as long as the abbreviation is unambiguous. That means you can for instance write set crit=dist instead of the full set criterion=distance, and desc instead of describetrees.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Construct least squares tree:&#039;&#039;&#039;&lt;br /&gt;
 alltrees&lt;br /&gt;
: This is a small data set so we can use exhaustive searching.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect tree:&#039;&#039;&#039;&lt;br /&gt;
 describetrees all/plot=phylogram&lt;br /&gt;
: This tree reflects our current belief about how these organisms are related.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Print distance matrix, note distances from human:&#039;&#039;&#039;&lt;br /&gt;
 showdist&lt;br /&gt;
: The showdist command lists the distance matrix computed according to the currently active distance-setting (as specified in the dset command above).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What are the p-distances for the following pairs of sequences: human/chimpanzee, human/gorilla, human/orangutan, human/gibbon.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: Also copy the entries giving the p-distance between human and each of the other four primates into the proper place in the titv.data file. (The numbers should all be in a single column under the &amp;quot;p_dist&amp;quot; header).&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 8&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select uncorrected distances, counting only transitions:&#039;&#039;&#039;&lt;br /&gt;
 dset subst=ti&lt;br /&gt;
: The option subst=ti specifies that only transitional substitutions should be counted. The previously issued &amp;quot;distance=p&amp;quot; is still the active setting. You can verify this by typing &amp;quot;dset ?&amp;quot; and checking the value listed for distance.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Print distance matrix, note transitional distances from human:&#039;&#039;&#039;&lt;br /&gt;
 showdist&lt;br /&gt;
:In this distance matrix only the transitions have been counted for each pair of taxa.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039; What are the transition-distances for the following pairs of sequences: human/chimpanzee, human/gorilla, human/orangutan, human/gibbon.&lt;br /&gt;
Note: Also enter the numbers in the column labeled &amp;quot;Transitions(P)&amp;quot; in the file.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 9&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select uncorrected distances, counting only transversions:&#039;&#039;&#039;&lt;br /&gt;
 dset subst=tv&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Print distance matrix, note transversional distances from human:&#039;&#039;&#039;&lt;br /&gt;
 showdist&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039; Again, enter the distances between human and each of the other four primates below (separated by spaces, and using at least two significant digits), and also in the column labeled Transversions(Q) in the file.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 10&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compute JC-corrected distances:&#039;&#039;&#039;&lt;br /&gt;
As we saw above, it is possible to come up with model-based corrections for the effect of multiple substitutions that allow us to estimate the real amount of change from the observed amount of change. For the JC-model, the equation for the corrected distance is:&lt;br /&gt;
:&amp;lt;math&amp;gt;d=-\frac{3}{4}\ln\left( 1 - \frac{4}{3}D  \right)&amp;lt;/math&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;For each of the four lines in the titv.data file, and based on the numbers in the column labeled p_dist, compute the JC-corrected distance. Enter the results in the column labeled &amp;quot;JC&amp;quot; in the titv.data file.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 11&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compute K2P corrected distance:&#039;&#039;&#039;&lt;br /&gt;
As was the case for the JC model, we can also compute estimated real distances under the K2P model. This can be done using the following equation:&lt;br /&gt;
:&amp;lt;math&amp;gt;d = -\frac{1}{2} \ln(1 - 2P - Q) - \frac{1}{4}\ln(1 - 2Q)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Using the numbers in columns P and Q, you should now use this equation to compute the K2P-corrected distance estimates. Enter the results in the column labeled K2P in the file. Make sure to save the file after all results have been entered.&lt;br /&gt;
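&lt;br /&gt;
If you prefer to let R do the arithmetic, the equation can be written as a small function. This is a sketch only: the function name k2p_dist and the input values are made up for the example, and are not taken from the primate data:&lt;br /&gt;
 k2p_dist = function(P, Q) {&lt;br /&gt;
     # P: observed proportion of transitions, Q: observed proportion of transversions&lt;br /&gt;
     -0.5*log(1 - 2*P - Q) - 0.25*log(1 - 2*Q)&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 k2p_dist(P = 0.05, Q = 0.01)   # hypothetical values&lt;br /&gt;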
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 12&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Plot distances&#039;&#039;&#039;&lt;br /&gt;
In RStudio enter:&lt;br /&gt;
 df = read_table(&amp;quot;titv.data&amp;quot;, col_names = FALSE, skip=1)&lt;br /&gt;
 &lt;br /&gt;
 df = df %&amp;gt;% rename(organisms=X1, &lt;br /&gt;
                    div_time=X2, &lt;br /&gt;
                    pdist=X3, &lt;br /&gt;
                    transitions=X4, &lt;br /&gt;
                    transversions=X5,&lt;br /&gt;
                    JC = X6, &lt;br /&gt;
                    K2P = X7)&lt;br /&gt;
 &lt;br /&gt;
 dflong = df %&amp;gt;% pivot_longer(cols=-c(organisms, div_time))&lt;br /&gt;
 &lt;br /&gt;
 ggplot(dflong, aes(x=div_time, y=value, col=name)) + &lt;br /&gt;
     geom_line() + &lt;br /&gt;
     labs(x=&amp;quot;Time since divergence (MY)&amp;quot;, &lt;br /&gt;
          y=&amp;quot;Genetic distance (substitutions/site)&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
: We have here plotted the total difference, the observed transitional and transversional difference, as well as the JC- and K2P-corrected distances as a function of estimated divergence times.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Do the two different correction schemes result in the same estimates of the real distance?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Distance_Matrix_Methods&amp;diff=28</id>
		<title>Distance Matrix Methods</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Distance_Matrix_Methods&amp;diff=28"/>
		<updated>2024-03-19T13:34:02Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Getting started */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This exercise is part of the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
&lt;br /&gt;
: &#039;&#039;&#039;Note:&#039;&#039;&#039; If you didn&#039;t already do this during the video lecture: Start by doing the [https://teaching.healthtech.dtu.dk/material/22115/Distance_handout.pdf handout exercise for distance matrix methods].&lt;br /&gt;
&lt;br /&gt;
: In this exercise we will reconstruct phylogenetic trees using a variety of distance-based methods. Specifically, we will explore two different optimality criteria (least squares and minimum evolution), and one clustering method (neighbor joining).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Copy files for today&#039;s exercise:&#039;&#039;&#039;&lt;br /&gt;
Make sure you&#039;re still in today&#039;s working directory (condist) and that you already have the hcv.nexus file there. Now also copy the following file to that directory:&lt;br /&gt;
 cp ../data/simple.nexus simple.nexus&lt;br /&gt;
 ls -l&lt;br /&gt;
: simple.nexus is an artificial data set that I have constructed. It is identical to the one you analyzed by hand in the handout exercise. We will use it to convince ourselves that PAUP gets the same result as you.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Analysis of the Simple Data Set ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 1&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start PAUP* and load the simple data set:&#039;&#039;&#039;&lt;br /&gt;
 paup simple.nexus&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select distance-based tree-reconstruction:&#039;&#039;&#039;&lt;br /&gt;
 set criterion=distance&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select uncorrected distances under the un-weighted least squares criterion:&#039;&#039;&#039;&lt;br /&gt;
 dset distance=p objective=lsfit power=0&lt;br /&gt;
: The dset command is used to set various options for the distance-based methods. Option &amp;quot;distance=p&amp;quot; specifies the use of &amp;quot;uncorrected sequence distances&amp;quot;, i.e., we do not want to correct the observed distances for multiple substitutions. Note that distances are here reported as &amp;quot;substitutions per site&amp;quot;. This simply means that the number of differences has been divided by the length of the sequence. You can think of this distance as the fraction of sites that are different between two sequences.&lt;br /&gt;
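&lt;br /&gt;
: As a rough illustration of what an uncorrected distance is, the following Python sketch counts the differing sites between two aligned sequences and divides by the alignment length. The two sequences are made up for this example and are not taken from simple.nexus, and the function name p_distance is hypothetical.&lt;br /&gt;

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)

# Two made-up aligned sequences of 15 sites that differ at 3 positions.
seq_x = "ACGTACGTACGTACG"
seq_y = "ACGTTCGTACGAACC"

d = p_distance(seq_x, seq_y)
print(d)               # per-site distance (fraction of differing sites)
print(d * len(seq_x))  # the same distance expressed as a number of differences
```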
&lt;br /&gt;
: The option &amp;quot;objective=lsfit&amp;quot; specifies that we want to reconstruct trees using the least squares optimality criterion. Recall that under least squares we are trying to find the tree that has the smallest possible deviation between the observed pairwise distances and the pairwise distances measured along the tree. (The distance between two taxa measured along the tree is called the &amp;quot;patristic&amp;quot; distance.) The overall fit of the tree is found by (1) computing the difference between each observed distance and the corresponding patristic distance, (2) squaring this difference (this way we are sure to obtain a non-negative number, regardless of whether the observed or the patristic distance is larger), and (3) adding all the squared differences. The option &amp;quot;power=0&amp;quot; specifies that we do not want to weight the squared differences when computing this fit, so every pair of taxa counts equally.&lt;br /&gt;
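&lt;br /&gt;
: The three steps of the unweighted least squares fit can be sketched in Python as follows. This is only an illustration under stated assumptions: the tree shape ((A,B),(C,D)), the branch lengths, and the observed distances are all invented, and the helper names are hypothetical. It is not a description of how PAUP computes the fit internally.&lt;br /&gt;

```python
def sum_of_squares(observed, patristic):
    """Unweighted least squares fit: sum of squared differences over all pairs."""
    return sum((observed[pair] - patristic[pair]) ** 2 for pair in observed)

# Hypothetical unrooted tree ((A,B),(C,D)): four terminal branches and one internal branch.
terminal = {"A": 0.10, "B": 0.20, "C": 0.15, "D": 0.05}
internal = 0.30

def patristic_distance(x, y):
    """Add up the branch lengths on the path between taxa x and y."""
    same_side = {x, y} in ({"A", "B"}, {"C", "D"})
    return terminal[x] + terminal[y] + (0.0 if same_side else internal)

pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
patristic = {pair: patristic_distance(*pair) for pair in pairs}

# Made-up observed distances: only the A-C pair deviates from the tree (by 0.02).
observed = dict(patristic)
observed[("A", "C")] += 0.02

ss = sum_of_squares(observed, patristic)
print(ss)
```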
&lt;br /&gt;
&#039;&#039;&#039;Inspect distance matrix&#039;&#039;&#039;&lt;br /&gt;
 showdist&lt;br /&gt;
: This command shows the distance matrix computed under the current distance settings.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Report the pairwise distances for all the pairs of (different) sequences: AB, AC, AD, BC, BD, CD&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 2&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find best tree using exhaustive search:&#039;&#039;&#039;&lt;br /&gt;
 alltrees&lt;br /&gt;
: This data set is sufficiently small that we can search through all possible trees.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; How many different unrooted trees with 4 leaves is it possible to construct?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 3&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect best tree:&#039;&#039;&#039;&lt;br /&gt;
 outgroup A D&lt;br /&gt;
 set root=outgroup outroot=poly&lt;br /&gt;
 describetrees all/plot=phylogram brlens=yes label=yes&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; We now want to investigate whether the fitted branch lengths correspond to the observed pairwise distances. First, draw a sketch of the tree (note that in the PAUP output, this unrooted tree may look a bit weird; just draw it in the normal unrooted way you also used for the manual exercise, i.e., the tree should have a total of 5 branches). Second, label each branch with the branch length as listed in the table you just produced with describetrees. Finally, compute the patristic distance between each pair of taxa on the tree by adding up the lengths of the branches lying on the path between the two taxa. Do the observed pairwise distances (from the distance matrix in Question 1) correspond to the patristic distances in this case?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 4&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compare to the manually constructed tree:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; We now want to investigate whether the tree that PAUP has found here corresponds to the one you constructed manually in the handout exercise. To do this you should convert all the fractional (&amp;quot;per-site&amp;quot;) distances reported by PAUP to absolute distances. This is done simply by multiplying the fractional distance by the length of the alignment (15 positions, in this case). Are your tree and the PAUP tree identical (within rounding error)?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Analysis of HCV Data Set Using Neighbor Joining ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 5&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Set up analysis for HCV data set&#039;&#039;&#039;&lt;br /&gt;
 execute hcv.nexus&lt;br /&gt;
 set criterion=distance&lt;br /&gt;
 dset distance=p objective=lsfit power=0&lt;br /&gt;
 outgroup  2_1_1 2_1_2 2_1_3 2_1_4 2_1_5 2_1_7 2_1_8 2_1_9 2_1_10&lt;br /&gt;
 set root=outgroup outroot=monophyl&lt;br /&gt;
: These commands will: load the file hcv.nexus (say yes when asked whether you want to reset the active datafile), select distance-based tree-reconstruction, select uncorrected distances, define the patient 2 sequences as the outgroup, set outgroup rooting, and ensure that the outgroup is printed as a monophyletic sister group to the ingroup.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Construct a neighbor joining tree based on the HCV data:&#039;&#039;&#039;&lt;br /&gt;
 nj&lt;br /&gt;
: This will construct a neighbor joining tree using the active distance measure (currently set to uncorrected).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Print tree and table of branch lengths:&#039;&#039;&#039;&lt;br /&gt;
 describetrees 1/plot=phylogram brlens=yes&lt;br /&gt;
: The neighbor joining tree resembles the trees you previously constructed using parsimony. Importantly, you should see that the viral sequences from different patients form distinct clusters. Note that only a single tree is produced. This is characteristic of clustering methods, which work by following a deterministic algorithm for constructing a tree from distance data. Clustering algorithms such as neighbor joining do not have any measure of tree-goodness and therefore are not able to identify sets of equally good trees.&lt;br /&gt;
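&lt;br /&gt;
: To illustrate why a clustering method yields a single tree, the Python sketch below performs only the first pair-selection step of neighbor joining on a made-up matrix for five taxa (these are not the HCV data, and the helper names are hypothetical). It uses the standard neighbor joining criterion Q(x, y) = (n - 2) d(x, y) - sum of row x - sum of row y, and joins the pair with the smallest Q. A full implementation would then replace the joined pair by a new node and repeat.&lt;br /&gt;

```python
from itertools import combinations

# Made-up distances between five taxa (not taken from the course data).
taxa = ["a", "b", "c", "d", "e"]
dist = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7,
    ("d", "e"): 3,
}

def d(x, y):
    """Look up a distance regardless of the order of the two taxa."""
    return 0 if x == y else dist.get((x, y), dist.get((y, x)))

def q_value(x, y):
    """Neighbor joining criterion: (n - 2) * d(x, y) minus both row sums."""
    n = len(taxa)
    row_x = sum(d(x, k) for k in taxa)
    row_y = sum(d(y, k) for k in taxa)
    return (n - 2) * d(x, y) - row_x - row_y

# The pair with the smallest Q value is joined first. Note that this need not be
# the pair with the smallest raw distance (here that would be d and e).
first_pair = min(combinations(taxa, 2), key=lambda pair: q_value(*pair))
print(first_pair, q_value(*first_pair))
```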
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; The present neighbor joining tree was computed without correcting the observed distances for multiple substitutions. In the phylogram, identify the internal node that is ancestral to the patient 5 sequences (you will see that internal nodes are labeled with consecutive numbers), and also the internal node that is one level further down in the tree (i.e., ancestral to the ancestral node). You will note that the branch connecting these two nodes is relatively long. Locate the branch in the list of branch lengths, which is printed above the tree. What is the length of this branch?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 6&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select correction of multiple substitutions using the Jukes and Cantor model:&#039;&#039;&#039;&lt;br /&gt;
 dset distance=jc&lt;br /&gt;
: This causes all observed distances to be corrected using a formula based on the Jukes and Cantor model of evolution. Recall that under the Jukes and Cantor model all base frequencies are assumed to be equal (at 0.25), and all base substitution rates are also assumed to be equal.&lt;br /&gt;
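&lt;br /&gt;
: The standard Jukes and Cantor correction converts an uncorrected distance p into a corrected distance d = -3/4 ln(1 - 4p/3). The Python sketch below evaluates this formula for a few made-up p values to show that the correction grows with the distance. The function name jc_distance is hypothetical, and the sketch is not meant to reproduce PAUP output digit for digit.&lt;br /&gt;

```python
import math

def jc_distance(p):
    """Jukes-Cantor corrected distance from an uncorrected p-distance."""
    if p >= 0.75 or 0 > p:
        raise ValueError("the JC correction is only defined for p from 0 up to (not including) 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Made-up uncorrected distances: small, moderate, and large.
for p in (0.01, 0.10, 0.30):
    corrected = jc_distance(p)
    # Print p, the corrected distance, and the ratio corrected / uncorrected.
    print(p, round(corrected, 4), round(corrected / p, 3))
```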
&lt;br /&gt;
&#039;&#039;&#039;Construct a new neighbor joining tree using corrected distances:&#039;&#039;&#039;&lt;br /&gt;
 nj&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Print tree and table of branch lengths:&#039;&#039;&#039;&lt;br /&gt;
 describetrees 1/plot=phylogram brlens=yes&lt;br /&gt;
: In this tree all branch lengths have been corrected for (unobserved) multiple substitutions. That means they are slightly longer than the uncorrected branch lengths, and this correction is more noticeable for longer branches.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Again locate the internal node that is ancestral to the patient 5 sequences and also the immediate ancestor of this node (the node labels are not necessarily the same as before). Now find the corresponding branch in the table and make a note of the length. Is the corrected branch length longer than the uncorrected one?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 7&#039;&#039;&#039; What is the ratio of the corrected to the uncorrected branch length? (Divide the corrected branch length by the uncorrected one)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 8&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Prepare table of model fit measures&#039;&#039;&#039;&lt;br /&gt;
: You are currently using neighbor joining to reconstruct the phylogenetic tree. Below you will also explore the use of least squares and minimum evolution methods. In order to compare the performance and characteristics of these methods we want to record some informative numbers. Construct a small table with two columns (labeled &amp;quot;SSE&amp;quot; and &amp;quot;tree length&amp;quot;), and three rows (labeled &amp;quot;NJ&amp;quot;, &amp;quot;least squares&amp;quot;, and &amp;quot;minimum evolution&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; At the end of the list of branch lengths (printed with the describetrees command), you will find the sum of all branch lengths. This is often called the &amp;quot;length&amp;quot; of the tree. What is the length of the tree? (also enter this number in your table, under the column &amp;quot;tree length&amp;quot; in the row &amp;quot;NJ&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 9&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compute fit of NJ branch lengths to observed pairwise distances:&#039;&#039;&#039;&lt;br /&gt;
 dscores 1/objective=lsfit power=0&lt;br /&gt;
: The dscores command calculates the scores of trees in memory according to the distance criterion. In this case we are computing the fit between the observed pairwise distances and the branch lengths found by neighbor joining. The measure used is the sum of squared deviations mentioned above.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; What is the sum of squared errors? (It is indicated by &amp;quot;SS&amp;quot;, which is an abbreviation for sum of squares.) Enter the number in your table, under the column &amp;quot;SSE&amp;quot; in the row &amp;quot;NJ&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Analysis of HCV Data Set Using Least Squares ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 10&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select JC corrected distances under the unweighted least squares criterion:&#039;&#039;&#039;&lt;br /&gt;
 dset distance=jc objective=lsfit power=0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find the best tree using heuristic searching:&#039;&#039;&#039;&lt;br /&gt;
 hsearch start=nj swap=tbr&lt;br /&gt;
: As we have seen previously, the HCV data set is far too big for exhaustive searching, and we therefore have to resort to heuristic techniques when we are using a phylogenetic reconstruction method that is based on an optimality criterion. In this case the starting tree is constructed by neighbor joining, i.e., it should be identical to the tree we just inspected (in previous exercises we have used a random starting tree, but neighbor joining will get us closer to the optimum from the start). The heuristic search (which again uses re-arrangements of the &amp;quot;tree-bisection and reconnection&amp;quot; type) should result in a small set of equally good trees.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect trees:&#039;&#039;&#039;&lt;br /&gt;
 contree all/strict=no majrule=yes percent=50&lt;br /&gt;
: This constructs a consensus tree from the set of equally good best trees. Again you should see that the best trees have individual patients clustered separately. Note that while the neighbor joining tree also showed this feature, it did not indicate that there might be any uncertainty as to the details of the tree. However, by using a method that has an explicit measure of tree goodness (least squares in this case) you have now learned that there are several equally good reconstructions of the branch order within the individual patient clusters.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compute fit of least squares branch lengths to observed pairwise distances:&#039;&#039;&#039;&lt;br /&gt;
 dscores 1/objective=lsfit power=0&lt;br /&gt;
: Again, we are computing the sum of squared deviations between observed and patristic pairwise distances. Arbitrarily we have chosen to only do this for tree number 1 (&amp;quot;dscores all&amp;quot; would have done it for all trees in memory), but recall that all trees in memory are equally good, so the results would have been identical to what you now get.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; What is the sum of squares? (Also enter the number in your table.)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 11&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find total length of tree:&#039;&#039;&#039;&lt;br /&gt;
 describetrees 1/plot=no brlens=yes&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; What is the sum of all branch lengths when using the least squares criterion? (Remember to also enter the number in your table.)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 12&#039;&#039;&#039; Now, compare the results from this analysis with the numbers you obtained from the neighbor joining tree above. Has the fit improved? (Recall that for both sum of squares and tree length, smaller is better.)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Analysis of HCV Data Set Using Minimum Evolution ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 13&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select JC corrected distances under the minimum evolution criterion:&#039;&#039;&#039;&lt;br /&gt;
 dset distance=jc objective=me&lt;br /&gt;
: We now want to explore a different optimality criterion for distance-based analysis. Under minimum evolution we take the shortest tree to be the best one. This is very similar to parsimony, but in this case we are using pairwise, JC-corrected distances as the basis for reconstructing the tree. ME proceeds by searching through a list of possible trees; for each tested topology the best set of branch lengths is found by the least squares method, but instead of choosing the tree with the best fit, we end up choosing the shortest tree.&lt;br /&gt;
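&lt;br /&gt;
: The difference between the two optimality criteria can be sketched in Python as a difference in the final selection step only. The candidate trees, their names, and all numbers below are invented for illustration and are not PAUP output; each candidate is assumed to have already been given least squares branch lengths.&lt;br /&gt;

```python
# Hypothetical candidate trees, each already fitted with least squares branch lengths.
# The numbers are invented for illustration and are not PAUP output.
candidates = [
    {"name": "tree1", "ss": 0.0021, "length": 1.310},
    {"name": "tree2", "ss": 0.0018, "length": 1.325},
    {"name": "tree3", "ss": 0.0025, "length": 1.302},
]

# Least squares criterion: keep the tree with the smallest sum of squares.
best_ls = min(candidates, key=lambda tree: tree["ss"])

# Minimum evolution criterion: keep the tree with the smallest total branch length.
best_me = min(candidates, key=lambda tree: tree["length"])

print(best_ls["name"], best_me["name"])
```

: With these made-up numbers the two criteria pick different trees, which is why it is worth recording both the sum of squares and the tree length for each method.&lt;br /&gt;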
&lt;br /&gt;
&#039;&#039;&#039;Find the best tree using heuristic searching starting from a NJ tree:&#039;&#039;&#039;&lt;br /&gt;
 hsearch start=nj swap=tbr&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect trees:&#039;&#039;&#039;&lt;br /&gt;
 contree all/strict=no majrule=yes percent=50&lt;br /&gt;
: Again you should see that the best trees have individual patients clustered separately.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find total length of tree:&#039;&#039;&#039;&lt;br /&gt;
 describetrees 1/plot=no brlens=yes&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; At the end of the table listing branch lengths, you will again find the sum of all branch lengths. What is it?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 14&#039;&#039;&#039; Is the minimum evolution tree shorter than the other two trees?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 15&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compute fit of minimum evolution branch lengths to observed pairwise distances:&#039;&#039;&#039;&lt;br /&gt;
 dscores 1/objective=lsfit power=0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Again, we are computing the sum of squared deviations between observed and patristic pairwise distances. Note the result from this analysis in your table and compare it with the numbers you obtained from the neighbor joining and least squares analyses above. How does the fit of the ME tree compare with those two, as judged by the sum of squares?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Distance_Matrix_Methods&amp;diff=27</id>
		<title>Distance Matrix Methods</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Distance_Matrix_Methods&amp;diff=27"/>
		<updated>2024-03-19T13:31:16Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;This exercise is part of the course  Computational Molecular Evolution (22115).   == Getting started ==  : &amp;#039;&amp;#039;&amp;#039;Note:&amp;#039;&amp;#039;&amp;#039; If you didn&amp;#039;t already do this during the video lecture: Start by doing the [https://teaching.healthtech.dtu.dk/22115/images/3/3f/Distance_handout.pdf handout exercise for distance matrix methods].  : In this exercise we will reconstruct phylogenetic trees using a variety of distance-based methods. Specificall...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This exercise is part of the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
&lt;br /&gt;
: &#039;&#039;&#039;Note:&#039;&#039;&#039; If you didn&#039;t already do this during the video lecture: Start by doing the [https://teaching.healthtech.dtu.dk/22115/images/3/3f/Distance_handout.pdf handout exercise for distance matrix methods].&lt;br /&gt;
&lt;br /&gt;
: In this exercise we will reconstruct phylogenetic trees using a variety of distance-based methods. Specifically, we will explore two different optimality criteria (least squares and minimum evolution), and one clustering method (neighbor joining).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Copy files for today&#039;s exercise:&#039;&#039;&#039;&lt;br /&gt;
Make sure you&#039;re still in today&#039;s working directory (condist) and that you already have the hcv.nexus file there. Now copy the following file to that directory as well:&lt;br /&gt;
 cp ../data/simple.nexus simple.nexus&lt;br /&gt;
 ls -l&lt;br /&gt;
: simple.nexus is an artificial data set that I have constructed. It is identical to the one you analyzed by hand in the handout exercise. We will use it to convince ourselves that PAUP gets the same result as you.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Analysis of the Simple Data Set ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 1&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start PAUP* and load the simple data set:&#039;&#039;&#039;&lt;br /&gt;
 paup simple.nexus&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select distance-based tree-reconstruction:&#039;&#039;&#039;&lt;br /&gt;
 set criterion=distance&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select uncorrected distances under the unweighted least squares criterion:&#039;&#039;&#039;&lt;br /&gt;
 dset distance=p objective=lsfit power=0&lt;br /&gt;
: The dset command is used to set various options for the distance-based methods. Option &amp;quot;distance=p&amp;quot; specifies the use of &amp;quot;uncorrected sequence distances&amp;quot;, i.e., we do not want to correct the observed distances for multiple substitutions. Note that distances are here reported as &amp;quot;substitutions per site&amp;quot;. This simply means that the number of differences has been divided by the length of the sequence. You can think of this distance as the fraction of sites that are different between two sequences.&lt;br /&gt;
&lt;br /&gt;
: The option &amp;quot;objective=lsfit&amp;quot; specifies that we want to reconstruct trees using the least squares optimality criterion. Recall that under least squares we are trying to find the tree that has the smallest possible deviation between the observed pairwise distances and the pairwise distances measured along the tree. (The distance between two taxa measured along the tree is called the &amp;quot;patristic&amp;quot; distance.) The overall fit of the tree is found by (1) computing the difference between each observed distance and the corresponding patristic distance, (2) squaring this difference (this way we are sure to obtain a non-negative number, regardless of whether the observed or the patristic distance is larger), and (3) adding all the squared differences. The option &amp;quot;power=0&amp;quot; specifies that we do not want to weight the squared differences when computing this fit, so every pair of taxa counts equally.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect distance matrix&#039;&#039;&#039;&lt;br /&gt;
 showdist&lt;br /&gt;
: This command shows the distance matrix computed under the current distance settings.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Report the pairwise distances for all the pairs of (different) sequences: AB, AC, AD, BC, BD, CD&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 2&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find best tree using exhaustive search:&#039;&#039;&#039;&lt;br /&gt;
 alltrees&lt;br /&gt;
: This data set is sufficiently small that we can search through all possible trees.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; How many different unrooted trees with 4 leaves is it possible to construct?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 3&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect best tree:&#039;&#039;&#039;&lt;br /&gt;
 outgroup A D&lt;br /&gt;
 set root=outgroup outroot=poly&lt;br /&gt;
 describetrees all/plot=phylogram brlens=yes label=yes&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; We now want to investigate whether the fitted branch lengths correspond to the observed pairwise distances. First, draw a sketch of the tree (note that in the PAUP output, this unrooted tree may look a bit weird; just draw it in the normal unrooted way you also used for the manual exercise, i.e., the tree should have a total of 5 branches). Second, label each branch with the branch length as listed in the table you just produced with describetrees. Finally, compute the patristic distance between each pair of taxa on the tree by adding up the lengths of the branches lying on the path between the two taxa. Do the observed pairwise distances (from the distance matrix in Question 1) correspond to the patristic distances in this case?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 4&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compare to the manually constructed tree:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; We now want to investigate whether the tree that PAUP has found here corresponds to the one you constructed manually in the handout exercise. To do this you should convert all the fractional (&amp;quot;per-site&amp;quot;) distances reported by PAUP to absolute distances. This is done simply by multiplying the fractional distance by the length of the alignment (15 positions, in this case). Are your tree and the PAUP tree identical (within rounding error)?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Analysis of HCV Data Set Using Neighbor Joining ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 5&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Set up analysis for HCV data set&#039;&#039;&#039;&lt;br /&gt;
 execute hcv.nexus&lt;br /&gt;
 set criterion=distance&lt;br /&gt;
 dset distance=p objective=lsfit power=0&lt;br /&gt;
 outgroup  2_1_1 2_1_2 2_1_3 2_1_4 2_1_5 2_1_7 2_1_8 2_1_9 2_1_10&lt;br /&gt;
 set root=outgroup outroot=monophyl&lt;br /&gt;
: These commands will: load the file hcv.nexus (say yes when asked whether you want to reset the active datafile), select distance-based tree-reconstruction, select uncorrected distances, define the patient 2 sequences as the outgroup, set outgroup rooting, and ensure that the outgroup is printed as a monophyletic sister group to the ingroup.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Construct a neighbor joining tree based on the HCV data:&#039;&#039;&#039;&lt;br /&gt;
 nj&lt;br /&gt;
: This will construct a neighbor joining tree using the active distance measure (currently set to uncorrected).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Print tree and table of branch lengths:&#039;&#039;&#039;&lt;br /&gt;
 describetrees 1/plot=phylogram brlens=yes&lt;br /&gt;
: The neighbor joining tree resembles the trees you previously constructed using parsimony. Importantly, you should see that the viral sequences from different patients form distinct clusters. Note that only a single tree is produced. This is characteristic of clustering methods, which work by following a deterministic algorithm for constructing a tree from distance data. Clustering algorithms such as neighbor joining do not have any measure of tree-goodness and therefore are not able to identify sets of equally good trees.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; The present neighbor joining tree was computed without correcting the observed distances for multiple substitutions. In the phylogram, identify the internal node that is ancestral to the patient 5 sequences (you will see that internal nodes are labeled with consecutive numbers), and also the internal node that is one level further down in the tree (i.e., ancestral to the ancestral node). You will note that the branch connecting these two nodes is relatively long. Locate the branch in the list of branch lengths, which is printed above the tree. What is the length of this branch?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 6&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select correction of multiple substitutions using the Jukes and Cantor model:&#039;&#039;&#039;&lt;br /&gt;
 dset distance=jc&lt;br /&gt;
: This causes all observed distances to be corrected using a formula based on the Jukes and Cantor model of evolution. Recall that under the Jukes and Cantor model all base frequencies are assumed to be equal (at 0.25), and all base substitution rates are also assumed to be equal.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Construct a new neighbor joining tree using corrected distances:&#039;&#039;&#039;&lt;br /&gt;
 nj&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Print tree and table of branch lengths:&#039;&#039;&#039;&lt;br /&gt;
 describetrees 1/plot=phylogram brlens=yes&lt;br /&gt;
: In this tree all branch lengths have been corrected for (unobserved) multiple substitutions. That means they are slightly longer than the uncorrected branch lengths, and this correction is more noticeable for longer branches.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Again locate the internal node that is ancestral to the patient 5 sequences and also the immediate ancestor of this node (the node labels are not necessarily the same as before). Now find the corresponding branch in the table and make a note of the length. Is the corrected branch length longer than the uncorrected one?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 7&#039;&#039;&#039; What is the ratio of the corrected to the uncorrected branch length? (Divide the corrected branch length by the uncorrected one)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 8&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Prepare table of model fit measures&#039;&#039;&#039;&lt;br /&gt;
: You are currently using neighbor joining to reconstruct the phylogenetic tree. Below you will also explore the use of least squares and minimum evolution methods. In order to compare the performance and characteristics of these methods we want to record some informative numbers. Construct a small table with two columns (labeled &amp;quot;SSE&amp;quot; and &amp;quot;tree length&amp;quot;), and three rows (labeled &amp;quot;NJ&amp;quot;, &amp;quot;least squares&amp;quot;, and &amp;quot;minimum evolution&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; At the end of the list of branch lengths (printed with the describetrees command), you will find the sum of all branch lengths. This is often called the &amp;quot;length&amp;quot; of the tree. What is the length of the tree? (also enter this number in your table, under the column &amp;quot;tree length&amp;quot; in the row &amp;quot;NJ&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 9&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compute fit of NJ branch lengths to observed pairwise distances:&#039;&#039;&#039;&lt;br /&gt;
 dscores 1/objective=lsfit power=0&lt;br /&gt;
: The dscores command calculates the scores of trees in memory according to the distance criterion. In this case we are computing the fit between the observed pairwise distances and the branch lengths found by neighbor joining. The measure used is the sum of squared deviations mentioned above.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; What is the sum of squared errors? (It is indicated by &amp;quot;SS&amp;quot;, which is an abbreviation for sum of squares.) Enter the number in your table, under the column &amp;quot;SSE&amp;quot; in the row &amp;quot;NJ&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Analysis of HCV Data Set Using Least Squares ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 10&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select JC corrected distances under the unweighted least squares criterion:&#039;&#039;&#039;&lt;br /&gt;
 dset distance=jc objective=lsfit power=0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find the best tree using heuristic searching:&#039;&#039;&#039;&lt;br /&gt;
 hsearch start=nj swap=tbr&lt;br /&gt;
: As we have seen previously, the HCV data set is far too big for exhaustive searching, and we therefore have to resort to heuristic techniques when we are using a phylogenetic reconstruction method that is based on an optimality criterion. In this case the starting tree is constructed by neighbor joining, i.e., it should be identical to the tree we just inspected (in previous exercises we have used a random starting tree, but neighbor joining will get us closer to the optimum from the start). The heuristic search (which again uses re-arrangements of the &amp;quot;tree-bisection and reconnection&amp;quot; type) should result in a small set of equally good trees.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect trees:&#039;&#039;&#039;&lt;br /&gt;
 contree all/strict=no majrule=yes percent=50&lt;br /&gt;
: This constructs a consensus tree from the set of equally good best trees. Again you should see that the best trees have individual patients clustered separately. Note that while the neighbor joining tree also showed this feature, it did not indicate that there might be any uncertainty as to the details of the tree. However, by using a method that has an explicit measure of tree goodness (least squares in this case) you have now learned that there are several equally good reconstructions of the branch order within the individual patient clusters.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compute fit of least squares branch lengths to observed pairwise distances:&#039;&#039;&#039;&lt;br /&gt;
 dscores 1/objective=lsfit power=0&lt;br /&gt;
: Again, we are computing the sum of squared deviations between observed and patristic pairwise distances. Arbitrarily we have chosen to only do this for tree number 1 (&amp;quot;dscores all&amp;quot; would have done it for all trees in memory), but recall that all trees in memory are equally good, so the results would have been identical to what you now get.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; What is the sum of squares? (Also enter the numbers in your table)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 11&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find total length of tree:&#039;&#039;&#039;&lt;br /&gt;
 describetrees 1/plot=no brlens=yes&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; What is the sum of all branch lengths when using the least squares criterion? (Remember to also enter the numbers in your table).&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 12&#039;&#039;&#039; Now, compare the results from this analysis with the numbers you obtained from the neighbor joining tree above. Has the fit improved? (Recall that for both sum of squares and tree length, smaller is better).&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Analysis of HCV Data Set Using Minimum Evolution ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 13&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select JC corrected distances under the minimum evolution criterion:&#039;&#039;&#039;&lt;br /&gt;
 dset distance=jc objective=me&lt;br /&gt;
: We now want to explore a different optimality criterion for distance-based analysis. Under minimum evolution (ME) we take the shortest tree to be the best one. This is very similar to parsimony, but in this case we are using pairwise, JC-corrected distances as the basis for reconstructing the tree. ME proceeds by searching through a list of possible trees; for each tested topology the best set of branch lengths is found by the least squares method, but instead of choosing the tree with the best fit, we choose the tree with the smallest sum of branch lengths.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find the best tree using heuristic searching starting from a NJ tree:&#039;&#039;&#039;&lt;br /&gt;
 hsearch start=nj swap=tbr&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect trees:&#039;&#039;&#039;&lt;br /&gt;
 contree all/strict=no majrule=yes percent=50&lt;br /&gt;
: Again you should see that the best trees all have individual patients clustered separately.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find total length of tree:&#039;&#039;&#039;&lt;br /&gt;
 describetrees 1/plot=no brlens=yes&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; At the end of the table listing branch lengths, you will again find the sum of all branch lengths. What is it?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 14&#039;&#039;&#039; Is the minimum evolution tree shorter than the other two trees?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 15&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compute fit of minimum evolution branch lengths to observed pairwise distances:&#039;&#039;&#039;&lt;br /&gt;
 dscores 1/objective=lsfit power=0&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Again, we are computing the sum of squared deviations between observed and patristic pairwise distances. Note the result from this analysis in your table and compare it with the numbers you obtained from the neighbor joining and least squares analyses above. How is the fit of the ME tree compared to those two judged by the sum of squares?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Figtreenext.png&amp;diff=26</id>
		<title>File:Figtreenext.png</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Figtreenext.png&amp;diff=26"/>
		<updated>2024-03-19T13:27:39Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Consensus_Trees&amp;diff=25</id>
		<title>Consensus Trees</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Consensus_Trees&amp;diff=25"/>
		<updated>2024-03-19T13:26:35Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;This exercise is part of the course  Computational Molecular Evolution (22115).  == Getting started ==  &amp;#039;&amp;#039;&amp;#039;1: Start Terminal window&amp;#039;&amp;#039;&amp;#039;  &amp;#039;&amp;#039;&amp;#039;2: Construct working directory:&amp;#039;&amp;#039;&amp;#039; : In the command below: Instead of &amp;lt;code&amp;gt;/path/to/molevol&amp;lt;/code&amp;gt; enter the path to the directory where you have placed your course files (for instance &amp;lt;code&amp;gt;cd /Users/bob/Documents/molevol&amp;lt;/code&amp;gt;, or &amp;lt;code&amp;gt;cd /home/student/molevol&amp;lt;/code&amp;gt;).   cd /path/to/m...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This exercise is part of the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]].&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1: Start Terminal window&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2: Construct working directory:&#039;&#039;&#039;&lt;br /&gt;
: In the command below: Instead of &amp;lt;code&amp;gt;/path/to/molevol&amp;lt;/code&amp;gt; enter the path to the directory where you have placed your course files (for instance &amp;lt;code&amp;gt;cd /Users/bob/Documents/molevol&amp;lt;/code&amp;gt;, or &amp;lt;code&amp;gt;cd /home/student/molevol&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
 cd /path/to/molevol&lt;br /&gt;
 mkdir condist&lt;br /&gt;
 cd condist&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3: Copy data file:&#039;&#039;&#039;&lt;br /&gt;
: &#039;&#039;&#039;Note&#039;&#039;&#039;: in the following command you are copying a file that you were supposed to create as part of the week 2 exercise. Specifically, this is the alignment of HCV sequences in Nexus format. If you haven&#039;t finished that step, go back and do so now. Also note that your file name may differ from hcv.nexus; if so, substitute your own file name in the following commands.&lt;br /&gt;
 cp ../parsimony/hcv.nexus hcv.nexus&lt;br /&gt;
 nedit hcv.nexus &amp;amp;&lt;br /&gt;
: This file contains an alignment (in nexus format) of 41 Hepatitis C virus (HCV) sequences isolated from 5 different patients. Sequences are named in the following way: &amp;lt;code&amp;gt;Patient_Time_Clone&amp;lt;/code&amp;gt;. For instance, the sequence labeled 1_1_5 was isolated from patient number 1 at time point 1 and is clone number 5 from that patient and that time point.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Summarising sets of equally parsimonious trees by their consensus tree ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 1&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start the paup program and load the data file:&#039;&#039;&#039;&lt;br /&gt;
 paup hcv.nexus&lt;br /&gt;
: This command opens the PAUP* program and automatically executes the nexus file at the same time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Define the outgroup:&#039;&#039;&#039;&lt;br /&gt;
 outgroup  2_1_1 2_1_2 2_1_3 2_1_4 2_1_5 2_1_7 2_1_8 2_1_9 2_1_10&lt;br /&gt;
 set root=outgroup outroot=monophyl&lt;br /&gt;
: This puts all nine sequences from patient 2 in the outgroup, and ensures that the outgroup is printed as a monophyletic sister group to the ingroup. This will help make the tree-plots clearer.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enable PAUP* to store an unlimited number of trees:&#039;&#039;&#039;&lt;br /&gt;
 set increase=auto&lt;br /&gt;
: Normally PAUP* will only store up to &amp;quot;maxtrees&amp;quot; trees in memory. This command allows maxtrees to be increased automatically (without prompting for user confirmation) if the need arises during the heuristic search.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Perform a heuristic search using TBR:&#039;&#039;&#039;&lt;br /&gt;
 hsearch start=stepwise addseq=random nreps=20 rseed=98367 swap=TBR&lt;br /&gt;
: This command starts a heuristic search with tree-rearrangements of the TBR type, where the initial tree is constructed using sequential addition where sequences are added in random order, and 20 different starting trees are tried.&lt;br /&gt;
&lt;br /&gt;
: After a brief processing time you will be back where you ended last Wednesday. Among a total of approximately 10^60 possible trees, PAUP* has found about 240 equally parsimonious best trees. This may sound like a depressingly large number of alternative reconstructions but as you will now see, these trees do in fact have a lot in common.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What is the length of the best trees?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 2&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Convert trees to rooted form:&#039;&#039;&#039;&lt;br /&gt;
 roottrees&lt;br /&gt;
: Above we have specified an outgroup and requested that trees be plotted with a root determined by this outgroup. However, the trees that we found by heuristic searching are still unrooted, and we need to explicitly specify that we want them to be rooted. Placement of the root is of course done on the basis of the outgroup.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect resulting trees individually:&#039;&#039;&#039;&lt;br /&gt;
 describetrees 37/plot=cladogram label=no&lt;br /&gt;
: This shows you one randomly picked tree (tree number 37) among the &amp;gt;200 best trees that were found by the heuristic search. (The option label=no turns off labeling of the internal nodes in the tree). Make sure that your Terminal window is wide enough that the tree plot fits. Notice how the viral sequences from each individual patient group together. This shows that while there is considerable diversity in the viral population within any single patient, those viruses are nevertheless more closely related to each other than to viruses from other patients. This is of course a result of the viruses in one patient all having descended from the virus that originally infected that patient. Plotting the tree with branch lengths may make this clustering more apparent:&lt;br /&gt;
 describetrees 37/plot=phylogram label=no&lt;br /&gt;
: Remember: you also have the option of saving one or more trees to file and then viewing the tree using FigTree. For instance, you save tree number 37 by the following command:&lt;br /&gt;
 savetrees file=hcvtree.nexus brlens=yes from=37 to=37&lt;br /&gt;
: You can also save a range of trees of course.&lt;br /&gt;
&lt;br /&gt;
: To see whether this phenomenon is limited to the tree we selected first, save a range of 10 trees to a file and then inspect them in FigTree. Notice that when more than one tree is opened in FigTree you can use the small arrows labeled &amp;quot;Prev/Next&amp;quot; to move between trees:&lt;br /&gt;
&lt;br /&gt;
[[File:Figtreenext.png|90%]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Construct a consensus tree:&#039;&#039;&#039;&lt;br /&gt;
: You should now be convinced that the more than 200 equally good trees do in fact have quite a lot in common. Importantly it seems that all trees have viruses from individual patients grouped separately (forming five monophyletic groups). In order to investigate this question we will now construct a majority rule consensus tree summarizing the branching patterns in all the &amp;gt;200 trees:&lt;br /&gt;
 contree all /strict=no majrule=yes percent=50&lt;br /&gt;
: This constructs a consensus tree showing monophyletic groups occurring in more than 50% of all trees. Scroll back to see the tree. At each internal node is an indication of how often the corresponding group (meaning all taxa descending from that internal node) was found in the set of all trees. (The numbers are percentages.) The option percent=50 specifies that we want to see only groups occurring in more than 50% of the trees (i.e., we are requesting a &amp;quot;majority rule consensus&amp;quot;). You can increase this value (not lower it) if you want to set a different cutoff.&lt;br /&gt;
&lt;br /&gt;
: You will note that there are some sub-trees where the branching order is now unresolved, meaning that three or more taxa all split out from the same internal node. These multifurcations show that while more than 50% of the individual trees had those taxa together as a group (the precise number is indicated at the internal node), different trees nevertheless disagreed on the exact branching order within that group.&lt;br /&gt;
&lt;br /&gt;
: As you can see, consensus trees are a handy way of summarizing the evidence shared in a set of trees, and they are therefore useful when a search identifies several good reconstructed phylogenies.&lt;br /&gt;
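: The counting behind a majority rule consensus can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each tree is represented as a set of clades (each clade being a set of taxon names), the three toy trees and all names are hypothetical, and the sketch does not reproduce how PAUP* stores or processes trees.&lt;br /&gt;

```python
from collections import Counter


def majority_rule_clades(trees, percent=50):
    """Return {clade: support in percent} for clades found in more than `percent` % of the trees."""
    counts = Counter(clade for tree in trees for clade in tree)
    supported = {}
    for clade, n_found in counts.items():
        support = 100.0 * n_found / len(trees)
        if support > percent:
            supported[clade] = support
    return supported


# Three hypothetical trees, each given as the set of clades it contains.
abc = frozenset({"1_a", "1_b", "1_c"})
ab = frozenset({"1_a", "1_b"})
ac = frozenset({"1_a", "1_c"})
trees = [{ab, abc}, {ac, abc}, {ab, abc}]

for clade, support in majority_rule_clades(trees).items():
    print(sorted(clade), round(support, 1))
```

: In this toy example the three-taxon group is found in every tree, one of the two-taxon groups is found in two of the three trees, and the other falls below the cutoff and is left out of the consensus.&lt;br /&gt;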
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Do the sequences for patient 1 form a monophyletic group in the consensus tree?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 3&#039;&#039;&#039; In what fraction of the original (input) trees did the patient 1 sequences form a monophyletic group? (this is the percentage written at the internal node at the basis of that patient&#039;s group of sequences)?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 4&#039;&#039;&#039; Do the sequences for patient 5 form a monophyletic group in the consensus tree?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 5 &#039;&#039;&#039; In what fraction of the original (input) trees did the patient 5 sequences form a monophyletic group?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 6&#039;&#039;&#039; Do the sequences for patient 7 form a monophyletic group in the consensus tree?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 7&#039;&#039;&#039; In what fraction of the original (input) trees did the patient 7 sequences form a monophyletic group?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quit PAUP:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
 q&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Phylogenetic_Analysis_using_Parsimony&amp;diff=24</id>
		<title>Phylogenetic Analysis using Parsimony</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Phylogenetic_Analysis_using_Parsimony&amp;diff=24"/>
		<updated>2024-03-19T13:25:14Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;This exercise is part of the course  Computational Molecular Evolution (22115).   == Overview ==  In this exercise you are going to examine three different data sets using the PAUP* program. All analyses will be performed under the parsimony criterion. Data set 1 consists of genomic nucleotide sequences from 9 primate species. This set is sufficiently small that an exhaustive search of all possible trees can be performed. Dat...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This exercise is part of the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
&lt;br /&gt;
In this exercise you are going to examine three different data sets using the PAUP* program. All analyses will be performed under the parsimony criterion. Data set 1 consists of genomic nucleotide sequences from 9 primate species. This set is sufficiently small that an exhaustive search of all possible trees can be performed. Data set 2 consists of mitochondrial nucleotide sequences from 12 primates. This data set is so large that it would take too long to run through all possible trees (at least for the purpose of this exercise), but it is still small enough that branch and bound can be used to find the guaranteed best trees. The third and last data set consists of Hepatitis C virus sequences isolated from different patients. This set is much too big for exhaustive searching and we will have to employ heuristic methods to analyze it.&lt;br /&gt;
&lt;br /&gt;
In addition to exploring aspects of parsimonious phylogenetic reconstruction, an important goal of this exercise is to introduce you to the PAUP* interface, and to the different types of manipulations and analyses that can be performed within the program. Later in the course you will use PAUP* for distance-based and maximum likelihood-based phylogenetic reconstruction. Several other programs (e.g., MacClade and MrBayes) use command-line interfaces that are very similar to the one used by PAUP*.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Getting started ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1: Start Terminal window&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
: Make sure to maximize the window: the analyses we will perform give lots of output to the screen, so having a nice and large shell window makes it easier to keep track of what happens.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2: Construct working directory, copy files:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
In the command below: Instead of &amp;lt;code&amp;gt;/path/to/molevol&amp;lt;/code&amp;gt; enter the path to the directory where you have placed your course files (for instance &amp;lt;code&amp;gt;cd /Users/bob/Documents/molevol&amp;lt;/code&amp;gt;, or &amp;lt;code&amp;gt;cd /home/student/molevol&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
 cd /path/to/molevol&lt;br /&gt;
 mkdir parsimony&lt;br /&gt;
 cd parsimony&lt;br /&gt;
 cp ../data/mhc.fasta mhc.fasta&lt;br /&gt;
 cp ../data/primate_mtDNA_interleaved.fasta primate_mtDNA_interleaved.fasta&lt;br /&gt;
 cp ../data/hcv.fasta hcv.fasta&lt;br /&gt;
 ls -l&lt;br /&gt;
&lt;br /&gt;
: The commands above first construct a directory called &amp;quot;parsimony&amp;quot;, then change to that directory, and finally copy three files from your data directory to the current folder (parsimony).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3: Have a look at the data files:&#039;&#039;&#039;&lt;br /&gt;
 nedit mhc.fasta&lt;br /&gt;
: This file is in the so-called fasta format, and contains unaligned DNA sequences. Specifically, the sequences are major histocompatibility complex (MHC) class I genes from nine different primate species. MHC class I genes encode proteins that are involved in the immune response. Now, close the nedit window and have a look at the next data file:&lt;br /&gt;
 nedit primate_mtDNA_interleaved.fasta&lt;br /&gt;
: This file contains mitochondrial DNA sequences from 12 different primate species. Close the nedit window, and have a look at the final data file:&lt;br /&gt;
 nedit hcv.fasta&lt;br /&gt;
: This file contains Hepatitis C virus (HCV) sequences isolated from 5 different patients. The sequenced region corresponds to the end of the E1 gene and the beginning of the E2 gene, surrounding the so-called hyper-variable region 1 (HVR1). When you&#039;ve had a look, close the nedit window so it doesn&#039;t clutter your screen.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Phylogenetic Analysis of MHC Class I sequences ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 1&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Make multiple alignment using mafft&#039;&#039;&#039;&lt;br /&gt;
 mafft --auto mhc.fasta &amp;gt; mhc_aligned.fasta&lt;br /&gt;
: mafft is a program for making multiple alignments, that works well and can handle large data sets. It is possible to use different alignment algorithms with mafft, depending on whether the focus is on very precise alignment of a specific type of sequences or rapid alignment of large data sets for instance. ([https://mafft.cbrc.jp/alignment/software/algorithms/algorithms.html See here for details]). The option --auto makes the program choose automatically among these algorithms based on the input data.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect the alignment in graphical viewer:&#039;&#039;&#039;&lt;br /&gt;
 aliview mhc_aligned.fasta&lt;br /&gt;
: aliview is a program for graphically viewing multiple alignments. Residues can be coloured according to different principles (e.g., one color per nucleotide, or colouring according to the degree of conservation of a column) making it simpler to get an overview of the quality of the alignment. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;There is a gap in the &amp;quot;yellow_baboon&amp;quot; sequence. What is the length of this gap? (number of nucleotides)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 2&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Why does this length make evolutionary sense (as opposed to, say, if the gap was one nucleotide shorter or longer)?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 3&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Convert alignment format to NEXUS:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
The program we will use next does not understand FASTA format, so we will first have to convert the alignment to a format it does understand, namely the so-called NEXUS format. We will do that using the command line program [https://github.com/agormp/seqconverter seqconverter] (which is one that I have written, and that does various sequence manipulations):&lt;br /&gt;
 seqconverter --informat fasta --outformat nexus -i mhc_aligned.fasta &amp;gt; mhc.nexus&lt;br /&gt;
&lt;br /&gt;
seqconverter writes its output to the terminal (&amp;quot;stdout&amp;quot;) so to save the output to a file we need to use [https://www.tutorialspoint.com/unix/unix-io-redirections.htm redirection of the output]. &lt;br /&gt;
&lt;br /&gt;
(There are a number of other programs that can perform sequence format conversion, including the [https://www.ebi.ac.uk/Tools/sfc/emboss_seqret/ EMBOSS Seqret] tool at the homepage of the European Bioinformatics Institute). &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Start the PAUP* program:&#039;&#039;&#039;&lt;br /&gt;
 paup&lt;br /&gt;
: You have now started the PAUP program. Notice that instead of the normal command prompt you now have the PAUP command prompt (&amp;lt;code&amp;gt;paup&amp;gt; &amp;lt;/code&amp;gt;). This indicates that you are in the PAUP program, which is ready to receive commands. &lt;br /&gt;
&#039;&#039;&#039;Load the aligned data&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
At the PAUP prompt, type the following:&lt;br /&gt;
 execute mhc.nexus&lt;br /&gt;
This loads the data into PAUP.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Check the output from the PAUP program: How long (how many columns) is the alignment you just loaded?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 4&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Define the outgroup:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
At the PAUP prompt, type the following command to set the outgroup for the tree:&lt;br /&gt;
 outgroup olive_baboon macaque yellow_baboon&lt;br /&gt;
: This defines an outgroup consisting of the three old world monkeys in the data set. The names are those that appear in the mhc.nexus file.&lt;br /&gt;
&lt;br /&gt;
: The outgroup will be used to place the root of the tree. The rationale is as follows: our data set consists of sequences from man, from a number of great apes, and from a number of old world monkeys. We know from other evidence that the lineage leading to monkeys branched off before any of the remaining organisms diverged from each other. The root of the tree connecting the organisms investigated here must therefore be located between the monkeys (the &amp;quot;outgroup&amp;quot;) and the rest (the &amp;quot;ingroup&amp;quot;). This way of finding a root is called &amp;quot;outgroup rooting&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Activate outgroup rooting and select how tree will be printed:&#039;&#039;&#039;&lt;br /&gt;
 set root=outgroup outroot=monophyl&lt;br /&gt;
: This makes PAUP* use the outgroup we defined above for the purpose of rooting the tree. The &amp;quot;outroot=monophyl&amp;quot; option makes PAUP* construct a tree where the outgroup is a monophyletic sister group to the ingroup. (Outroot could also have been set to &amp;quot;polytomy&amp;quot; or &amp;quot;paraphyl&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find the most parsimonious tree by examining all possible trees:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Make sure the terminal window is maximally wide and then run the following command:&lt;br /&gt;
 alltrees fd=barChart&lt;br /&gt;
: This makes PAUP* find the best trees by exhaustively searching through all possible trees (the number of unrooted trees with 9 taxa is 135,135 - yes that is an actual number!). By default PAUP* uses the parsimony criterion for constructing trees. For each possible tree PAUP* finds the length (i.e., the number of mutational events required to explain the data set), and upon finishing this, the best tree is the one with the smallest total length. If there are several trees with the same length, then these are all kept since they are equally good. At the end of the run, PAUP* outputs a bar chart (histogram) giving the frequency distribution of all tree lengths (the histogram is turned on its side here). Above the histogram is a textual summary of what PAUP* did and the result. Look through all of this to answer the next questions.&lt;br /&gt;
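: The figure of 135,135 trees can be checked with the standard formula for the number of unrooted, strictly bifurcating trees with n taxa, namely (2n-5)!! = 3 x 5 x ... x (2n-5). The Python sketch below is only an illustration (the function name is made up, and it assumes at least three taxa); it also shows how quickly the count grows, which is why exhaustive searching is only feasible for small data sets.&lt;br /&gt;

```python
def n_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees: the product 3 * 5 * ... * (2n - 5).

    Assumes n_taxa is at least 3 (for exactly 3 taxa the product is empty and the count is 1).
    """
    count = 1
    for factor in range(3, 2 * n_taxa - 4, 2):
        count *= factor
    return count


for n in (4, 9, 12):
    print(n, n_unrooted_trees(n))
```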
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What is the length of the shortest trees?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 5&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;How many trees with this score were found?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 6&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;How far are the best trees from the second best ones?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 7&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examine the best trees&#039;&#039;&#039;&lt;br /&gt;
 pscores&lt;br /&gt;
: This prints the length of the trees that are currently in memory. (Actually this command not only prints but computes the lengths, so it can be used to evaluate any tree that has been loaded into memory - also trees built by other methods or other programs).&lt;br /&gt;
 describetrees all/plot=cladogram labelnode=no&lt;br /&gt;
: This will print descriptions and plots of both trees (The command &amp;quot;describetrees 1&amp;quot; would have printed a description of only the first tree). A cladogram is a type of tree where branch lengths are ignored and only branching order is indicated. The option &amp;quot;labelnode=no&amp;quot; turns off labelling of internal nodes (otherwise they would have been labeled with consecutive numbers). You can scroll back to compare the two trees. You may notice that the disagreement between the two trees concerns whether gibbon or orangutan is closer to the root. Based on the information in this data set we can not distinguish between these two possibilities.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Note where the human sequence is located: Who are our closest relatives according to this tree?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 8&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examine tree with branch lengths:&#039;&#039;&#039;&lt;br /&gt;
 describetrees all/plot=phylogram labelnode=no&lt;br /&gt;
: A phylogram is a tree where branches are drawn with lengths proportional to the number of mutational events that have happened on them. (Note that tree-terminology is not entirely consistent, and there are several other names for this type of plot). Branch lengths are based on the reconstructed location of mutational events (and therefore also on the reconstructed ancestral sequences). Under the parsimony criterion, if a mutational event could have occurred on any one of a number of branches (in the sense that all these reconstructions give the same tree length), then each of the branches is assigned a fraction of a mutation. For instance, if a mutational event could have been placed on either of two branches, then both of them will be counted as having had 0.5 mutational events.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What is the longest terminal branch on the tree? (Terminal branch = branch leading to a leaf).&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 9&#039;&#039;&#039;&lt;br /&gt;
 showtrees all&lt;br /&gt;
: This gives just the cladograms without further descriptions.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Save the trees to a file:&#039;&#039;&#039;&lt;br /&gt;
 savetrees file=mhcalltrees.nexus brlens=yes&lt;br /&gt;
: This saves the trees to a file named &amp;quot;mhcalltrees.nexus&amp;quot; with indication of the location of the root, and with information about branch lengths.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;View trees in FigTree viewer:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Now, open a second Terminal window, change to the working directory and open the tree file with the figtree viewer as follows:&lt;br /&gt;
 cd /path/to/molevol/parsimony&lt;br /&gt;
 figtree mhcalltrees.nexus&lt;br /&gt;
: With this graphical viewer you can investigate the tree more closely and export it as a figure in various formats. The program has several options for altering the display of the tree, including viewing the tree as unrooted, and altering the rooting interactively. After you&#039;ve played around with the possibilities for a while you should close the window again.&lt;br /&gt;
&lt;br /&gt;
: If, later in this course, you want to construct a tree figure that is prettier than the ASCII rendition that PAUP* gives you, then you need to repeat the steps performed above: save the tree using the savetrees command, and subsequently open it in FigTree. Remember the option &amp;quot;&amp;lt;code&amp;gt;brlens=yes&amp;lt;/code&amp;gt;&amp;quot; in order to get the phylogram. The FigTree viewer is available for several platforms.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find the most parsimonious trees using branch and bound:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Now return to the shell window where you have PAUP running and perform a branch and bound search for the best trees:&lt;br /&gt;
 bandb&lt;br /&gt;
: As explained in the lecture, branch and bound is guaranteed to find the best trees without necessarily searching through all of tree-space. There is therefore actually no reason to use alltrees unless you are explicitly interested in examining suboptimal trees or the distribution of all tree lengths.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What is the length of the best tree found using branch and bound?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 10&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Is that the same value found using exhaustive search?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 11&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examine the branch and bound trees:&#039;&#039;&#039;&lt;br /&gt;
 showtrees all&lt;br /&gt;
: (Remember: you also have the possibility of using FigTree as explained above).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Load the previously found trees into memory:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
We want to convince ourselves that the branch and bound trees really are identical to the trees found by the alltrees command. In order to do that we must now load the previously found trees back into memory while still retaining the branch and bound trees:&lt;br /&gt;
 gettrees file=mhcalltrees.nexus mode=7&lt;br /&gt;
(Answer &amp;quot;yes&amp;quot; when asked whether you want to proceed.) The mode=7 option means that PAUP* should keep all trees that are currently in memory and append all trees from the file. We now have four trees in memory (the two found by alltrees and the two found by bandb). Now compare the four trees and convince yourself that bandb found the same two trees as alltrees:&lt;br /&gt;
 showtrees all&lt;br /&gt;
You can also compare the scores:&lt;br /&gt;
 pscores all&lt;br /&gt;
As one final way of establishing that the trees are identical you can compute the &amp;quot;distance&amp;quot; between the four trees in memory:&lt;br /&gt;
 treedist&lt;br /&gt;
: This indicates how similar the trees in memory are by computing the so-called [https://en.wikipedia.org/wiki/Robinson–Foulds_metric Robinson–Foulds or symmetric difference metric]. The output includes both a table listing all pairwise differences and a bar-chart giving the distribution of differences. Two trees with identical topology will have a distance of zero.&lt;br /&gt;
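: The idea behind the symmetric difference can be sketched in a few lines of Python. This is only an illustration, assuming small trees written as nested tuples (a hypothetical representation; it is not how PAUP* stores trees or implements treedist). Each internal branch splits the taxa into two groups, and the distance is the number of splits that are present in exactly one of the two trees:&lt;br /&gt;

```python
def leaves(tree):
    """Return the set of taxon names below a node (a str is a leaf)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions induced by internal branches."""
    all_taxa = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        below = leaves(node)
        rest = all_taxa - below
        # Skip trivial splits that separate one (or no) taxon from the rest
        if min(len(below), len(rest)) not in (0, 1):
            # Store each split as an unordered pair so rooting does not matter
            found.add(frozenset([below, rest]))
        for child in node:
            visit(child)

    visit(tree)
    return found


def rf_distance(tree1, tree2):
    """Symmetric difference: splits found in exactly one of the two trees."""
    return len(splits(tree1).symmetric_difference(splits(tree2)))


t1 = (("A", "B"), ("C", "D"), "E")
t2 = ("A", "B", (("C", "D"), "E"))   # same topology, written differently
t3 = (("A", "C"), ("B", "D"), "E")   # different topology
print(rf_distance(t1, t2))  # 0
print(rf_distance(t1, t3))  # 4
```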
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Are the two new trees the same as the two previously found trees?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 12&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Define a constraint for tree-searching:&#039;&#039;&#039;&lt;br /&gt;
 constraints homooran (monophyly)=((human,orangutan));&lt;br /&gt;
: This defines a constraint named &amp;quot;homooran&amp;quot; which requires the taxa &amp;quot;human&amp;quot; and &amp;quot;orangutan&amp;quot; to form a monophyletic group. The constraint tree was here given simply as &amp;lt;code&amp;gt;((human,orangutan));&amp;lt;/code&amp;gt; The tree is written [https://en.wikipedia.org/wiki/Newick_format using a notation] where a pair of parentheses corresponds to an internal node, and the comma-separated list enclosed by the parentheses gives the subtrees that branch out from this internal node. The constraint tree shown above is a brief way of defining the full unresolved constraint tree: &amp;lt;code&amp;gt;(gorilla,gibbon,bonobo,chimpanzee,yellow_baboon,olive_baboon,macaque,(human,orangutan));&amp;lt;/code&amp;gt;&lt;br /&gt;
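: The expansion from the short constraint to the full unresolved tree can be sketched in Python as follows. This only illustrates the notation: the helper name full_constraint is hypothetical, and the taxon list is the one appearing in the full constraint tree quoted above.&lt;br /&gt;

```python
def full_constraint(all_taxa, clade):
    """Return a Newick string in which only the taxa in clade are grouped."""
    # Taxa outside the clade attach directly to the root (unresolved polytomy)
    outside = [taxon for taxon in all_taxa if taxon not in clade]
    grouped = "(" + ",".join(clade) + ")"
    return "(" + ",".join(outside + [grouped]) + ");"


taxa = ["gorilla", "gibbon", "bonobo", "chimpanzee", "yellow_baboon",
        "olive_baboon", "macaque", "human", "orangutan"]
print(full_constraint(taxa, ["human", "orangutan"]))
```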
&lt;br /&gt;
&#039;&#039;&#039;Find the best tree that satisfies the constraint:&#039;&#039;&#039;&lt;br /&gt;
 bandb /constraints=homooran enforce=yes&lt;br /&gt;
: This performs a branch and bound search for the best tree that has human and orangutan together as a monophyletic group. Note the option enforce=yes which ensures that the named constraint is applied.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Compare the length of this tree with the best tree found above. How many extra steps (mutations) are required in this tree? (The difference tells you something about how much better one hypothesis is than the other. There are tests that will tell you whether the difference is significant, but we will not get into that at this point in the course).&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Phylogenetic Analysis of Mitochondrial DNA Sequences ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 13&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Delete previously found trees from memory:&#039;&#039;&#039;&lt;br /&gt;
 cleartrees&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Prepare data for analysis:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
We will now analyze the primate mitochondrial data set. Using what you learned above, perform the following steps:&lt;br /&gt;
* Align the sequences present in the file primate_mtDNA_interleaved.fasta&lt;br /&gt;
* Convert alignment to NEXUS format, and save to file &amp;lt;code&amp;gt;primate_mtDNA_interleaved.nexus&amp;lt;/code&amp;gt;&lt;br /&gt;
* Load data into PAUP&lt;br /&gt;
* Set outgroup to consist of all the non-hominoid species: Macaca_fuscata M_mulatta M_fascicularis M_sylvanus Saimiri_sciureus Tarsius_syrichta Lemur_catta&lt;br /&gt;
* Activate outgroup rooting and select output of outgroup as monophyletic sister group to the ingroup.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Estimate time needed for exhaustive search:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
We now have 12 taxa in our data set, corresponding to 654,729,075 possible unrooted trees. We first want to estimate how long it would take to search through every single one of them, but we do not want to actually wait for the required amount of time. First, start the search as indicated below:&lt;br /&gt;
 alltrees&lt;br /&gt;
At the bottom of the screen you will see a progress bar indicating the percentage of tree-space that has been explored. Wait until the program reaches about 10% and interrupt the run by pressing CTRL-C (and replying Y to the question of whether you want to stop).&lt;br /&gt;
 After about 10% finished: press CTRL + C&lt;br /&gt;
&lt;br /&gt;
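The number of trees quoted above follows from the standard formula for the number of unrooted, fully resolved trees with n taxa: 1 x 3 x 5 x ... x (2n - 5). The minimal Python sketch below computes this number and also shows the kind of linear extrapolation needed for the time estimate. The function names are hypothetical, and the 90 seconds is a made-up timing for illustration, not a measurement; use the numbers from your own run.&lt;br /&gt;

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted binary trees: 1 * 3 * 5 * ... * (2n - 5)."""
    count = 1
    for factor in range(3, 2 * n_taxa - 4, 2):
        count *= factor
    return count


def estimate_total_minutes(seconds_used, fraction_done):
    """Linear extrapolation from a partial run to the full search."""
    return seconds_used / fraction_done / 60


print(count_unrooted_trees(12))          # 654729075
print(estimate_total_minutes(90, 0.10))  # hypothetical run: 90 s for 10 percent
```
&lt;br /&gt;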
&#039;&#039;&#039;Question: &#039;&#039;&#039;Use the numbers to estimate how many minutes it would take to work through all 654,729,075 possible trees.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; This will obviously depend on your specific machine.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 14&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Find the best trees using branch and bound:&#039;&#039;&#039;&lt;br /&gt;
 bandb&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;How long did it take to find the best trees by branch and bound (sec)? Compare this to the estimated time you would have had to wait for the alltrees command to finish. The time saved by ignoring suboptimal parts of search space can be quite impressive, but it depends on the structure of your data set.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 15&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect most parsimonious trees:&#039;&#039;&#039;&lt;br /&gt;
 describetrees all/plot=phylogram labelnode=no&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Which species is closer to the root: Pongo (Orangutan) or Hylobates (Gibbon)?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Phylogenetic Analysis of Viral DNA Sequences ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 16&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Delete previously found trees from memory:&#039;&#039;&#039;&lt;br /&gt;
 cleartrees&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Prepare and load the data:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
We will now investigate the data in the file hcv.fasta. Using what you have learned above:&lt;br /&gt;
* Align sequences that are in the file hcv.fasta&lt;br /&gt;
* Convert to Nexus format and save to file hcv.nexus&lt;br /&gt;
* Load Nexus format alignment into PAUP&lt;br /&gt;
&lt;br /&gt;
This dataset consists of 41 hepatitis C virus sequences obtained from four different patients. The alignment should not contain any gaps, since there have been no insertions or deletions. Strictly speaking we therefore did not need to align the sequences, but you can&#039;t know that without aligning them.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Enable PAUP* to store an unlimited number of trees:&#039;&#039;&#039;&lt;br /&gt;
 set increase=auto&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Perform a heuristic search using NNI:&#039;&#039;&#039;&lt;br /&gt;
 hsearch start=stepwise addseq=random nreps=100 rseed=117 swap=NNI&lt;br /&gt;
: We now have so many sequences that exhaustive searching using alltrees, or even bandb, is impossible (there are approximately 10^57 possible unrooted trees for 41 sequences). We therefore have to employ heuristic searching. The above command starts a heuristic search that uses stepwise addition with a random addition sequence to construct the initial tree. (In all, 100 different random addition sequences, and therefore starting trees, are tried.) The option rseed=117 sets the seed of the random number generator and ensures that student results will be comparable (you can experiment with other seeds, or with leaving out the option altogether, if you are interested; ordinarily you would not set this, it is only for the purpose of the exercise). Once an initial tree has been constructed, the heuristic search proceeds by rearrangements of the &amp;quot;nearest neighbor interchange&amp;quot; type (NNI).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Inspect search result:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This time the search results in a large number of equally parsimonious trees. The result is summarized in the form of a &amp;quot;tree-island profile&amp;quot;. Trees from the same island are much more similar to each other than to trees from other islands. Specifically, a tree-island is defined as a set of trees where you can go from any tree to any other tree using one or more re-arrangements of the type currently used for searching tree-space (e.g., NNI). &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;What is the best score found?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 17&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;How many &#039;&#039;tree islands&#039;&#039; are there with the best score?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 18&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;How many &#039;&#039;trees&#039;&#039; were found with the best score?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 19&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Perform a heuristic search using SPR:&#039;&#039;&#039;&lt;br /&gt;
 hsearch start=stepwise addseq=random nreps=100 rseed=117 swap=SPR&lt;br /&gt;
: This starts a heuristic search using rearrangements of the &amp;quot;subtree pruning and re-grafting&amp;quot; type (SPR). SPR is more elaborate than NNI and examines many more neighbors for each tree.&lt;br /&gt;
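: To get a feel for &amp;quot;many more neighbors&amp;quot;, the Python sketch below evaluates the commonly cited neighborhood sizes for an unrooted, fully resolved tree with n taxa: 2(n - 3) trees are one NNI away, and 2(n - 3)(2n - 7) trees are one SPR away. These formulas come from the general literature on tree rearrangements, not from the PAUP* output, and the function names are hypothetical.&lt;br /&gt;

```python
def nni_neighbors(n_taxa):
    """Trees reachable by one NNI from an unrooted binary tree."""
    return 2 * (n_taxa - 3)


def spr_neighbors(n_taxa):
    """Trees reachable by one SPR from an unrooted binary tree."""
    return 2 * (n_taxa - 3) * (2 * n_taxa - 7)


for n in (12, 41):
    print(n, nni_neighbors(n), spr_neighbors(n))
```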
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Did this search find the same best score as NNI?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 20&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Perform a heuristic search using TBR:&#039;&#039;&#039;&lt;br /&gt;
 hsearch start=stepwise addseq=random nreps=100 rseed=117 swap=TBR&lt;br /&gt;
: This starts a heuristic search using rearrangements of the &amp;quot;tree bisection and reconnection&amp;quot; type (TBR). TBR is more elaborate than both NNI and SPR. TBR is the default swap mode for heuristic searching in PAUP*, and NNI or SPR should mostly be used if you are interested in reducing search time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;How many tree islands are there with the best score when using TBR?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 21&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039;Why do you think there are fewer islands with TBR and SPR compared to NNI?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Quitting the PAUP program&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To stop the PAUP program and return to the shell command prompt, you simply type:&lt;br /&gt;
 q&lt;br /&gt;
: for &amp;quot;quit&amp;quot;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Population_Growth,_Fitness,_and_Selection&amp;diff=23</id>
		<title>Population Growth, Fitness, and Selection</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Population_Growth,_Fitness,_and_Selection&amp;diff=23"/>
		<updated>2024-03-19T13:23:03Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;This exercise is part of the course  Computational Molecular Evolution (22115).  === Overview ===  In this exercise you are going to use a simple computer program for simulating the growth of a population. The simulation will be based on an exponential growth model and will involve organisms having one of two different genotypes (&amp;#039;A&amp;#039; and &amp;#039;a&amp;#039;). By performing the simulation with a range of different model parameter values you w...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This exercise is part of the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]].&lt;br /&gt;
&lt;br /&gt;
=== Overview ===&lt;br /&gt;
&lt;br /&gt;
In this exercise you are going to use a simple computer program for simulating the growth of a population. The simulation will be based on an exponential growth model and will involve organisms having one of two different genotypes (&#039;A&#039; and &#039;a&#039;). By performing the simulation with a range of different model parameter values you will sharpen your intuitive understanding of the dynamics of the model.&lt;br /&gt;
&lt;br /&gt;
You will also get a brief introduction to assessing how well a model describes some observed data (the &amp;quot;fit&amp;quot; of a model), and learn how model parameters can be estimated from such data (the process known as &amp;quot;model fitting&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
Finally, you will start learning how to work in the UNIX environment that we will use in the rest of the course (and that is also used by bioinformaticians in the real world).&lt;br /&gt;
&lt;br /&gt;
If you are interested in learning more about UNIX you can browse [http://www.ee.surrey.ac.uk/Teaching/Unix/ this set of tutorials].&lt;br /&gt;
&lt;br /&gt;
=== Getting started ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;1. Construct today&#039;s working directory:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
In the command below, replace &amp;lt;code&amp;gt;/path/to/molevol&amp;lt;/code&amp;gt; with the path to the directory where you have placed your course files (for instance &amp;lt;code&amp;gt;cd /Users/bob/Documents/molevol&amp;lt;/code&amp;gt;, or &amp;lt;code&amp;gt;cd /home/student/molevol&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
 cd /path/to/molevol&lt;br /&gt;
 mkdir simulation&lt;br /&gt;
 cd simulation&lt;br /&gt;
&lt;br /&gt;
The mkdir (make directory) command constructs a new directory named &amp;quot;simulation&amp;quot;. Subsequently the cd (change directory) command is used to select the newly constructed directory as &amp;quot;current working directory&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2. Copy required files:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
 cp ../data/growth.py ./growth.py&lt;br /&gt;
 cp ../data/modelfit.data ./modelfit.data&lt;br /&gt;
 ls -l&lt;br /&gt;
&lt;br /&gt;
The cp command copies the simulation program (growth.py) and a file containing experimental data (modelfit.data) from the data directory to your current working directory. Two dots in a filepath (&amp;lt;code&amp;gt;../&amp;lt;/code&amp;gt;) mean &amp;quot;parent directory&amp;quot; (so two levels up would be: &amp;lt;code&amp;gt;../../&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
The ls (list) command with option -l (long) shows information about the files present in the current directory.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;3. Start extra Terminal window and cd to working directory:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Start a second Terminal app via the start menu. Also cd to today&#039;s working directory in that terminal window:&lt;br /&gt;
&lt;br /&gt;
 cd /path/to/molevol/simulation&lt;br /&gt;
&lt;br /&gt;
In this part of the exercise, you will need two terminal windows open at the same time.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Exploration of models of Population Growth, Fitness, and Selection ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 1&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Have a look at the simulation program:&lt;br /&gt;
 nedit growth.py &amp;amp;&lt;br /&gt;
: This command opens the file growth.py (which is a simple text file) in the nedit editor. Having an ampersand (&amp;quot;&amp;amp;&amp;quot;) at the end of the command makes nedit run in the background.&lt;br /&gt;
&lt;br /&gt;
: &#039;&#039;&#039;Note:&#039;&#039;&#039; You can use any text editor to open this Python file. If you are on a Mac you will, for instance, probably use &amp;lt;code&amp;gt;mate&amp;lt;/code&amp;gt; instead of &amp;lt;code&amp;gt;nedit&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
: The growth.py program is written in the programming language Python. You will probably be able to understand most of what is going on even if you have no programming experience. (Note that a hash sign, &amp;quot;#&amp;quot;, indicates the beginning of a comment: text after # on a line will not be executed.)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question&#039;&#039;&#039;: What are the initial values of the parameters N0 (initial population size), fA (initial frequency of &#039;A&#039;), rate_A (growth rate, or fitness, of &#039;A&#039;), and rate_a (growth rate of &#039;a&#039;)?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 2&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Run the simulation program:&lt;br /&gt;
&lt;br /&gt;
In one of your terminal windows, type the following:&lt;br /&gt;
 ./growth.py&lt;br /&gt;
: This command executes the program growth.py, which is located in your current working directory (the dot in &amp;quot;./growth.py&amp;quot; means &amp;quot;current directory&amp;quot;). For each generation the state of the population and the frequencies of the &#039;A&#039; and &#039;a&#039; alleles will be printed to the screen. The simulation runs for 20 generations, but you could change this by altering the max_gen parameter.&lt;br /&gt;
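: If you want a feel for what such a simulation computes, here is a minimal Python sketch. It is not the course program growth.py: it assumes that the number of organisms of each genotype is simply multiplied by its growth rate once per generation, the function name simulate is hypothetical, the parameter values are arbitrary example values, and the column names are borrowed from the plotting commands used later in this exercise.&lt;br /&gt;

```python
def simulate(N0, fA, rate_A, rate_a, max_gen):
    """Return one row (t, N, N_A, N_a, fA, fa) per generation."""
    N_A = N0 * fA          # initial number of 'A' organisms
    N_a = N0 * (1 - fA)    # initial number of 'a' organisms
    rows = []
    for t in range(max_gen + 1):
        N = N_A + N_a
        rows.append((t, N, N_A, N_a, N_A / N, N_a / N))
        # Assumed update rule: discrete generations, multiplicative growth
        N_A *= rate_A
        N_a *= rate_a
    return rows


# Arbitrary example values, chosen only for illustration
print("t N N_A N_a fA fa")
for row in simulate(N0=100, fA=0.5, rate_A=1.1, rate_a=1.0, max_gen=5):
    print(" ".join(f"{value:.4g}" for value in row))
```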
&lt;br /&gt;
Run the simulation again, saving output to file:&lt;br /&gt;
 ./growth.py &amp;gt; poptest&lt;br /&gt;
: The symbol &amp;quot;&amp;gt;&amp;quot; causes the output to be &amp;quot;redirected&amp;quot; to a file named &amp;quot;poptest&amp;quot; (you could have named it anything). Open the result file in nedit to verify that all output has been saved:&lt;br /&gt;
 nedit poptest &amp;amp;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question&#039;&#039;&#039;: What is the total population size (N) at generation 0 and at generation 20? (Close the nedit window when you&#039;re done.)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 3&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Start RStudio:&lt;br /&gt;
&lt;br /&gt;
(Note: depending on your operating system you may have to start RStudio from the command line as shown below, by double-clicking its icon, or via the start menu. If you do not use the command line, then you need to set the current working directory to be your simulation directory.)&lt;br /&gt;
&lt;br /&gt;
In the second terminal window type the following:&lt;br /&gt;
 rstudio &amp;amp;&lt;br /&gt;
&lt;br /&gt;
: RStudio is an &amp;quot;integrated development environment for R, a programming language for statistical computing and graphics.&amp;quot; It is among the tools you are required to have some familiarity with when you do bioinformatics.&lt;br /&gt;
&lt;br /&gt;
In the console window in RStudio, type the following to load the &amp;quot;tidyverse&amp;quot; package:&lt;br /&gt;
 library(tidyverse)&lt;br /&gt;
&lt;br /&gt;
: The &#039;tidyverse&#039; is &amp;quot;a set of packages that work in harmony because they share common data representations and user interface. This package is designed to make it easy to install and load multiple &#039;tidyverse&#039; packages in a single step. Learn more about the &#039;tidyverse&#039; at &amp;lt;https://tidyverse.org&amp;gt;.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Now, read data from the file you just created, select variables, and reshape it for plotting: In RStudio type the following:&lt;br /&gt;
 df = read_table(&amp;quot;poptest&amp;quot;)&lt;br /&gt;
 df2 = df %&amp;gt;% select(t, N, N_A, N_a) &lt;br /&gt;
 df3 = df2 %&amp;gt;% pivot_longer(cols = -c(&amp;quot;t&amp;quot;))&lt;br /&gt;
&lt;br /&gt;
: The [https://thatdatatho.com/2019/03/13/tutorial-about-magrittrs-pipe-operator-and-placeholders/ &amp;quot;%&amp;gt;%&amp;quot; is the so-called &amp;quot;pipe&amp;quot; symbol], and works like the pipe in UNIX, i.e., by sending the output from one command into a second command. This is extremely useful for making long, combined manipulations in R while keeping them readable. The select command selects the columns we want to use for plotting (t, and all columns with population info: N, N_A, N_a). The pivot_longer command converts data from [https://seananderson.ca/2013/10/19/reshape/ wide format to long format], which is useful when you want to automatically create legends for multiple variables. (More generally, you want to understand what &amp;quot;tidy&amp;quot; data is, and why that will help with your data analyses: [https://r4ds.had.co.nz/tidy-data.html R For Data Science: Tidy Data].) You may want to inspect the intermediate data frames by simply typing their names in RStudio:&lt;br /&gt;
 df&lt;br /&gt;
&lt;br /&gt;
 df2&lt;br /&gt;
&lt;br /&gt;
 df3&lt;br /&gt;
&lt;br /&gt;
(Note: Normally you would here have used the pipe operator to chain these commands together, thus avoiding constructing the intermediate data frames. Here I only included these steps to make it clear what is going on in the different steps of the command).&lt;br /&gt;
&lt;br /&gt;
Now, plot the population sizes:&lt;br /&gt;
 ggplot(df3, aes(x=t, y=value, color=name)) + geom_line()&lt;br /&gt;
&lt;br /&gt;
: The ggplot command plots the population data as a function of time, with automatic legend and coloration based on variable (N, N_A, and N_a). Specifically, we here plot total population size (&amp;quot;N&amp;quot;), the number of organisms with allele &#039;A&#039; (&amp;quot;N_A&amp;quot;), and the number of organisms with allele &#039;a&#039; (&amp;quot;N_a&amp;quot;) for each generation. Recall that in this run the two alleles had the same fitness (rate_A = rate_a = 1.2) but different initial frequencies (fA = 0.3, fa = 0.7).&lt;br /&gt;
&lt;br /&gt;
Now plot the allele frequencies: In RStudio type the following:&lt;br /&gt;
 df4 =  df %&amp;gt;% select(t, fA, fa) %&amp;gt;% pivot_longer(cols = -c(&amp;quot;t&amp;quot;))&lt;br /&gt;
 ggplot(df4, aes(x=t, y=value, color=name)) + geom_line()&lt;br /&gt;
&lt;br /&gt;
: Here we have plotted the &#039;&#039;frequencies&#039;&#039; of the two alleles for each simulated generation. Note how we here combined the select and pivot_longer commands into one by using the pipe.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039; How do the frequencies of the two alleles (fA and fa) behave for this case where the two alleles have the same fitness?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 4:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Simulation number 1:&#039;&#039;&#039; In the terminal window where you are not running RStudio: Using what you learned above, run the simulation program for the parameter values listed below, and save the output to a file named &amp;quot;res.1&amp;quot; (without the quotes). &lt;br /&gt;
&lt;br /&gt;
You should use nedit to alter the relevant parameter values in the growth.py file. Remember to save growth.py after making the alterations (File -&amp;gt; Save).&lt;br /&gt;
&lt;br /&gt;
 N0 = 50     rate_A = 1.2  rate_a = 1.2  fA = 0.3&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039; What is the relative fitness of allele &#039;a&#039; (using &#039;A&#039; as a reference)?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 5:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Based on the file &amp;quot;res.1&amp;quot; you just created above, plot the population sizes: In RStudio, type the following:&lt;br /&gt;
 df = read_table(&amp;quot;res.1&amp;quot;)&lt;br /&gt;
 df2 =  df %&amp;gt;% select(t, N, N_A, N_a) %&amp;gt;% pivot_longer(cols = -c(&amp;quot;t&amp;quot;))&lt;br /&gt;
 ggplot(df2, aes(x=t, y=value, color=name)) + geom_line()&lt;br /&gt;
&lt;br /&gt;
Now plot allele frequencies:&lt;br /&gt;
 df3 =  df %&amp;gt;% select(t, fA, fa) %&amp;gt;% pivot_longer(cols = -c(&amp;quot;t&amp;quot;))&lt;br /&gt;
 ggplot(df3, aes(x=t, y=value, color=name)) + geom_line()&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; What is the behavior of population sizes (N_A and N_a) and allele frequencies (fA and fa) for this set of parameter values?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 6:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Simulation number 2:&#039;&#039;&#039; Again, using what you have learned, run the simulation program for the parameter values listed below, and save the output to a file named &amp;quot;res.2&amp;quot; (without the quotes).&lt;br /&gt;
 N0 = 1000   rate_A = 0.7  rate_a = 0.7  fA = 0.3&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; What is the relative fitness of allele &#039;a&#039; (using &#039;A&#039; as a reference)?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 7:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Plot population sizes: In RStudio, type the following:&lt;br /&gt;
 df = read_table(&amp;quot;res.2&amp;quot;)&lt;br /&gt;
 df2 =  df %&amp;gt;% select(t, N, N_A, N_a) %&amp;gt;% pivot_longer(cols = -c(&amp;quot;t&amp;quot;))&lt;br /&gt;
 ggplot(df2, aes(x=t, y=value, color=name)) + geom_line()&lt;br /&gt;
&lt;br /&gt;
Plot allele frequencies:&lt;br /&gt;
 df3 =  df %&amp;gt;% select(t, fA, fa) %&amp;gt;% pivot_longer(cols = -c(&amp;quot;t&amp;quot;))&lt;br /&gt;
 ggplot(df3, aes(x=t, y=value, color=name)) + geom_line()&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; What is the behavior of population sizes (N_A and N_a) and allele frequencies (fA and fa) for this set of parameter values?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 8:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Simulation number 3:&#039;&#039;&#039; Run the simulation program for the parameter values listed below, and save the output to a file named &amp;quot;res.3&amp;quot;.&lt;br /&gt;
 N0 = 1000   rate_A = 2.0  rate_a = 1.5  fA = 0.02&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039; What is the relative fitness of allele &#039;a&#039; (using &#039;A&#039; as a reference)?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 9:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Plot population sizes: &lt;br /&gt;
 df = read_table(&amp;quot;res.3&amp;quot;)&lt;br /&gt;
 df2 =  df %&amp;gt;% select(t, N, N_A, N_a) %&amp;gt;% pivot_longer(cols = -c(&amp;quot;t&amp;quot;))&lt;br /&gt;
 ggplot(df2, aes(x=t, y=value, color=name)) + geom_line()&lt;br /&gt;
&lt;br /&gt;
Plot allele frequencies:&lt;br /&gt;
 df3 =  df %&amp;gt;% select(t, fA, fa) %&amp;gt;% pivot_longer(cols = -c(&amp;quot;t&amp;quot;))&lt;br /&gt;
 ggplot(df3, aes(x=t, y=value, color=name)) + geom_line()&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; What is the behavior of population sizes (N_A and N_a) and allele frequencies (fA and fa) for this set of parameter values?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 10:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Simulation number 4:&#039;&#039;&#039; Run the simulation program for the parameter values listed below, and save the output to a file named &amp;quot;res.4&amp;quot;.&lt;br /&gt;
 N0 = 1000   rate_A = 1.2  rate_a = 0.9  fA = 0.02&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; What is the relative fitness of allele &#039;a&#039; (using &#039;A&#039; as a reference)?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 11:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Plot population sizes: &lt;br /&gt;
 df = read_table(&amp;quot;res.4&amp;quot;)&lt;br /&gt;
 df2 =  df %&amp;gt;% select(t, N, N_A, N_a) %&amp;gt;% pivot_longer(cols = -c(&amp;quot;t&amp;quot;))&lt;br /&gt;
 ggplot(df2, aes(x=t, y=value, color=name)) + geom_line()&lt;br /&gt;
&lt;br /&gt;
Plot allele frequencies:&lt;br /&gt;
 df3 =  df %&amp;gt;% select(t, fA, fa) %&amp;gt;% pivot_longer(cols = -c(&amp;quot;t&amp;quot;))&lt;br /&gt;
 ggplot(df3, aes(x=t, y=value, color=name)) + geom_line()&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; What is the behavior of population sizes (N_A and N_a) and allele frequencies (fA and fa) for this set of parameter values?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 12:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Simulation number 5:&#039;&#039;&#039; Run the simulation program for the parameter values listed below, and save the output to a file named &amp;quot;res.5&amp;quot;.&lt;br /&gt;
 N0 = 10_000  rate_A = 0.8  rate_a = 0.6  fA = 0.02&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039; In Python (version 3.6 and later) you can [https://peps.python.org/pep-0515/ add underscores to numbers to aid readability]. The underscores are ignored by Python, but make it easier to get the correct number of digits in a number.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; What is the relative fitness of allele &#039;a&#039; (using &#039;A&#039; as a reference)?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 13:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Plot population sizes: &lt;br /&gt;
 df = read_table(&amp;quot;res.5&amp;quot;)&lt;br /&gt;
 df2 =  df %&amp;gt;% select(t, N, N_A, N_a) %&amp;gt;% pivot_longer(cols = -c(&amp;quot;t&amp;quot;))&lt;br /&gt;
 ggplot(df2, aes(x=t, y=value, color=name)) + geom_line()&lt;br /&gt;
&lt;br /&gt;
Plot allele frequencies:&lt;br /&gt;
 df3 =  df %&amp;gt;% select(t, fA, fa) %&amp;gt;% pivot_longer(cols = -c(&amp;quot;t&amp;quot;))&lt;br /&gt;
 ggplot(df3, aes(x=t, y=value, color=name)) + geom_line()&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; What is the behavior of population sizes (N_A and N_a) and allele frequencies (fA and fa) for this set of parameter values?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 14:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compare simulations 3 and 5:&#039;&#039;&#039; Simulation 3 had rate_A = 2.0 and rate_a = 1.5, while simulation 5 had rate_A = 0.8 and rate_a = 0.6. In simulation 3 there was therefore exponential growth, while in simulation 5 there was exponential decline. Now, look at the plots of allele frequencies from these two simulations and compare their behavior. (If you know some R you may even try to plot both sets of allele frequencies in the same plot.)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; How does the behavior of fA and fa compare in the two simulations?&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Model Fit, Parameter Estimation ==&lt;br /&gt;
&lt;br /&gt;
In this part of the exercise, we will briefly consider some aspects of the fit between models and reality.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 15:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Have a look at the data file:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
In RStudio type this:&lt;br /&gt;
 dat = read_table(&amp;quot;modelfit.data&amp;quot;)&lt;br /&gt;
 print(dat)&lt;br /&gt;
: This file contains a set of (pseudo) empirical data: the size of a population (column labeled &amp;quot;N&amp;quot;) for a number of generations (column labeled &amp;quot;t&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Plot data points:&#039;&#039;&#039;&lt;br /&gt;
 ggplot(dat, aes(x=t, y=N)) + geom_point()&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Select first 7 data points for model fitting:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
We will use the first 7 data points for fitting models, while leaving out the 8th data point to test how well our fitted models generalise. (This is a very minimal version of a technique called &amp;quot;out of sample testing&amp;quot;). &lt;br /&gt;
 train = dat %&amp;gt;% filter(t&amp;lt;14)&lt;br /&gt;
 print(train)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Estimating parameters of exponential growth model from data:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
We will now assume that the population is growing exponentially according to this model:&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;&lt;br /&gt;
N_t = N_0 \exp(r \times t)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;N_0&amp;lt;/math&amp;gt; is the initial population size, &amp;lt;math&amp;gt;r&amp;lt;/math&amp;gt; is the instantaneous rate of increase, and &amp;lt;math&amp;gt;t&amp;lt;/math&amp;gt; is the generation number. Our model thus has two &amp;quot;free parameters&amp;quot;: &amp;lt;math&amp;gt;N_0&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;r&amp;lt;/math&amp;gt;. You will now attempt to find a &amp;quot;good&amp;quot; set of values for these parameters. We will take &amp;quot;good&amp;quot; values to be those that cause the theoretical curve to lie as close as possible to the observed data. This process is called model fitting.&lt;br /&gt;
&lt;br /&gt;
There are actually several different ways of defining &amp;quot;as close as possible&amp;quot;. One measure that turns out to be convenient is the &amp;quot;sum of squared residuals&amp;quot; (SSR; the word &amp;quot;error&amp;quot; is sometimes used instead of &amp;quot;residual&amp;quot;). The approach is the following: for each x,y point in the data set, the difference between the observed y and the y-value predicted by the model is computed. This difference (the &amp;quot;error&amp;quot; or &amp;quot;residual&amp;quot;) is then squared, and the sum of all the squared residual terms is then taken to be an indication of how well the model fits the data. The best fitting model thus has the smallest possible SSR, and this approach is therefore referred to as &amp;quot;least squares model fitting&amp;quot;. Another measure of model fit that we will return to later in the course is the model likelihood.&lt;br /&gt;
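&lt;br /&gt;
As a concrete illustration of the SSR, the following sketch computes it for the exponential model and a given pair of parameter values. The function name ssr is chosen here for illustration only, and the function assumes a data frame with columns t and N, as in the training data:&lt;br /&gt;
 # Sum of squared residuals for the exponential model N0 * exp(r * t)&lt;br /&gt;
 ssr = function(N0, r, data) {&lt;br /&gt;
     predicted = N0 * exp(r * data$t)&lt;br /&gt;
     sum((data$N - predicted)^2)&lt;br /&gt;
 }&lt;br /&gt;
 # Usage with arbitrary illustrative values: ssr(N0=100, r=0.2, data=train)&lt;br /&gt;
A smaller value returned by this function means that the curve defined by N0 and r lies closer to the data points.&lt;br /&gt;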
&lt;br /&gt;
&#039;&#039;&#039;Find a good set of initial parameter values:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
We will use the [https://stat.ethz.ch/R-manual/R-devel/library/stats/html/nls.html &#039;&#039;nls&#039;&#039; function] (&amp;quot;Non-linear Least Squares&amp;quot;) in R to fit our models. This function takes as input a formula (the model) and some data, and returns estimates of the free model parameters, along with summaries of model fit. The function also requires a set of &amp;quot;start values&amp;quot; - guesses at approximate values for the free parameters in the model - to run. These do not need to be very close to the final result, they should just help by placing the numerical model fitting algorithm in a reasonable neighbourhood in parameter space.&lt;br /&gt;
&lt;br /&gt;
In the R commands below, replace &amp;quot;???&amp;quot; by your guess about reasonable parameter values, and then check how well they fit by plotting the data points along with the line corresponding to your initial values. You will probably need to repeat this step a few times until you are in the vicinity of the correct values (but don&#039;t overdo it - the nls function should do most of the work). &#039;&#039;&#039;Note:&#039;&#039;&#039; It will make it simpler to repeat these commands if you place them in a small R script (click the plus icon in the upper left corner of RStudio and select &amp;quot;R Script&amp;quot;):&lt;br /&gt;
 N0_init = ???&lt;br /&gt;
 r_init = ???&lt;br /&gt;
 xpred = seq(0, 15, 0.1)&lt;br /&gt;
 ypred = N0_init * exp(r_init * xpred)&lt;br /&gt;
 dfpred = tibble(x=xpred, y=ypred)&lt;br /&gt;
 ggplot(train, aes(x=t, y=N)) + &lt;br /&gt;
     geom_point() + &lt;br /&gt;
     geom_line(dfpred, mapping=aes(x=x, y=y), color=&amp;quot;blue&amp;quot;)&lt;br /&gt;
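&lt;br /&gt;
If guessing by eye proves difficult, one optional shortcut (not part of the original exercise) is to note that taking logarithms turns the exponential model into a straight line, log(N) = log(N0) + r t, so an ordinary linear regression on log(N) gives rough starting values. A minimal sketch, assuming all N values are positive; the helper name guess_start is made up for this illustration:&lt;br /&gt;
 # Rough starting values from a straight-line fit to log(N) versus t&lt;br /&gt;
 guess_start = function(data) {&lt;br /&gt;
     linfit = lm(log(N) ~ t, data = data)&lt;br /&gt;
     b = unname(coef(linfit))&lt;br /&gt;
     list(N0 = exp(b[1]), r = b[2])&lt;br /&gt;
 }&lt;br /&gt;
 # Usage: start_vals = guess_start(train)&lt;br /&gt;
The returned list has the same names (N0 and r) as the free parameters of the model, so its two elements can be used directly as initial values.&lt;br /&gt;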
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Fit the exponential model:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Once you are satisfied with the initial parameter values, use the following commands to fit the exponential model using least squares:&lt;br /&gt;
 fit1 = nls(&lt;br /&gt;
     formula = N ~ N0 * exp(r * t), &lt;br /&gt;
     data = train,&lt;br /&gt;
     start=list(N0=N0_init, r=r_init),&lt;br /&gt;
     trace=TRUE&lt;br /&gt;
 )&lt;br /&gt;
&lt;br /&gt;
The option trace=TRUE causes nls to print the sum of squared residuals and the parameter values at the conclusion of each iteration. Have a look at how well the fitted values match the training data:&lt;br /&gt;
 xpred = seq(0, 15, 0.1)&lt;br /&gt;
 ypred = predict(fit1, list(t = xpred))&lt;br /&gt;
 dfpred = tibble(x=xpred, y=ypred)&lt;br /&gt;
 ggplot(train, aes(x=t, y=N)) + &lt;br /&gt;
     geom_point() + &lt;br /&gt;
     geom_line(dfpred, mapping=aes(x=x, y=y), color=&amp;quot;blue&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
The [https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.lm.html function &amp;quot;predict()&amp;quot;] takes as input the results from fitting a model (i.e., estimates of the free parameters) along with some predictor values (here the t-values). It produces as output the predicted outcome values (here population sizes) corresponding to the input predictor values.&lt;br /&gt;
&lt;br /&gt;
Now, print a summary of the fitted model:&lt;br /&gt;
 print(fit1)&lt;br /&gt;
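&lt;br /&gt;
The quantities shown in this printout can also be extracted programmatically, which is handy when comparing models. A minimal sketch using the standard accessor functions coef() and deviance() (for nls fits, deviance() returns the residual sum of squares); the helper name fit_stats is made up for this illustration:&lt;br /&gt;
 # Collect the parameter estimates and the residual sum of squares of a fitted model&lt;br /&gt;
 fit_stats = function(fit) {&lt;br /&gt;
     list(params = coef(fit), ssr = deviance(fit))&lt;br /&gt;
 }&lt;br /&gt;
 # Usage: fit_stats(fit1)&lt;br /&gt;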
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Find the estimated parameters (N0 and r) and the residual sum of squares of the fitted model in this output.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 16:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Fit polynomial model:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
So far we have assumed that the exponential model was the best choice for describing the growth data. This model has two free parameters: &amp;quot;N0&amp;quot; and &amp;quot;r&amp;quot;. Let us now consider an alternative model with 7 free parameters, namely the 6th-order polynomial:&lt;br /&gt;
&lt;br /&gt;
: &amp;lt;math&amp;gt;&lt;br /&gt;
N_t = b_0+b_1 t+b_2 t^2+b_3 t^3+b_4 t^4+ b_5 t^5 + b_6 t^6&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The free parameters of this model are: &amp;lt;math&amp;gt;b_0, b_1, b_2, b_3, b_4, b_5, b_6&amp;lt;/math&amp;gt;. Note that we now have as many free parameters as data points in our training set.&lt;br /&gt;
&lt;br /&gt;
Using the same approach as we did above for an exponential model, we now fit this polynomial model to the data. In this case it turns out we do not need to include guesses at starting values:&lt;br /&gt;
 fit2 = nls(&lt;br /&gt;
     formula = N ~ b0 + I(b1*t) + I(b2*t^2) + I(b3*t^3) + I(b4*t^4) + I(b5*t^5) + I(b6*t^6), &lt;br /&gt;
     data = train,&lt;br /&gt;
     trace=TRUE,&lt;br /&gt;
     control=list(warnOnly = TRUE)&lt;br /&gt;
 )&lt;br /&gt;
Ignore the warning message. The I() function is here used to force nls to interpret the multiplication and exponentiation operators (* and ^) literally. [https://stackoverflow.com/questions/24192428/what-does-the-capital-letter-i-in-r-linear-regression-formula-mean This is necessary because they have special meanings in R formulas]. The option warnOnly = TRUE was needed to force nls to finish fitting the model (in this case, with as many parameters as data points, nls would usually stop with an error message).&lt;br /&gt;
&lt;br /&gt;
Plot the fitted model compared to the training data:&lt;br /&gt;
 xpred = seq(0, 13, 0.1)&lt;br /&gt;
 ypred = predict(fit2, list(t = xpred))&lt;br /&gt;
 dfpred = tibble(x=xpred, y=ypred)&lt;br /&gt;
 ggplot(train, aes(x=t, y=N)) + &lt;br /&gt;
     geom_point() + &lt;br /&gt;
     geom_line(dfpred, mapping=aes(x=x, y=y), color=&amp;quot;blue&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
Also, print a summary of the fitted model:&lt;br /&gt;
 print(fit2)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Report the estimated parameters (b0, b1, b2, b3, b4, b5, b6) and the sum of squared residuals for the fitted model.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 17:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Does the polynomial or the exponential model have the best fit, measured using the sum of squared residuals?&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&#039;&#039;&#039;Question 18:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Assess predictive performance of polynomial and exponential models:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
You will recall that we used 7 of our 8 data points for fitting the models. The 8th data point had these values:&lt;br /&gt;
    t       N&lt;br /&gt;
   14    11825&lt;br /&gt;
&lt;br /&gt;
Now, use the models trained on the first 7 data points to predict the population value for t=14:&lt;br /&gt;
 predict(fit1, list(t=14))&lt;br /&gt;
 predict(fit2, list(t=14))&lt;br /&gt;
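&lt;br /&gt;
To put each prediction side by side with the observed value, a small helper can compute the prediction error for a fitted model. This is only a sketch; the helper name pred_error is made up for this illustration, and 11825 is the observed N at t=14 given above:&lt;br /&gt;
 # Predicted value at t_new and its difference from the observed value N_obs&lt;br /&gt;
 pred_error = function(fit, t_new, N_obs) {&lt;br /&gt;
     predicted = as.numeric(predict(fit, list(t = t_new)))&lt;br /&gt;
     c(predicted = predicted, error = predicted - N_obs)&lt;br /&gt;
 }&lt;br /&gt;
 # Usage: pred_error(fit1, 14, 11825)&lt;br /&gt;
 #        pred_error(fit2, 14, 11825)&lt;br /&gt;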
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Report the predicted values for t=14 for the exponential and polynomial models.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question 19:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Also, plot the two models and compare to all 8 data points:&lt;br /&gt;
 xpred = seq(0, 14.5, 0.1)&lt;br /&gt;
 yexp = predict(fit1, list(t = xpred))&lt;br /&gt;
 ypoly = predict(fit2, list(t = xpred))&lt;br /&gt;
 dfpred = tibble(x=xpred, exponential=yexp, polynomium=ypoly) %&amp;gt;%&lt;br /&gt;
     pivot_longer(cols=c(exponential, polynomium), names_to=&amp;quot;model&amp;quot;) %&amp;gt;%&lt;br /&gt;
     filter(value &amp;lt; 14000 &amp;amp; value &amp;gt; -5000)&lt;br /&gt;
 ggplot() + &lt;br /&gt;
     geom_point(dat, mapping=aes(x=t, y=N)) + &lt;br /&gt;
     geom_line(dfpred, mapping=aes(x=x, y=value, color=model))&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question: &#039;&#039;&#039; Based on the data point for t=14, which model do you now think has captured the biological reality best?&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=22115_-_Computational_Molecular_Evolution&amp;diff=22</id>
		<title>22115 - Computational Molecular Evolution</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=22115_-_Computational_Molecular_Evolution&amp;diff=22"/>
		<updated>2024-03-19T13:22:15Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: /* Lecture Schedule */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;; Overview  [[File:Darwin logo2 medium.png |right|border|550px]]&lt;br /&gt;
: This page contains links to video lectures, computer exercises, and other material for the course [https://kurser.dtu.dk/course/22115 22115 - Computational Molecular Evolution], which is part of the [https://www.dtu.dk/english/education/msc/programmes/systems_biology MSc in Bioinformatics and Systems Biology] at the [https://www.dtu.dk/english Technical University of Denmark]. The course is taught by Professor Anders Gorm Pedersen, [https://www.healthtech.dtu.dk/english/Research/Research-Sections/Section-Bioinformatics Section for Bioinformatics], [https://www.healthtech.dtu.dk/english Department of Health Technology].&lt;br /&gt;
&lt;br /&gt;
: The main goal of this course is to give an introduction to theory and algorithms in the field of computational molecular evolution. We will cover basic evolutionary theory (common descent, natural selection, genetic drift, models of growth and selection), and the main types of algorithms used for constructing and analyzing phylogenetic trees (parsimony, distance-based methods, maximum likelihood methods, and Bayesian inference). We will also discuss the role of statistical modeling in science more generally.&lt;br /&gt;
&lt;br /&gt;
:The course will consist of lectures, computer exercises, and mini-projects. The student will acquire practical experience in the use of a range of computational methods by analyzing sequences from the scientific literature.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;Computer setup&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===Linux===&lt;br /&gt;
:* [[Linux software installation]]&lt;br /&gt;
&amp;lt;!-- :* [[Notes on using Linux for exercises]] &#039;&#039;&#039;UNDER CONSTRUCTION&#039;&#039;&#039; --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Windows===&lt;br /&gt;
:* [[Windows software installation]]&lt;br /&gt;
&amp;lt;!--:* [[Notes on using Windows for exercises]] &#039;&#039;&#039;UNDER CONSTRUCTION&#039;&#039;&#039; --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===MacOS===&lt;br /&gt;
:* [[MacOS software installation]]&lt;br /&gt;
&amp;lt;!-- :* [[Notes on using MacOS for exercises]] &#039;&#039;&#039;UNDER CONSTRUCTION&#039;&#039;&#039; --&amp;gt;&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
===VirtualBox===&lt;br /&gt;
:* Use this only if you can&#039;t install natively on MacOS, Windows, or Linux. Runs a virtual Linux on top of your own OS.&lt;br /&gt;
:* [[VirtualBox installation]]&lt;br /&gt;
:* [[Notes on using VirtualBox for exercises]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== &#039;&#039;&#039;Lecture Schedule&#039;&#039;&#039; ==&lt;br /&gt;
&lt;br /&gt;
:([[27615 Previous course programs|Course programs, previous years]])&lt;br /&gt;
&lt;br /&gt;
===Week 1 (January 31): Introduction to evolutionary theory and population genetics. Models of growth, selection and mutation===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/okjVaLA5S38 Common descent (11:52)]&lt;br /&gt;
:* [https://youtu.be/VkkIu1ZtaIE Natural selection (14:57)]&lt;br /&gt;
:* [https://youtu.be/wqa6W3_WW7s Evidence for evolution (part 1) (9:34)] &lt;br /&gt;
:* [http://y2u.be/_-a-F8egAis Evidence for evolution (part 2) (20:54)]&lt;br /&gt;
:* [http://y2u.be/AUGbSMWPILE Population growth and selection (18:13)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [https://github.com/agormp/evolintro/blob/main/evolintro.pdf Lecture notes on evolutionary theory and population genetics]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/slides_week1.pdf Slides, week 1]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Population Growth, Fitness, and Selection]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 2 (February 7): Neutral mutations and genetic drift. Tree reconstruction by parsimony===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/cQVjL50kK0k Neutral Theory of Molecular Evolution (11:28)]&lt;br /&gt;
:* [https://youtu.be/J8LDUFm4ttA Genetic Drift (9:35)]&lt;br /&gt;
:* [https://youtu.be/AZkHWdl2oAw Trees: Terminology and Representation (9:41)]&lt;br /&gt;
:* [https://youtu.be/zCj1s9fmaKs Homology and Homoplasy (9:07)]&lt;br /&gt;
:* [https://youtu.be/gXb_WuLCD8g Maximum Parsimony (7:48)]&lt;br /&gt;
:* [https://youtu.be/Q7ZpdPCx0uQ The Fitch Algorithm (10:31)]&lt;br /&gt;
:* [https://youtu.be/deywW9wJXmw Searching Tree Space (14:01)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/slides_week2.pdf Slides, week 2]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/Paup_Doc_31.pdf PAUP 3.1 manual (note: for older version - contains explanations of parsimony and tree moves)]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/PAUP4-manual.pdf PAUP 4beta command reference]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Phylogenetic Analysis using Parsimony]]&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 3 (February 14): Consensus trees. Distance matrix methods===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://www.youtube.com/watch?v=YXZZyu9OAcg Consensus Trees (16:27)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=MhjSSxcGjaY Distance Matrix Methods, part 1 (6:07)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=PNoUcQTCxiM Distance Matrix Methods, part 2 (22:28)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=Dj24mCLQYUE Neighbour Joining (15:28)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Consensus.pdf|Handout exercise: Consensus Trees]]&lt;br /&gt;
:* [[Media:Distance handout.pdf|Handout exercise: Distance Matrix Methods]]&lt;br /&gt;
:* [[Media:Slides week3.pdf|Slides, week 3]]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercises&lt;br /&gt;
:* [[Consensus Trees]] &lt;br /&gt;
:* [[Distance Matrix Methods]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 4+5 (February 21 + 28): Mini project 1===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
Project description: [[Media:Miniproject1 whales.pdf|Building a tree from scratch: What are the closest relatives of whales?]]&lt;br /&gt;
&lt;br /&gt;
The mini project should be submitted and assessed via a peer assessment module that will become available on the course DTU Learn page.&lt;br /&gt;
&lt;br /&gt;
Take this tree quiz to test your ability to accurately interpret evolutionary trees: &lt;br /&gt;
* [[Media:Treequiz1.pdf|Tree quiz]]&lt;br /&gt;
Check your replies here:  &lt;br /&gt;
* [[Media:Treequiz1 answers.pdf|Tree quiz with answers]] &lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 6 (March 6): Models of sequence evolution. Likelihood methods===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/ro2MFmVZypQ Models of evolution (28:48)]&lt;br /&gt;
:* [https://youtu.be/xDKUIegYpWM Maximum likelihood (22:06)]&lt;br /&gt;
:* [https://youtu.be/Siau2o_egGI Ancestral reconstruction (10:45)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Handout real exp change.pdf|Handout exercise: Real, Observed, and Expected Change]]&lt;br /&gt;
:* [[Media:Handout likelihood.pdf|Handout exercise: Computation of Likelihood]]&lt;br /&gt;
:* [[Media:Slides week4.pdf|Slides, week 6]]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/substitutionmodels.pdf Lecture notes: Substitution models]&lt;br /&gt;
:* [https://teaching.healthtech.dtu.dk/material/22115/main.pdf Optional lecture notes: Matrix exponentials for Markov chains]&lt;br /&gt;
:; Computer exercises&lt;br /&gt;
:* [[Models of Evolution]]&lt;br /&gt;
:* [[Maximum Likelihood]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 7 (March 13): Bayesian inference of phylogeny===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://www.youtube.com/watch?v=DI3TIx78NqM&amp;amp;t=12s Bayesian Inference (23:48)]&lt;br /&gt;
:* [https://youtu.be/uyG5DVigEyM?list=PLXwjzs_mabFrlRF7uALEomfGGckG0sG5y Markov chain Monte Carlo (19:54)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Handout.class08.pdf|Handout exercise: Bayesian estimation of model parameter value]]&lt;br /&gt;
:* [[Media:Slides week5.pdf|Slides, week 7]]&lt;br /&gt;
:* [[Media:MTN122.pdf| An Introduction to Bayesian Statistics Without Using Equations]]&lt;br /&gt;
:* [http://www.nature.com/nbt/journal/v22/n9/pdf/nbt0904-1177.pdf Background reading: &amp;quot;What is Bayesian statistics?&amp;quot;]&lt;br /&gt;
:* [http://rsta.royalsocietypublishing.org/content/roypta/361/1813/2681.full.pdf Background reading: &amp;quot;Bayesian computation: a statistical revolution&amp;quot;]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Bayesian Phylogeny]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 8+9 (March 20 + April 3): Mini project 2===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&#039;&#039;&#039;Maximum likelihood and R-based phylogenetics - The origin of the Lassa and SARS-CoV-2 virus epidemics&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Project description and data sets&#039;&#039;&#039;: See DTU Learn page &lt;br /&gt;
&lt;br /&gt;
The mini project should be submitted and assessed via the peergrade interface at DTU Learn.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 10 (April 10): Model Selection===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/sJB2LmppZj8?list=PLXwjzs_mabFrlRF7uALEomfGGckG0sG5y Model selection, part 1 (15:19)]&lt;br /&gt;
:* [https://youtu.be/qSoDZ_33GbM Model selection, part 2 (17:20)]&lt;br /&gt;
:* [https://youtu.be/YYoo1vUO4ME Introduction to computer exercise: detection of selection (15:24)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Slides week6.pdf|Slides, week 10]]&lt;br /&gt;
:* [https://github.com/ddarriba/jmodeltest2/files/157130/manual.pdf jmodeltest manual]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Model selection]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 11 (April 17): Bayesian Phylogenetics, Part 2 ===&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [https://www.researchgate.net/publication/319965471_A_biologist%27s_guide_to_Bayesian_phylogenetic_analysis A biologist’s guide to Bayesian phylogenetic analysis]&lt;br /&gt;
:* [https://beast.community/analysing_beast_output Analysing BEAST output using Tracer]&lt;br /&gt;
:* [https://beast.community/tracer_convergence Identifying convergence problems using Tracer]&lt;br /&gt;
:* [https://taming-the-beast.org/tutorials/Troubleshooting/ Post-processing and improving performance]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Bayesian phylogenetics: checking convergence]] &lt;br /&gt;
:* [[Bayesian phylogenetics: clock models]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 12 + 13 (April 24 + May 1): Mini project 3: Final exam===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Details will follow&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
----&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Computational_Molecular_Evolution_22115_-_2021&amp;diff=21</id>
		<title>Computational Molecular Evolution 22115 - 2021</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Computational_Molecular_Evolution_22115_-_2021&amp;diff=21"/>
		<updated>2024-03-19T13:17:38Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;; Overview  450px : The main goal of this course is to give an introduction to theory and algorithms in the field of computational molecular evolution. We will cover basic evolutionary theory (common descent, natural selection, genetic drift, models of growth and selection), and the main types of algorithms used for constructing and analyzing phylogenetic trees (parsimony, distance based methods, maximum likelihood methods,...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;; Overview  [[File:Darwin logo2 medium.png |right|border|450px]]&lt;br /&gt;
: The main goal of this course is to give an introduction to theory and algorithms in the field of computational molecular evolution. We will cover basic evolutionary theory (common descent, natural selection, genetic drift, models of growth and selection), and the main types of algorithms used for constructing and analyzing phylogenetic trees (parsimony, distance-based methods, maximum likelihood methods, and Bayesian inference). We will also discuss the role of statistical modeling in science more generally.&lt;br /&gt;
&lt;br /&gt;
:The course will consist of lectures, computer exercises, and micro-projects. The student will acquire practical experience in the use of a range of computational methods by analyzing sequences from the scientific literature.&lt;br /&gt;
&lt;br /&gt;
;Computer setup:&lt;br /&gt;
:In this course we will use software running on a Linux platform. You can do this by installing Oracle VirtualBox and then using the virtual disk image we have prepared for you (see links below). The virtual disk image contains both a pre-installed Linux operating system and all the software you will need to do the weekly computer exercises. If you are already running Linux (or a Linux-like operating system like MacOS) you may want to experiment with directly installing on your own computer (see instructions below), but then you have to sort out the installation issues yourself (the alternative is to install the virtual Linux on top of your own Linux, which also works).&lt;br /&gt;
&lt;br /&gt;
:*[https://youtu.be/xaKQDN2PMtc &#039;&#039;&#039;Quick-start (video)&#039;&#039;&#039;]: How to install and use the virtual machine (shown on Mac OSX, but other platforms will be very similar). (Note: the instruction video is for the Coursera version of the course, but it should be simple to adapt to your own situation.)&lt;br /&gt;
:* [https://files.dtu.dk/u/nnYXkqf2F4yMdheQ/MolEvol_2021.zip?l MolEvol_2021.zip]: Virtual Disk Image file (compressed) containing pre-installed Linux (Linux Lite) operating system and all software needed for this course.&lt;br /&gt;
:** Compressed file size: 4.5 GB&lt;br /&gt;
:** Full file size: 12 GB (maximum size: 30 GB, dynamically allocated, so size only increases as needed.)&lt;br /&gt;
:** Linux distribution used here is Linux Lite. Should be fairly simple to use (note the app start window in lower left corner, which works much like on Windows)&lt;br /&gt;
:** List of software used (in case you want to install on own operating system): [[Software installation instructions]]&lt;br /&gt;
:*[https://www.virtualbox.org/wiki/Downloads &#039;&#039;&#039;Oracle VirtualBox&#039;&#039;&#039;]: Download and install the version for your operating system. Allows use of guest operating system (Linux) on top of your main operating system (typically Windows, Mac OSX, or Linux).&lt;br /&gt;
&lt;br /&gt;
:The default user-ID and password on the virtual machine: user-ID = student, password = 1234&lt;br /&gt;
&lt;br /&gt;
== &#039;&#039;&#039;Lecture Schedule&#039;&#039;&#039; ==&lt;br /&gt;
&lt;br /&gt;
:([[27615 Previous course programs|Course programs, previous years]])&lt;br /&gt;
&lt;br /&gt;
===Week 1 (February 3): Introduction to evolutionary theory and population genetics. Models of growth, selection and mutation===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/okjVaLA5S38 Common descent (11:52)]&lt;br /&gt;
:* [https://youtu.be/VkkIu1ZtaIE Natural selection (14:57)]&lt;br /&gt;
:* [https://youtu.be/wqa6W3_WW7s Evidence for evolution (part 1) (9:34)] &lt;br /&gt;
:* [http://y2u.be/_-a-F8egAis Evidence for evolution (part 2) (20:54)]&lt;br /&gt;
:* [http://y2u.be/AUGbSMWPILE Population growth and selection (18:13)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Lecturenotebook small.pdf|Lecture notes]]&lt;br /&gt;
:* [http://teaching.healthtech.dtu.dk/material/36615/slides_week1.pdf Slides, week 1]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Population Growth, Fitness, and Selection]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 2 (February 10): Neutral mutations and genetic drift. Tree reconstruction by parsimony===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/cQVjL50kK0k Neutral Theory of Molecular Evolution (11:28)]&lt;br /&gt;
:* [https://youtu.be/J8LDUFm4ttA Genetic Drift (9:35)]&lt;br /&gt;
:* [https://youtu.be/AZkHWdl2oAw Trees: Terminology and Representation (9:41)]&lt;br /&gt;
:* [https://youtu.be/zCj1s9fmaKs Homology and Homoplasy (9:07)]&lt;br /&gt;
:* [https://youtu.be/gXb_WuLCD8g Maximum Parsimony (7:48)]&lt;br /&gt;
:* [https://youtu.be/Q7ZpdPCx0uQ The Fitch Algorithm (10:31)]&lt;br /&gt;
:* [https://youtu.be/deywW9wJXmw Searching Tree Space (14:01)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [http://teaching.healthtech.dtu.dk/material/36615/slides_week2.pdf Slides, week 2]&lt;br /&gt;
:* [http://teaching.healthtech.dtu.dk/material/36615/Paup_Doc_31.pdf PAUP 3.1 manual (note: for older version - contains explanations of parsimony and tree moves)]&lt;br /&gt;
:* [http://teaching.healthtech.dtu.dk/material/36615/PAUP4-manual.pdf PAUP 4beta command reference]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Phylogenetic Analysis using Parsimony]]&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 3+4 (February 17+24): Mini project 1===&lt;br /&gt;
&lt;br /&gt;
Project description: [[Media:Miniproject1 whales.pdf|Building a tree from scratch: What are the closest relatives of whales?]]&lt;br /&gt;
&lt;br /&gt;
The mini project should be submitted and assessed via a peer assessment module that will become available on the course DTU Learn page.&lt;br /&gt;
&lt;br /&gt;
Take this tree quiz to test your ability to accurately interpret evolutionary trees: &lt;br /&gt;
* [[Media:Treequiz1.pdf|Tree quiz]]&lt;br /&gt;
Check your replies here:  &lt;br /&gt;
* [[Media:Treequiz1 answers.pdf|Tree quiz with answers]] &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 5 (March 3): Consensus trees. Distance matrix methods===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://www.youtube.com/watch?v=YXZZyu9OAcg Consensus Trees (16:27)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=MhjSSxcGjaY Distance Matrix Methods, part 1 (6:07)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=PNoUcQTCxiM Distance Matrix Methods, part 2 (22:28)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=Dj24mCLQYUE Neighbour Joining (15:28)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Consensus.pdf|Handout exercise: Consensus Trees]]&lt;br /&gt;
:* [[Media:Distance handout.pdf|Handout exercise: Distance Matrix Methods]]&lt;br /&gt;
:* [[Media:Slides week3.pdf|Slides, week 5]]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercises&lt;br /&gt;
:* [[Consensus Trees]] &lt;br /&gt;
:* [[Distance Matrix Methods]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 6 (March 10): Models of sequence evolution. Likelihood methods===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/ro2MFmVZypQ Models of evolution (28:48)]&lt;br /&gt;
:* [https://youtu.be/xDKUIegYpWM Maximum likelihood (22:06)]&lt;br /&gt;
:* [https://youtu.be/Siau2o_egGI Ancestral reconstruction (10:45)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Handout real exp change.pdf|Handout exercise: Real, Observed, and Expected Change]]&lt;br /&gt;
:* [[Media:Handout likelihood.pdf|Handout exercise: Computation of Likelihood]]&lt;br /&gt;
:* [[Media:Slides week4.pdf|Slides, week 6]]&lt;br /&gt;
:* [http://teaching.bioinformatics.dtu.dk/material/36615/substitutionmodels.pdf Lecture notes: Substitution models]&lt;br /&gt;
:* [http://teaching.bioinformatics.dtu.dk/material/36615/main.pdf Optional lecture notes: Matrix exponentials for Markov chains]&lt;br /&gt;
:; Computer exercises&lt;br /&gt;
:* [[Models of Evolution]]&lt;br /&gt;
:* [[Maximum Likelihood]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 7+8 (March 17 + 24): Mini project 2===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Maximum likelihood and R-based phylogenetics - The origin of the Lassa and SARS-CoV-2 virus epidemics&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lassa data set&#039;&#039;&#039;: [https://files.dtu.dk/u/stV-fYtVfmeg1X5_/lassa.nexus?l lassa.nexus]&lt;br /&gt;
: Alignment of Lassa virus sequences encoding the transmembrane glycoprotein complex (GPC). GPC is important both in the initial contact of the virus with the cells it infects and in its diffusion into the host cell. The 35 sequences come from both rodent and human hosts, and cover a range of years and geographic locations. Names indicate location (Nig = Nigeria, Sier = Sierra Leone, Ivory = Ivory Coast, Lib = Liberia), year sampled, and host species (homo = human, nat = Mastomys natalensis, a rodent). You should use the &amp;quot;pinneo&amp;quot; strain from 1969 to root the tree. (The &amp;quot;Pinneo&amp;quot; or &amp;quot;LP&amp;quot; strain of Lassa virus was isolated from [https://www.astmh.org/blog/october-2012/astmh-remembers-penny-pinneo,-a-pioneer-in-combati the blood of Penny Pinneo], a pioneer in combating Lassa fever, after a severe hemorrhagic illness acquired in Nigeria in 1969.)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;SARS-CoV-2 data set&#039;&#039;&#039;: See instructions in project description.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Project description&#039;&#039;&#039;: [https://files.dtu.dk/u/znAAkPfwAIjccACS/Miniproject2_corona.pdf?l Miniproject2_corona.pdf]&lt;br /&gt;
&lt;br /&gt;
The mini project should be submitted and assessed via the peergrade interface at DTU Learn.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 9 (April 7): Bayesian inference of phylogeny===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://www.youtube.com/watch?v=DI3TIx78NqM&amp;amp;t=12s Bayesian Inference (23:48)]&lt;br /&gt;
:* [https://youtu.be/uyG5DVigEyM?list=PLXwjzs_mabFrlRF7uALEomfGGckG0sG5y Markov chain Monte Carlo (19:54)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Handout.class08.pdf|Handout exercise: Bayesian estimation of model parameter value]]&lt;br /&gt;
:* [[Media:Slides week5.pdf|Slides, week 9]]&lt;br /&gt;
:* [[Media:MTN122.pdf| An Introduction to Bayesian Statistics Without Using Equations]]&lt;br /&gt;
:* [http://www.nature.com/nbt/journal/v22/n9/pdf/nbt0904-1177.pdf Background reading: &amp;quot;What is Bayesian statistics?&amp;quot;]&lt;br /&gt;
:* [http://rsta.royalsocietypublishing.org/content/roypta/361/1813/2681.full.pdf Background reading: &amp;quot;Bayesian computation: a statistical revolution&amp;quot;]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Bayesian Phylogeny]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 10 (April 14): Model Selection===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/sJB2LmppZj8?list=PLXwjzs_mabFrlRF7uALEomfGGckG0sG5y Model selection, part 1 (15:19)]&lt;br /&gt;
:* [https://youtu.be/qSoDZ_33GbM Model selection, part 2 (17:20)]&lt;br /&gt;
:* [https://youtu.be/YYoo1vUO4ME Introduction to computer exercise: detection of selection (15:24)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Slides week6.pdf|Slides, week 10]]&lt;br /&gt;
:* [https://github.com/ddarriba/jmodeltest2/files/157130/manual.pdf jmodeltest manual]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Model selection]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 11 (April 21): Bayesian Phylogenetics, Part 2 ===&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [https://www.researchgate.net/publication/319965471_A_biologist%27s_guide_to_Bayesian_phylogenetic_analysis A biologist’s guide to Bayesian phylogenetic analysis]&lt;br /&gt;
:* [https://beast.community/analysing_beast_output Analysing BEAST output using Tracer]&lt;br /&gt;
:* [https://beast.community/tracer_convergence Identifying convergence problems using Tracer]&lt;br /&gt;
:* [https://taming-the-beast.org/tutorials/Troubleshooting/ Post-processing and improving performance]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Bayesian phylogenetics: checking convergence]] &lt;br /&gt;
:* [[Bayesian phylogenetics: clock models]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 12 + 13 (April 28 + May 5): Mini project 3===&lt;br /&gt;
&#039;&#039;&#039;Bayesian and likelihood-based phylogenetics. SARS-CoV-2: selection and clock models&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Project description&#039;&#039;&#039;: [http://teaching.bioinformatics.dtu.dk/material/36615/Miniproject3_corona.pdf Miniproject3_corona.pdf]&lt;br /&gt;
&lt;br /&gt;
The mini project should be submitted and assessed via the peergrade interface.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==Old exam sets==&lt;br /&gt;
&lt;br /&gt;
* [http://wiki.bio.dtu.dk/teaching/index.php/27615-2011 Mini project exam]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam_2004.pdf Exam 2004] ([http://www.cbs.dtu.dk/courses/27615/exam_examples/exam_2004_withanswers.pdf Answers])&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2005.pdf Exam 2005]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2006.pdf Exam 2006]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2007.pdf Exam 2007]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2008.pdf Exam 2008]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2009.pdf Exam 2009] ([[Media:Exam2009 answers.pdf|Answers]])&lt;br /&gt;
* [[Media:Exam2015.pdf|Exam 2015]] ([[Media:Exam2015 answers.pdf|Answers]])&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Computational_Molecular_Evolution_22115_-_2020&amp;diff=20</id>
		<title>Computational Molecular Evolution 22115 - 2020</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Computational_Molecular_Evolution_22115_-_2020&amp;diff=20"/>
		<updated>2024-03-19T13:16:51Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot; ===Week 1 (February 5): Introduction to evolutionary theory and population genetics. Models of growth, selection and mutation===  :; Online lectures :* [https://youtu.be/okjVaLA5S38 Common descent (11:52)] :* [https://youtu.be/VkkIu1ZtaIE Natural selection (14:57)] :* [https://youtu.be/wqa6W3_WW7s Evidence for evolution (part 1) (9:34)]  :* [http://y2u.be/_-a-F8egAis Evidence for evolution (part 2) (20:54)] :* [http://y2u.be/AUGbSMWPILE Population growth and selection (...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
===Week 1 (February 5): Introduction to evolutionary theory and population genetics. Models of growth, selection and mutation===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/okjVaLA5S38 Common descent (11:52)]&lt;br /&gt;
:* [https://youtu.be/VkkIu1ZtaIE Natural selection (14:57)]&lt;br /&gt;
:* [https://youtu.be/wqa6W3_WW7s Evidence for evolution (part 1) (9:34)] &lt;br /&gt;
:* [http://y2u.be/_-a-F8egAis Evidence for evolution (part 2) (20:54)]&lt;br /&gt;
:* [http://y2u.be/AUGbSMWPILE Population growth and selection (18:13)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Lecturenotebook small.pdf|Lecture notes]]&lt;br /&gt;
:* [http://teaching.healthtech.dtu.dk/material/36615/slides_week1.pdf Slides, week 1]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Population Growth, Fitness, and Selection]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 2 (February 12): Neutral mutations and genetic drift. Tree reconstruction by parsimony===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/cQVjL50kK0k Neutral Theory of Molecular Evolution (11:28)]&lt;br /&gt;
:* [https://youtu.be/J8LDUFm4ttA Genetic Drift (9:35)]&lt;br /&gt;
:* [https://youtu.be/AZkHWdl2oAw Trees: Terminology and Representation (9:41)]&lt;br /&gt;
:* [https://youtu.be/zCj1s9fmaKs Homology and Homoplasy (9:07)]&lt;br /&gt;
:* [https://youtu.be/gXb_WuLCD8g Maximum Parsimony (7:48)]&lt;br /&gt;
:* [https://youtu.be/Q7ZpdPCx0uQ The Fitch Algorithm (10:31)]&lt;br /&gt;
:* [https://youtu.be/deywW9wJXmw Searching Tree Space (14:01)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [http://teaching.healthtech.dtu.dk/material/36615/slides_week2.pdf Slides, week 2]&lt;br /&gt;
:* [http://teaching.healthtech.dtu.dk/material/36615/Paup_Doc_31.pdf PAUP 3.1 manual (note: for older version - contains explanations of parsimony and tree moves)]&lt;br /&gt;
:* [http://teaching.healthtech.dtu.dk/material/36615/PAUP4-manual.pdf PAUP 4beta command reference]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Phylogenetic Analysis using Parsimony]]&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 3+4 (February 19+26): Mini project 1===&lt;br /&gt;
&lt;br /&gt;
Building a tree from scratch: What are the closest relatives of whales?&lt;br /&gt;
&lt;br /&gt;
The mini project should be submitted and assessed via the peergrade.io interface. &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 5 (March 4): Consensus trees. Distance matrix methods===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://www.youtube.com/watch?v=YXZZyu9OAcg Consensus Trees (16:27)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=MhjSSxcGjaY Distance Matrix Methods, part 1 (6:07)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=PNoUcQTCxiM Distance Matrix Methods, part 2 (22:28)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=Dj24mCLQYUE Neighbour Joining (15:28)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Consensus.pdf|Handout exercise: Consensus Trees]]&lt;br /&gt;
:* [[Media:Distance handout.pdf|Handout exercise: Distance Matrix Methods]]&lt;br /&gt;
:* [[Media:Slides week3.pdf|Slides, week 5]]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercises&lt;br /&gt;
:* [[Consensus Trees]] &lt;br /&gt;
:* [[Distance Matrix Methods]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 6 (March 11): Models of sequence evolution. Likelihood methods===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/ro2MFmVZypQ Models of evolution (28:48)]&lt;br /&gt;
:* [https://youtu.be/xDKUIegYpWM Maximum likelihood (22:06)]&lt;br /&gt;
:* [https://youtu.be/Siau2o_egGI Ancestral reconstruction (10:45)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Handout real exp change.pdf|Handout exercise: Real, Observed, and Expected Change]]&lt;br /&gt;
:* [[Media:Handout likelihood.pdf|Handout exercise: Computation of Likelihood]]&lt;br /&gt;
:* [[Media:Slides week4.pdf|Slides, week 6]]&lt;br /&gt;
:* [http://teaching.bioinformatics.dtu.dk/material/36615/substitutionmodels.pdf Lecture notes: Substitution models]&lt;br /&gt;
:* [http://teaching.bioinformatics.dtu.dk/material/36615/main.pdf Optional lecture notes: Matrix exponentials for Markov chains]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercises&lt;br /&gt;
:* [[Models of Evolution]]&lt;br /&gt;
:* [[Maximum Likelihood]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 7+8 (March 18 + 25): Mini project 2===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Maximum likelihood and R-based phylogenetics - The origin of the Lassa and SARS-CoV-2 virus epidemics&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lassa data set&#039;&#039;&#039;: [https://files.dtu.dk/u/zcmni_RJp4PfyaeJ/lassa.nexus?l lassa.nexus]&lt;br /&gt;
: Alignment of Lassa virus sequences encoding the transmembrane glycoprotein complex (GPC). GPC is important both in the initial contact of the virus with the cells it infects and in its diffusion into the host cell. The 35 sequences come from both rodent and human hosts, and cover a range of years and geographic locations. Names indicate location (Nig = Nigeria, Sier = Sierra Leone, Ivory = Ivory Coast, Lib = Liberia), year sampled, and host species (homo = human, nat = Mastomys natalensis, a rodent). You should use the &amp;quot;pinneo&amp;quot; strain from 1969 to root the tree. (The &amp;quot;Pinneo&amp;quot; or &amp;quot;LP&amp;quot; strain of Lassa virus was isolated from [https://www.astmh.org/blog/october-2012/astmh-remembers-penny-pinneo,-a-pioneer-in-combati the blood of Penny Pinneo], a pioneer in combating Lassa fever, after a severe hemorrhagic illness acquired in Nigeria in 1969.)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;SARS-CoV-2 data set&#039;&#039;&#039;: See instructions in project description.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Project description&#039;&#039;&#039;: [https://files.dtu.dk/u/zusDAfdrAwzc6wMD/Miniproject2_corona.pdf?l Miniproject2_corona.pdf]&lt;br /&gt;
&lt;br /&gt;
The mini project should be submitted and assessed via the peergrade.io interface.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 9 (April 1): Bayesian inference of phylogeny===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://www.youtube.com/watch?v=DI3TIx78NqM&amp;amp;t=12s Bayesian Inference (23:48)]&lt;br /&gt;
:* [https://youtu.be/uyG5DVigEyM?list=PLXwjzs_mabFrlRF7uALEomfGGckG0sG5y Markov chain Monte Carlo (19:54)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Handout.class08.pdf|Handout exercise: Bayesian estimation of model parameter value]]&lt;br /&gt;
:* [[Media:Slides week5.pdf|Slides, week 9]]&lt;br /&gt;
:* [[Media:MTN122.pdf| An Introduction to Bayesian Statistics Without Using Equations]]&lt;br /&gt;
:* [http://www.nature.com/nbt/journal/v22/n9/pdf/nbt0904-1177.pdf Background reading: &amp;quot;What is Bayesian statistics?&amp;quot;]&lt;br /&gt;
:* [http://rsta.royalsocietypublishing.org/content/roypta/361/1813/2681.full.pdf Background reading: &amp;quot;Bayesian computation: a statistical revolution&amp;quot;]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Bayesian Phylogeny]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 10 (April 15): Model Selection===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/sJB2LmppZj8?list=PLXwjzs_mabFrlRF7uALEomfGGckG0sG5y Model selection, part 1 (15:19)]&lt;br /&gt;
:* [https://youtu.be/qSoDZ_33GbM Model selection, part 2 (17:20)]&lt;br /&gt;
:* [https://youtu.be/YYoo1vUO4ME Introduction to computer exercise: detection of selection (15:24)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Slides week6.pdf|Slides, week 10]]&lt;br /&gt;
:* [https://github.com/ddarriba/jmodeltest2/files/157130/manual.pdf jmodeltest manual]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Model selection]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 11 (April 22): Bayesian Phylogenetics, Part 2 ===&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [https://files.dtu.dk/u/mJBf7aEQX_6KkF_v/bayesianphylogeny.pdf?l A biologist’s guide to Bayesian phylogenetic analysis]&lt;br /&gt;
:* [https://beast.community/analysing_beast_output Analysing BEAST output using Tracer]&lt;br /&gt;
:* [https://beast.community/tracer_convergence Identifying convergence problems using Tracer]&lt;br /&gt;
:* [https://taming-the-beast.org/tutorials/Troubleshooting/ Post-processing and improving performance]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Bayesian phylogenetics: detection of positively selected sites]] &lt;br /&gt;
:* [[Bayesian phylogenetics: clock models]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 12 + 13 (April 29 + May 6): Mini project 3===&lt;br /&gt;
&#039;&#039;&#039;Bayesian and likelihood-based phylogenetics. SARS-CoV-2: selection and clock models&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Project description&#039;&#039;&#039;: [https://files.dtu.dk/u/x2OrIP7jqzDVY6si/Miniproject3_corona.pdf?l Miniproject3_corona.pdf]&lt;br /&gt;
&lt;br /&gt;
The mini project should be submitted and assessed via the peergrade.io interface.&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
==Old exam sets==&lt;br /&gt;
&lt;br /&gt;
* [http://wiki.bio.dtu.dk/teaching/index.php/27615-2011 Mini project exam]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam_2004.pdf Exam 2004] ([http://www.cbs.dtu.dk/courses/27615/exam_examples/exam_2004_withanswers.pdf Answers])&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2005.pdf Exam 2005]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2006.pdf Exam 2006]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2007.pdf Exam 2007]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2008.pdf Exam 2008]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2009.pdf Exam 2009] ([[Media:Exam2009 answers.pdf|Answers]])&lt;br /&gt;
* [[Media:Exam2015.pdf|Exam 2015]] ([[Media:Exam2015 answers.pdf|Answers]])&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Molecular_Evolution_27615_-_2016&amp;diff=19</id>
		<title>Molecular Evolution 27615 - 2016</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Molecular_Evolution_27615_-_2016&amp;diff=19"/>
		<updated>2024-03-19T13:15:59Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;== &amp;#039;&amp;#039;&amp;#039;Lecture Schedule 2016&amp;#039;&amp;#039;&amp;#039; ==  ===Week 1 (February 3): Introduction to evolutionary theory and population genetics. Models of growth, selection and mutation===  :; Online lectures :* [https://class.coursera.org/molevol-003/lecture/11 Common descent (11:52)] :* [https://class.coursera.org/molevol-003/lecture/15 Natural selection (14:57)] :* [https://class.coursera.org/molevol-003/lecture/17 Evidence for evolution (part 1) (9:34)]  :* [https://class.coursera.org/molevo...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== &#039;&#039;&#039;Lecture Schedule 2016&#039;&#039;&#039; ==&lt;br /&gt;
&lt;br /&gt;
===Week 1 (February 3): Introduction to evolutionary theory and population genetics. Models of growth, selection and mutation===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/11 Common descent (11:52)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/15 Natural selection (14:57)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/17 Evidence for evolution (part 1) (9:34)] &lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/19 Evidence for evolution (part 2) (20:54)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/35 Population growth and selection (18:13)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [http://wiki.bio.dtu.dk/~agpe/course_material/27615/PDFs/lecturenotebook.pdf Lecture notes]&lt;br /&gt;
:* [http://wiki.bio.dtu.dk/~agpe/course_material/27615/PDFs/slides_week1.pdf Slides, week 1]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [http://www.cbs.dtu.dk/courses/27615/exercises/simulation1_courseraversion.php Population Growth, Fitness, and Selection]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 2 (February 10): Neutral mutations and genetic drift. Tree reconstruction by parsimony===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/21 Neutral Theory of Molecular Evolution (11:28)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/23 Genetic Drift (9:35)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/25 Trees: Terminology and Representation (9:41)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/27 Homology and Homoplasy (9:07)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/29 Maximum Parsimony (7:48)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/31 The Fitch Algorithm (10:31)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/33 Searching Tree Space (14:01)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [http://wiki.bio.dtu.dk/~agpe/course_material/27615/PDFs/slides_week2.pdf Slides, week 2]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [http://www.cbs.dtu.dk/courses/27615/exercises/parsimony1_courseraversion.php Phylogenetic Analysis using Parsimony]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 3+4 (February 17+24): Mini project 1===&lt;br /&gt;
&lt;br /&gt;
Building a tree from scratch: What are the closest relatives of whales?&lt;br /&gt;
&lt;br /&gt;
The mini project should be submitted and assessed via the [https://www.peergrade.io/login peergrade.io] interface. &lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 5 (March 2): Consensus trees. Distance matrix methods===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/39 Consensus Trees (16:27)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/41 Distance Matrix Methods, part 1 (6:07)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/43 Distance Matrix Methods, part 2 (22:28)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/45 Neighbour Joining (15:28)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Consensus.pdf|Handout exercise: Consensus Trees]]&lt;br /&gt;
:* [[Media:Distance handout.pdf|Handout exercise: Distance Matrix Methods]]&lt;br /&gt;
:* [[Media:Slides week3.pdf|Slides, week 5]]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercises&lt;br /&gt;
:* [http://www.cbs.dtu.dk/courses/27615/exercises/consensustrees_courseraversion.php Consensus Trees]&lt;br /&gt;
:* [http://www.cbs.dtu.dk/courses/27615/exercises/distance1_courseraversion.php Distance Matrix Methods]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 6 (March 9): Models of sequence evolution. Likelihood methods===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/49 Models of evolution (28:48)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/51 Maximum likelihood (22:06)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/53 Ancestral reconstruction (10:45)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Handout real exp change.pdf|Handout exercise: Real, Observed, and Expected Change]]&lt;br /&gt;
:* [[Media:Handout likelihood.pdf|Handout exercise: Computation of Likelihood]]&lt;br /&gt;
:* [[Media:Slides week4.pdf|Slides, week 6]]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercises&lt;br /&gt;
:* [http://www.cbs.dtu.dk/courses/27615/exercises/distance2_courseraversion.php Models of Evolution]&lt;br /&gt;
:* [http://www.cbs.dtu.dk/courses/27615/exercises/likelihood1_courseraversion.php Maximum Likelihood]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 7+8 (March 16 + 30): Mini project 2===&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 9 (April 6): Bayesian inference of phylogeny===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/55 Bayesian Inference (23:48)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/57 Markov chain Monte Carlo (19:54)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Handout.class08.pdf|Handout exercise: Bayesian estimation of model parameter value]]&lt;br /&gt;
:* [[Media:Slides week5.pdf|Slides, week 9]]&lt;br /&gt;
:* [http://mgel.env.duke.edu/wp-content/publicuploads/eguchi-2008-intro-to-baysian-statistics.pdf Background reading: &amp;quot;An Introduction to Bayesian Statistics Without Using Equations&amp;quot;]&lt;br /&gt;
:* [http://www.nature.com/nbt/journal/v22/n9/pdf/nbt0904-1177.pdf Background reading: &amp;quot;What is Bayesian statistics?&amp;quot;]&lt;br /&gt;
:* [http://rsta.royalsocietypublishing.org/content/roypta/361/1813/2681.full.pdf Background reading: &amp;quot;Bayesian computation: a statistical revolution&amp;quot;]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [http://www.cbs.dtu.dk/courses/27615/exercises/bayes1_courseraversion.php Bayesian Phylogeny]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 10 (April 13): Testing hypotheses in a phylogenetic context===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/59 Model selection, part 1 (15:19)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/61 Model selection, part 2 (17:20)]&lt;br /&gt;
:* [https://class.coursera.org/molevol-003/lecture/63 Introduction to exercise (15:24)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Slides week6.pdf|Slides, week 10]]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [http://www.cbs.dtu.dk/courses/27615/exercises/modelselect_courseraversion.php Model selection]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 11 + 12 (April 20 + 27): Mini project 3===&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 13 (May 4): Mini project 3 finished + questions for exam ===&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Old exam sets===&lt;br /&gt;
&lt;br /&gt;
* [http://wiki.bio.dtu.dk/teaching/index.php/27615-2011 Mini project exam]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam_2004.pdf Exam 2004] ([http://www.cbs.dtu.dk/courses/27615/exam_examples/exam_2004_withanswers.pdf Answers])&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2005.pdf Exam 2005]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2006.pdf Exam 2006]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2007.pdf Exam 2007]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2008.pdf Exam 2008]&lt;br /&gt;
* [http://www.cbs.dtu.dk/courses/27615/exam_examples/exam2009.pdf Exam 2009] ([[Media:Exam2009 answers.pdf|Answers]])&lt;br /&gt;
* [[Media:Exam2015.pdf|Exam 2015]] ([[Media:Exam2015 answers.pdf|Answers]])&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=27615_Previous_course_programs&amp;diff=18</id>
		<title>27615 Previous course programs</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=27615_Previous_course_programs&amp;diff=18"/>
		<updated>2024-03-19T13:15:36Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;* Molecular Evolution 27615 - 2016 * Computational Molecular Evolution 22115 - 2020 * Computational Molecular Evolution 22115 - 2021&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* [[Molecular Evolution 27615 - 2016]]&lt;br /&gt;
* [[Computational Molecular Evolution 22115 - 2020]]&lt;br /&gt;
* [[Computational Molecular Evolution 22115 - 2021]]&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Linux_software_installation&amp;diff=17</id>
		<title>Linux software installation</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Linux_software_installation&amp;diff=17"/>
		<updated>2024-03-19T13:14:12Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
These instructions describe how to install the software and data used in the course [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]] on the Linux operating system.&lt;br /&gt;
&lt;br /&gt;
The commands assume that you are using the [https://ubuntu.com/server/docs/package-management apt package manager], which is used on, e.g., Ubuntu Linux.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: If your computer uses a CPU based on the ARM architecture, rather than the more common Intel or AMD (x86_64) architectures, some commands may need to be adjusted. Please let me know if this applies to you, so I can provide additional instructions tailored for ARM-based systems.&lt;br /&gt;
&lt;br /&gt;
 # Use the commented-out commands below if you want to copy my premade .bashrc file for customising bash&lt;br /&gt;
 # WARNING: do not overwrite a pre-existing .bashrc unless you are sure it contains nothing you want to keep&lt;br /&gt;
 # NOTE: if you are using a different shell, then you should use the corresponding rc file (e.g., .zshrc for zsh)&lt;br /&gt;
 # wget https://teaching.healthtech.dtu.dk/material/22115/bashrc.txt&lt;br /&gt;
 # mv bashrc.txt ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # Nedit&lt;br /&gt;
 sudo apt update&lt;br /&gt;
 sudo apt -y install nedit&lt;br /&gt;
&lt;br /&gt;
 # R, RStudio&lt;br /&gt;
 sudo apt -y install r-base r-base-dev gdebi-core&lt;br /&gt;
 wget https://download1.rstudio.org/electron/focal/amd64/rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
 sudo gdebi -n ./rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
 rm rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
&lt;br /&gt;
 # Dependencies for R-packages&lt;br /&gt;
 sudo apt -y install libcurl4-openssl-dev libxml2-dev libgit2-dev libopenblas-base&lt;br /&gt;
&lt;br /&gt;
 # MrBayes&lt;br /&gt;
 sudo apt -y install git&lt;br /&gt;
 git clone --depth=1 https://github.com/NBISweden/MrBayes.git ~/MrBayes&lt;br /&gt;
 cd ~/MrBayes&lt;br /&gt;
 ./configure --disable-sse&lt;br /&gt;
 make&lt;br /&gt;
 sudo make install&lt;br /&gt;
 cd ..&lt;br /&gt;
 # Note: above, I am using the flag --disable-sse to avoid crashes on some machines&lt;br /&gt;
 # It is possible that mb will run faster if you omit this flag, so you may want to experiment&lt;br /&gt;
 # with using just &amp;quot;./configure&amp;quot; instead (without the quotes)&lt;br /&gt;
&lt;br /&gt;
 # PAUP&lt;br /&gt;
 wget http://phylosolutions.com/paup-test/paup4a168_ubuntu64.gz&lt;br /&gt;
 gunzip paup4a168_ubuntu64.gz&lt;br /&gt;
 chmod 755 paup4a168_ubuntu64&lt;br /&gt;
 sudo mv paup4a168_ubuntu64 /usr/local/bin/paup&lt;br /&gt;
 sudo apt -y install libpython2.7&lt;br /&gt;
&lt;br /&gt;
 # PAML&lt;br /&gt;
 sudo apt -y install paml&lt;br /&gt;
&lt;br /&gt;
 # jmodeltest&lt;br /&gt;
 wget https://github.com/ddarriba/jmodeltest2/files/157117/jmodeltest-2.1.10.tar.gz&lt;br /&gt;
 sudo tar -xvf jmodeltest-2.1.10.tar.gz --directory /usr/local/src&lt;br /&gt;
 rm jmodeltest-2.1.10.tar.gz&lt;br /&gt;
 echo &amp;quot;alias jmodeltest=&#039;java -jar /usr/local/src/jmodeltest-2.1.10/jModelTest.jar&#039;&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # BEAST2&lt;br /&gt;
 wget https://github.com/CompEvol/beast2/releases/download/v2.7.6/BEAST.v2.7.6.Linux.x86.tgz&lt;br /&gt;
 sudo tar -zxvf BEAST.v2.7.6.Linux.x86.tgz --directory /usr/local/src&lt;br /&gt;
 rm BEAST.v2.7.6.Linux.x86.tgz&lt;br /&gt;
 echo &amp;quot;alias beauti=&#039;/usr/local/src/beast/bin/beauti &amp;gt; /dev/null 2&amp;gt; /dev/null&#039;&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
 echo &#039;PATH=&amp;quot;/usr/local/src/beast/bin${PATH:+:${PATH}}&amp;quot;&#039; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # FigTree&lt;br /&gt;
 sudo apt -y install figtree&lt;br /&gt;
&lt;br /&gt;
 # Tracer&lt;br /&gt;
 wget https://github.com/beast-dev/tracer/releases/download/v1.7.2/Tracer_v1.7.2.tgz&lt;br /&gt;
 sudo mkdir -p /usr/local/src/Tracer&lt;br /&gt;
 sudo tar -zxf Tracer_v1.7.2.tgz --directory /usr/local/src/Tracer&lt;br /&gt;
 sudo ln -s /usr/local/src/Tracer/bin/tracer /usr/local/bin/&lt;br /&gt;
 rm Tracer_v1.7.2.tgz&lt;br /&gt;
&lt;br /&gt;
 # MAFFT&lt;br /&gt;
 sudo apt -y install mafft&lt;br /&gt;
&lt;br /&gt;
 # AliView&lt;br /&gt;
 wget https://ormbunkar.se/aliview/downloads/linux/linux-version-1.28/aliview.install.run&lt;br /&gt;
 chmod 755 aliview.install.run&lt;br /&gt;
 sudo ./aliview.install.run&lt;br /&gt;
 rm aliview.install.run&lt;br /&gt;
&lt;br /&gt;
 # seqconverter, sequencelib, phylotreelib&lt;br /&gt;
 # Anders Gorm scripts and libraries:&lt;br /&gt;
 # https://github.com/agormp/phylotreelib&lt;br /&gt;
 # https://github.com/agormp/seqconverter&lt;br /&gt;
 sudo apt -y install python3-numpy python3-pip&lt;br /&gt;
 pip3 install seqconverter&lt;br /&gt;
 pip3 install phylotreelib&lt;br /&gt;
 echo &#039;PATH=&amp;quot;$HOME/.local/bin${PATH:+:${PATH}}&amp;quot;&#039; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # maxalign tool (see: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-312)&lt;br /&gt;
 wget https://teaching.healthtech.dtu.dk/material/22115/maxalign.pl&lt;br /&gt;
 chmod 755 maxalign.pl&lt;br /&gt;
 sudo mv maxalign.pl /usr/local/bin&lt;br /&gt;
&lt;br /&gt;
 # Clean up&lt;br /&gt;
 sudo apt autoremove --purge&lt;br /&gt;
 sudo apt clean&lt;br /&gt;
&lt;br /&gt;
 # Activate changes to .bashrc in current shell&lt;br /&gt;
 source ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # Set up molevol directory for course exercises&lt;br /&gt;
 # You can place this directory anywhere you prefer: &lt;br /&gt;
 # Just replace tilde (~) in the command below with path to preferred base directory&lt;br /&gt;
 # (The tilde symbol is short for the user&#039;s home directory)&lt;br /&gt;
 cd ~&lt;br /&gt;
 mkdir molevol&lt;br /&gt;
 wget https://teaching.healthtech.dtu.dk/material/22115/data.tar.gz&lt;br /&gt;
 tar -xvf data.tar.gz --directory molevol&lt;br /&gt;
 rm data.tar.gz&lt;br /&gt;
&lt;br /&gt;
 # R packages (run these commands inside RStudio)&lt;br /&gt;
 install.packages(&amp;quot;tidyverse&amp;quot;)&lt;br /&gt;
 install.packages(&amp;quot;bayesplot&amp;quot;)&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=MacOS_software_installation&amp;diff=16</id>
		<title>MacOS software installation</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=MacOS_software_installation&amp;diff=16"/>
		<updated>2024-03-19T13:13:22Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;These are instructions for how to install software and data used on the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]] when using the MacOS operating system.&lt;br /&gt;
&lt;br /&gt;
== Check which shell you are using, and specify .rc file ==&lt;br /&gt;
Depending on which version of MacOS you are using, [https://support.apple.com/en-us/HT208050 your shell is probably either bash or zsh]. This matters when you set [https://en.wikipedia.org/wiki/Environment_variable environment variables] (including, importantly, [https://www.baeldung.com/linux/path-variable PATH], which tells your computer where to look for executables), define [https://www.tecmint.com/create-alias-in-linux/ aliases], and so on. If you are using &amp;lt;code&amp;gt;bash&amp;lt;/code&amp;gt;, that information should be stored in the file &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt;; if you are using &amp;lt;code&amp;gt;zsh&amp;lt;/code&amp;gt;, it should be stored in &amp;lt;code&amp;gt;.zshrc&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Issue the following command to see which one it is:&lt;br /&gt;
&lt;br /&gt;
 echo $SHELL&lt;br /&gt;
&lt;br /&gt;
If you are using zsh:&lt;br /&gt;
 export MYRCFILE=~/.zshrc&lt;br /&gt;
&lt;br /&gt;
If you are using bash:&lt;br /&gt;
 export MYRCFILE=~/.bashrc&lt;br /&gt;
&lt;br /&gt;
This stores the path to your .rc file in the environment variable &amp;lt;code&amp;gt;MYRCFILE&amp;lt;/code&amp;gt; (referenced as &amp;lt;code&amp;gt;$MYRCFILE&amp;lt;/code&amp;gt;), which is used in some of the commands below. The variable is not saved permanently, so it is only set in the current terminal session: if you open a new terminal, set it again.&lt;br /&gt;
&lt;br /&gt;
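The manual choice above can also be made automatically. The following is a minimal sketch, assuming that the SHELL variable ends in either zsh or bash; the function name rcfile_for_shell is hypothetical and not part of the course material.&lt;br /&gt;

```shell
# Hypothetical helper: print the .rc file matching a shell path such as /bin/zsh.
# Prints an empty line for shells other than zsh and bash.
rcfile_for_shell() {
    case "$1" in
        */zsh)  echo "$HOME/.zshrc" ;;
        */bash) echo "$HOME/.bashrc" ;;
        *)      echo "" ;;
    esac
}

# Set MYRCFILE for the current terminal session only
MYRCFILE="$(rcfile_for_shell "${SHELL:-}")"
export MYRCFILE

if [ -z "$MYRCFILE" ]; then
    echo "Unrecognised shell: ${SHELL:-unknown} (set MYRCFILE by hand)"
else
    echo "MYRCFILE is set to $MYRCFILE"
fi
```

If the script reports an unrecognised shell, fall back to setting the variable by hand as described above.&lt;br /&gt;
&lt;br /&gt;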
== Install required software ==&lt;br /&gt;
&lt;br /&gt;
 # Homebrew&lt;br /&gt;
 xcode-select --install&lt;br /&gt;
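 /bin/bash -c &amp;quot;$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
 # wget (not included with MacOS; used by several commands below)&lt;br /&gt;
 brew install wget&lt;br /&gt;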
&lt;br /&gt;
 # Text editor&lt;br /&gt;
 # This replaces &amp;lt;code&amp;gt;nedit&amp;lt;/code&amp;gt; in exercise manuals. &lt;br /&gt;
 # You need a text editor that can read and save plain-text files. &lt;br /&gt;
 # Here we install TextMate, but there are many other options, including the built-in TextEdit and [https://www.barebones.com/products/bbedit/ BBEdit]&lt;br /&gt;
 # To run it from the command line: go to TextMate --&amp;gt; Preferences --&amp;gt; Terminal --&amp;gt; install shell support&lt;br /&gt;
 brew install --cask textmate&lt;br /&gt;
&lt;br /&gt;
 # MrBayes&lt;br /&gt;
 brew install mrbayes&lt;br /&gt;
&lt;br /&gt;
 # PAUP&lt;br /&gt;
 wget http://phylosolutions.com/paup-test/paup4a168_osx.gz&lt;br /&gt;
 gunzip paup4a168_osx.gz&lt;br /&gt;
 chmod 755 paup4a168_osx&lt;br /&gt;
 sudo mv paup4a168_osx /usr/local/bin/paup&lt;br /&gt;
&lt;br /&gt;
 # PAML&lt;br /&gt;
 brew install brewsci/bio/paml&lt;br /&gt;
&lt;br /&gt;
 # jmodeltest&lt;br /&gt;
 wget https://github.com/ddarriba/jmodeltest2/files/157117/jmodeltest-2.1.10.tar.gz&lt;br /&gt;
 sudo tar -xvf jmodeltest-2.1.10.tar.gz --directory /Applications&lt;br /&gt;
 rm jmodeltest-2.1.10.tar.gz&lt;br /&gt;
 echo &amp;quot;alias jmodeltest=&#039;java -jar /Applications/jmodeltest-2.1.10/jModelTest.jar &amp;amp;&amp;gt; /dev/null &amp;amp;&#039;&amp;quot; &amp;gt;&amp;gt; $MYRCFILE&lt;br /&gt;
&lt;br /&gt;
 # FigTree&lt;br /&gt;
 brew install brewsci/bio/figtree&lt;br /&gt;
&lt;br /&gt;
 # Tracer&lt;br /&gt;
 wget https://github.com/beast-dev/tracer/releases/download/v1.7.2/Tracer.v1.7.2.dmg&lt;br /&gt;
 hdiutil mount Tracer.v1.7.2.dmg&lt;br /&gt;
 sudo cp -R &amp;quot;/Volumes/Tracer/Tracer v1.7.2.app&amp;quot; /Applications&lt;br /&gt;
 hdiutil unmount /Volumes/Tracer&lt;br /&gt;
 echo &amp;quot;alias tracer=&#039;open /Applications/Tracer\ v1.7.2.app&#039;&amp;quot; &amp;gt;&amp;gt; $MYRCFILE&lt;br /&gt;
&lt;br /&gt;
 # MAFFT&lt;br /&gt;
 brew install mafft&lt;br /&gt;
&lt;br /&gt;
 # Aliview&lt;br /&gt;
 # Note: This only works if java has access to files and folders on the mac&lt;br /&gt;
 wget https://ormbunkar.se/aliview/downloads/mac/AliView-1.28-app.zip&lt;br /&gt;
 unzip AliView-1.28-app.zip&lt;br /&gt;
 sudo mv AliView-1.28/AliView.app /Applications&lt;br /&gt;
 rm -r AliView-1.28&lt;br /&gt;
 echo &#039;aliview() { java -jar /Applications/AliView.app/Contents/Resources/Java/repo/AliView/AliView/1.28/AliView-1.28.jar &amp;quot;$1&amp;quot; &amp;amp;&amp;gt; /dev/null &amp;amp; }&#039; &amp;gt;&amp;gt; $MYRCFILE&lt;br /&gt;
&lt;br /&gt;
 # seqconverter, sequencelib, phylotreelib&lt;br /&gt;
 # These are scripts and libraries written by me: &lt;br /&gt;
 # [https://github.com/agormp/phylotreelib GitHub phylotreelib] [https://github.com/agormp/sequencelib GitHub sequencelib] [https://github.com/agormp/seqconverter GitHub seqconverter]&lt;br /&gt;
 # These instructions assume you already have a working installation of python3&lt;br /&gt;
 python3 -m pip install seqconverter&lt;br /&gt;
 python3 -m pip install phylotreelib&lt;br /&gt;
&lt;br /&gt;
 # maxalign tool&lt;br /&gt;
 # See [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-312 MaxAlign: maximizing usable data in an alignment] for more information.&lt;br /&gt;
 wget https://teaching.healthtech.dtu.dk/material/22115/maxalign.pl&lt;br /&gt;
 chmod 755 maxalign.pl&lt;br /&gt;
 sudo mv maxalign.pl /usr/local/bin&lt;br /&gt;
&lt;br /&gt;
 # Set up directory for course exercises&lt;br /&gt;
 # You can place this directory anywhere you prefer. &lt;br /&gt;
 # Just replace tilde (~) in command below with path to preferred base directory&lt;br /&gt;
 cd ~&lt;br /&gt;
 mkdir molevol&lt;br /&gt;
 wget https://teaching.healthtech.dtu.dk/material/22115/data.tar.gz&lt;br /&gt;
 tar -xvf data.tar.gz --directory molevol&lt;br /&gt;
 rm data.tar.gz&lt;br /&gt;
&lt;br /&gt;
 # Activate changes to .rc file in current terminal session&lt;br /&gt;
 source $MYRCFILE&lt;br /&gt;
&lt;br /&gt;
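Once the steps above are done, it can be useful to confirm that the command-line tools can actually be found on PATH. The following is a minimal sketch; the function name report_missing is hypothetical, and the list only covers tools whose command names appear in these instructions (mb is the MrBayes executable). Programs started through aliases or shell functions are not checked.&lt;br /&gt;

```shell
# Hypothetical helper: print each named command that cannot be found on PATH.
report_missing() {
    for tool in "$@"; do
        # command -v prints nothing when the command is not found
        if [ -z "$(command -v "$tool")" ]; then
            echo "$tool"
        fi
    done
}

# Any name printed here was not found and probably needs to be reinstalled
report_missing mb paup mafft maxalign.pl
```

No output means that all four commands were found.&lt;br /&gt;
&lt;br /&gt;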
;  BEAST2&lt;br /&gt;
: Download newest version of BEAST2 from [https://www.beast2.org BEAST2 web site] and follow instructions to install. &lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
 # BEAST2&lt;br /&gt;
 brew install beast2&lt;br /&gt;
 echo &#039;PATH=&amp;quot;/usr/local/opt/beast2/libexec/bin${PATH:+:${PATH}}&amp;quot;&#039; &amp;gt;&amp;gt; $MYRCFILE&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; R, RStudio&lt;br /&gt;
: Download the newest version of R from [https://mirrors.dotsrc.org/cran/ CRAN] and follow the instructions to install it. Note: there are different versions for Intel and Apple Silicon Macs.&lt;br /&gt;
: Download the newest version of RStudio from [https://posit.co/download/rstudio-desktop/#download posit.co] and follow the instructions.&lt;br /&gt;
&lt;br /&gt;
 # R packages (run these commands inside RStudio)&lt;br /&gt;
 install.packages(&amp;quot;tidyverse&amp;quot;)&lt;br /&gt;
 install.packages(&amp;quot;bayesplot&amp;quot;)&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=MacOS_software_installation&amp;diff=15</id>
		<title>MacOS software installation</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=MacOS_software_installation&amp;diff=15"/>
		<updated>2024-03-19T13:12:41Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;These are instructions for how to install software and data used on the course  Computational Molecular Evolution (22115) when using the MacOS operating system.  == Check which shell you are using, and specify .rc file == Depending on what version of MacOS you are using, [https://support.apple.com/en-us/HT208050 your shell is probably either bash or zsh]. This plays a role when you want to set [https://en.wikipedia.org/wiki/E...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;These are instructions for how to install software and data used on the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]] when using the MacOS operating system.&lt;br /&gt;
&lt;br /&gt;
== Check which shell you are using, and specify .rc file ==&lt;br /&gt;
Depending on which version of MacOS you are using, [https://support.apple.com/en-us/HT208050 your shell is probably either bash or zsh]. This matters when you set [https://en.wikipedia.org/wiki/Environment_variable environment variables] (including, importantly, [https://www.baeldung.com/linux/path-variable PATH], which tells your computer where to look for executables), define [https://www.tecmint.com/create-alias-in-linux/ aliases], and so on. If you are using &amp;lt;code&amp;gt;bash&amp;lt;/code&amp;gt;, that information should be stored in the file &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt;; if you are using &amp;lt;code&amp;gt;zsh&amp;lt;/code&amp;gt;, it should be stored in &amp;lt;code&amp;gt;.zshrc&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Issue the following command to see which one it is:&lt;br /&gt;
&lt;br /&gt;
 echo $SHELL&lt;br /&gt;
&lt;br /&gt;
If you are using zsh:&lt;br /&gt;
 export MYRCFILE=~/.zshrc&lt;br /&gt;
&lt;br /&gt;
If you are using bash:&lt;br /&gt;
 export MYRCFILE=~/.bashrc&lt;br /&gt;
&lt;br /&gt;
This stores the path to your .rc file in the environment variable &amp;lt;code&amp;gt;MYRCFILE&amp;lt;/code&amp;gt; (referenced as &amp;lt;code&amp;gt;$MYRCFILE&amp;lt;/code&amp;gt;), which is used in some of the commands below. The variable is not saved permanently, so it is only set in the current terminal session: if you open a new terminal, set it again.&lt;br /&gt;
&lt;br /&gt;
== Install required software ==&lt;br /&gt;
&lt;br /&gt;
 # Homebrew&lt;br /&gt;
 xcode-select --install&lt;br /&gt;
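 /bin/bash -c &amp;quot;$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)&amp;quot;&lt;br /&gt;
&lt;br /&gt;
 # wget (not included with MacOS; used by several commands below)&lt;br /&gt;
 brew install wget&lt;br /&gt;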
&lt;br /&gt;
 # Text editor&lt;br /&gt;
 # This replaces &amp;lt;code&amp;gt;nedit&amp;lt;/code&amp;gt; in exercise manuals. &lt;br /&gt;
 # You need a text editor that can read and save plain-text files. &lt;br /&gt;
 # Here we install TextMate, but there are many other options, including the built-in TextEdit and [https://www.barebones.com/products/bbedit/ BBEdit]&lt;br /&gt;
 # To run it from the command line: go to TextMate --&amp;gt; Preferences --&amp;gt; Terminal --&amp;gt; install shell support&lt;br /&gt;
 brew install --cask textmate&lt;br /&gt;
&lt;br /&gt;
 # MrBayes&lt;br /&gt;
 brew install mrbayes&lt;br /&gt;
&lt;br /&gt;
 # PAUP&lt;br /&gt;
 wget http://phylosolutions.com/paup-test/paup4a168_osx.gz&lt;br /&gt;
 gunzip paup4a168_osx.gz&lt;br /&gt;
 chmod 755 paup4a168_osx&lt;br /&gt;
 sudo mv paup4a168_osx /usr/local/bin/paup&lt;br /&gt;
&lt;br /&gt;
 # PAML&lt;br /&gt;
 brew install brewsci/bio/paml&lt;br /&gt;
&lt;br /&gt;
 # jmodeltest&lt;br /&gt;
 wget https://github.com/ddarriba/jmodeltest2/files/157117/jmodeltest-2.1.10.tar.gz&lt;br /&gt;
 sudo tar -xvf jmodeltest-2.1.10.tar.gz --directory /Applications&lt;br /&gt;
 rm jmodeltest-2.1.10.tar.gz&lt;br /&gt;
 echo &amp;quot;alias jmodeltest=&#039;java -jar /Applications/jmodeltest-2.1.10/jModelTest.jar &amp;amp;&amp;gt; /dev/null &amp;amp;&#039;&amp;quot; &amp;gt;&amp;gt; $MYRCFILE&lt;br /&gt;
&lt;br /&gt;
 # FigTree&lt;br /&gt;
 brew install brewsci/bio/figtree&lt;br /&gt;
&lt;br /&gt;
 # Tracer&lt;br /&gt;
 wget https://github.com/beast-dev/tracer/releases/download/v1.7.2/Tracer.v1.7.2.dmg&lt;br /&gt;
 hdiutil mount Tracer.v1.7.2.dmg&lt;br /&gt;
 sudo cp -R &amp;quot;/Volumes/Tracer/Tracer v1.7.2.app&amp;quot; /Applications&lt;br /&gt;
 hdiutil unmount /Volumes/Tracer&lt;br /&gt;
 echo &amp;quot;alias tracer=&#039;open /Applications/Tracer\ v1.7.2.app&#039;&amp;quot; &amp;gt;&amp;gt; $MYRCFILE&lt;br /&gt;
&lt;br /&gt;
 # MAFFT&lt;br /&gt;
 brew install mafft&lt;br /&gt;
&lt;br /&gt;
 # Aliview&lt;br /&gt;
 # Note: This only works if java has access to files and folders on the mac&lt;br /&gt;
 wget https://ormbunkar.se/aliview/downloads/mac/AliView-1.28-app.zip&lt;br /&gt;
 unzip AliView-1.28-app.zip&lt;br /&gt;
 sudo mv AliView-1.28/AliView.app /Applications&lt;br /&gt;
 rm -r AliView-1.28&lt;br /&gt;
 echo &#039;aliview() { java -jar /Applications/AliView.app/Contents/Resources/Java/repo/AliView/AliView/1.28/AliView-1.28.jar &amp;quot;$1&amp;quot; &amp;amp;&amp;gt; /dev/null &amp;amp; }&#039; &amp;gt;&amp;gt; $MYRCFILE&lt;br /&gt;
&lt;br /&gt;
 # seqconverter, sequencelib, phylotreelib&lt;br /&gt;
 # These are scripts and libraries written by me: &lt;br /&gt;
 # [https://github.com/agormp/phylotreelib GitHub phylotreelib] [https://github.com/agormp/sequencelib GitHub sequencelib] [https://github.com/agormp/seqconverter GitHub seqconverter]&lt;br /&gt;
 # These instructions assume you already have a working installation of python3&lt;br /&gt;
 python3 -m pip install seqconverter&lt;br /&gt;
 python3 -m pip install phylotreelib&lt;br /&gt;
&lt;br /&gt;
 # maxalign tool&lt;br /&gt;
 # See [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-312 MaxAlign: maximizing usable data in an alignment] for more information.&lt;br /&gt;
 wget http://teaching.healthtech.dtu.dk/material/22115/maxalign.pl&lt;br /&gt;
 chmod 755 maxalign.pl&lt;br /&gt;
 sudo mv maxalign.pl /usr/local/bin&lt;br /&gt;
&lt;br /&gt;
 # Set up directory for course exercises&lt;br /&gt;
 # You can place this directory anywhere you prefer. &lt;br /&gt;
 # Just replace tilde (~) in command below with path to preferred base directory&lt;br /&gt;
 cd ~&lt;br /&gt;
 mkdir molevol&lt;br /&gt;
 wget http://teaching.healthtech.dtu.dk/material/22115/data.tar.gz&lt;br /&gt;
 tar -xvf data.tar.gz --directory molevol&lt;br /&gt;
 rm data.tar.gz&lt;br /&gt;
&lt;br /&gt;
 # Activate changes to .rc file in current terminal session&lt;br /&gt;
 source $MYRCFILE&lt;br /&gt;
&lt;br /&gt;
;  BEAST2&lt;br /&gt;
: Download newest version of BEAST2 from [https://www.beast2.org BEAST2 web site] and follow instructions to install. &lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
 # BEAST2&lt;br /&gt;
 brew install beast2&lt;br /&gt;
 echo &#039;PATH=&amp;quot;/usr/local/opt/beast2/libexec/bin${PATH:+:${PATH}}&amp;quot;&#039; &amp;gt;&amp;gt; $MYRCFILE&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; R, RStudio&lt;br /&gt;
: Download the newest version of R from [https://mirrors.dotsrc.org/cran/ CRAN] and follow the instructions to install it. Note: there are different versions for Intel and Apple Silicon Macs.&lt;br /&gt;
: Download the newest version of RStudio from [https://posit.co/download/rstudio-desktop/#download posit.co] and follow the instructions.&lt;br /&gt;
&lt;br /&gt;
 # R packages (run these commands inside RStudio)&lt;br /&gt;
 install.packages(&amp;quot;tidyverse&amp;quot;)&lt;br /&gt;
 install.packages(&amp;quot;bayesplot&amp;quot;)&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Windows_software_installation_-_Old_Version&amp;diff=14</id>
		<title>Windows software installation - Old Version</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Windows_software_installation_-_Old_Version&amp;diff=14"/>
		<updated>2024-03-19T13:10:40Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;These are instructions for how to install software and data used on the course Computational Molecular Evolution (22115) when using the Windows operating system.  __TOC__  == Windows 10 == Installing and running Windows Subsystem for Linux 2 (WSL2) is simpler on Windows 11. We therefore recommend that you [https://www.microsoft.com/en-us/windows/windows-11-specifications upgrade to Windows 11 if possible]. If you can&amp;#039;t upgrad...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;These are instructions for how to install software and data used on the course [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]] when using the Windows operating system.&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
== Windows 10 ==&lt;br /&gt;
Installing and running Windows Subsystem for Linux 2 (WSL2) is simpler on Windows 11. We therefore recommend that you [https://www.microsoft.com/en-us/windows/windows-11-specifications upgrade to Windows 11 if possible]. If you cannot upgrade to Windows 11, follow the instructions for Windows 10 below.&lt;br /&gt;
&lt;br /&gt;
# Update Windows to latest version (necessary if you want to install WSL2): [https://support.microsoft.com/en-us/windows/update-windows-3c5ae7fc-9fb6-9af1-1984-b5e0412c556a#WindowsVersion=Windows_10 Windows Update]&lt;br /&gt;
# Install Windows Terminal from Microsoft Store (free): [https://www.microsoft.com/en-us/p/windows-terminal/9n0dx20hk701?activetab=pivot:overviewtab Windows Terminal]&lt;br /&gt;
# Install WSL2: &lt;br /&gt;
## [https://www.digitalcitizen.life/open-windows-terminal/ Start Windows Terminal as administrator], and then issue the following command (followed by RETURN):&lt;br /&gt;
## &amp;lt;code&amp;gt;wsl --install&amp;lt;/code&amp;gt;&lt;br /&gt;
## More details: [https://cloudbytes.dev/snippets/how-to-install-wsl2-on-windows-1011 How to install WSL2 on Windows 10/11]&lt;br /&gt;
# Install VcXsrv  X-server: &lt;br /&gt;
## Follow instructions here: [https://aalonso.dev/blog/how-to-use-gui-apps-in-wsl2-forwarding-x-server-cdj How to use GUI apps in WSL2 (forwarding X server)]&lt;br /&gt;
## When you need to run Linux software that has a graphical user interface: Start the X-server by clicking XLaunch (and just let it run in the background)&lt;br /&gt;
# Set Ubuntu as default profile in Windows Terminal:&lt;br /&gt;
##  Open Windows Terminal app&lt;br /&gt;
## Click down-arrow button on the top bar of the terminal window, and click Settings&lt;br /&gt;
## Select Startup --&amp;gt; Default profile --&amp;gt; Ubuntu&lt;br /&gt;
## Click Save, and close settings.&lt;br /&gt;
# Set Ubuntu home directory as starting directory for Windows Terminal&lt;br /&gt;
## In Windows Terminal running Ubuntu:&lt;br /&gt;
## &amp;lt;code&amp;gt;echo &amp;quot;cd ~&amp;quot; &amp;gt;&amp;gt; ~/.profile&amp;lt;/code&amp;gt;&lt;br /&gt;
# Install Linux software: &lt;br /&gt;
## Start Ubuntu in your Windows Terminal&lt;br /&gt;
## Follow instructions on [[Linux software installation]] page&lt;br /&gt;
&lt;br /&gt;
== Windows 11 ==&lt;br /&gt;
# Update Windows to latest version  (necessary if you want to install WSL2): [https://support.microsoft.com/en-us/windows/update-windows-3c5ae7fc-9fb6-9af1-1984-b5e0412c556a#WindowsVersion=Windows_11 Windows Update]&lt;br /&gt;
# Install WSL2: &lt;br /&gt;
## [https://www.digitalcitizen.life/open-windows-terminal/ Start Windows Terminal as administrator], and then issue the following command (followed by RETURN):&lt;br /&gt;
## &amp;lt;code&amp;gt;wsl --install&amp;lt;/code&amp;gt;&lt;br /&gt;
## More details: [https://cloudbytes.dev/snippets/how-to-install-wsl2-on-windows-1011 How to install WSL2 on Windows 10/11]&lt;br /&gt;
# Set Ubuntu as default profile in Windows Terminal:&lt;br /&gt;
##  Open Windows Terminal app&lt;br /&gt;
## Click down-arrow button on the top bar of the terminal window, and click Settings&lt;br /&gt;
## Select Startup --&amp;gt; Default profile --&amp;gt; Ubuntu&lt;br /&gt;
## Click Save, and close settings.&lt;br /&gt;
# Set Ubuntu home directory as starting directory for Windows Terminal&lt;br /&gt;
## In Windows Terminal running Ubuntu:&lt;br /&gt;
## &amp;lt;code&amp;gt;echo &amp;quot;cd ~&amp;quot; &amp;gt;&amp;gt; ~/.profile&amp;lt;/code&amp;gt;&lt;br /&gt;
# Install Linux software: &lt;br /&gt;
## Start Ubuntu in your Windows Terminal&lt;br /&gt;
## Follow instructions on [[Linux software installation]] page&lt;br /&gt;
&lt;br /&gt;
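Before following the Linux instructions, it can be useful to confirm that the terminal is really running Ubuntu under WSL rather than a Windows shell. The following is a minimal sketch, assuming that the WSL kernel includes the word microsoft in /proc/version (an assumption about current WSL releases, not something stated in the official setup steps); the function name looks_like_wsl is hypothetical.&lt;br /&gt;

```shell
# Hypothetical helper: succeed if a kernel version string mentions Microsoft,
# which is assumed here to indicate WSL.
looks_like_wsl() {
    case "$1" in
        *[Mm]icrosoft*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Read the kernel version string if it is available
version_string=""
if [ -r /proc/version ]; then
    version_string="$(cat /proc/version)"
fi

if looks_like_wsl "$version_string"; then
    echo "This terminal appears to be running inside WSL"
else
    echo "This terminal does not appear to be running inside WSL"
fi
```

Run this in the Ubuntu profile of Windows Terminal; if it reports that WSL was not detected, check the default profile setting described above.&lt;br /&gt;
&lt;br /&gt;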
== Troubleshooting==&lt;br /&gt;
&lt;br /&gt;
===Windows Subsystem for Linux===&lt;br /&gt;
* This page lists some common problems, with solutions: [https://docs.microsoft.com/en-us/windows/wsl/troubleshooting Troubleshooting Windows Subsystem for Linux]&lt;br /&gt;
* Error 0x80370102:&lt;br /&gt;
** This is probably caused by hardware virtualization not being enabled in the BIOS (or UEFI)&lt;br /&gt;
** [https://mashtips.com/enable-virtualization-windows-10/ How to Enable Virtualization on Windows 10]&lt;br /&gt;
** [https://thegeekpage.com/wsl-register-distribution-error-0x80370102/ Fix: WSL Register Distribution Error 0x80370102 issue in Windows 11 / 10]&lt;br /&gt;
&lt;br /&gt;
===RStudio: use RStudio Server if RStudio Desktop does not work===&lt;br /&gt;
* Instead of the Desktop version, you can use [https://support.rstudio.com/hc/en-us/articles/217799198-What-is-the-difference-between-RStudio-Desktop-RStudio-Workbench-and-RStudio-Server- RStudio Server]. RStudio Server is an application that provides a web-browser-based interface (instead of the standalone desktop app). Here are instructions for installing and running RStudio Server under WSL2:&lt;br /&gt;
* In an Ubuntu session in WSL2, run the following commands to install RStudio Server:&lt;br /&gt;
** &amp;lt;code&amp;gt; wget https://rstudio.org/download/latest/stable/server/bionic/rstudio-server-latest-amd64.deb &amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt; sudo gdebi rstudio-server-latest-amd64.deb &amp;lt;/code&amp;gt;&lt;br /&gt;
** &amp;lt;code&amp;gt;sudo rm rstudio-server-latest-amd64.deb &amp;lt;/code&amp;gt;&lt;br /&gt;
* Start RStudio Server in Ubuntu:&lt;br /&gt;
** &amp;lt;code&amp;gt; sudo rstudio-server start &amp;lt;/code&amp;gt;&lt;br /&gt;
* Open a browser window in Windows&lt;br /&gt;
* Go to the following URL, which should give you a graphical interface (in the web browser) for the RStudio Server running in your Ubuntu installation: [http://localhost:8787 http://localhost:8787/]&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Linux_software_installation&amp;diff=13</id>
		<title>Linux software installation</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Linux_software_installation&amp;diff=13"/>
		<updated>2024-03-19T13:08:52Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
These are instructions for how to install software and data used on the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]] when using the Linux operating system.&lt;br /&gt;
&lt;br /&gt;
The commands assume you are using the [https://ubuntu.com/server/docs/package-management apt package manager] used on e.g. Ubuntu Linux.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: If your computer uses a CPU based on the ARM architecture, rather than the more common Intel or AMD (x86_64) architectures, some commands may need to be adjusted. Please let me know if this applies to you, so I can provide additional instructions tailored for ARM-based systems.&lt;br /&gt;
&lt;br /&gt;
 # Uncomment the commands below if you want to copy my premade .bashrc file for customising bash&lt;br /&gt;
 # WARNING: do not overwrite a pre-existing .bashrc unless you are sure it contains nothing you want to keep&lt;br /&gt;
 # NOTE: if you are using a different shell, then you should use the corresponding .rc file (e.g., .zshrc for zsh)&lt;br /&gt;
 # wget http://teaching.healthtech.dtu.dk/material/22115/bashrc.txt&lt;br /&gt;
 # mv bashrc.txt ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
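If you do decide to replace an existing .bashrc, it is safer to keep a copy first. The following is a minimal sketch; the function name backup_if_present and the .bak suffix are hypothetical choices, not part of the course material.&lt;br /&gt;

```shell
# Hypothetical helper: copy FILE to FILE.bak if FILE exists.
backup_if_present() {
    if [ -f "$1" ]; then
        # -p preserves permissions and timestamps
        cp -p "$1" "$1.bak"
        echo "Backed up $1 to $1.bak"
    else
        echo "No existing $1; nothing to back up"
    fi
}

# Keep a copy of the current .bashrc before overwriting it
backup_if_present "$HOME/.bashrc"
```

Note that running this a second time overwrites the earlier backup.&lt;br /&gt;
&lt;br /&gt;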
 # Nedit&lt;br /&gt;
 sudo apt update&lt;br /&gt;
 sudo apt -y install nedit&lt;br /&gt;
&lt;br /&gt;
 # R, RStudio&lt;br /&gt;
 sudo apt -y install r-base r-base-dev gdebi-core&lt;br /&gt;
 wget https://download1.rstudio.org/electron/focal/amd64/rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
 sudo gdebi -n ./rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
 rm rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
&lt;br /&gt;
 # Dependencies for R-packages&lt;br /&gt;
 sudo apt -y install libcurl4-openssl-dev libxml2-dev libgit2-dev libopenblas-base&lt;br /&gt;
&lt;br /&gt;
 # MrBayes&lt;br /&gt;
 sudo apt -y install git&lt;br /&gt;
 git clone --depth=1 https://github.com/NBISweden/MrBayes.git ~/MrBayes&lt;br /&gt;
 cd ~/MrBayes&lt;br /&gt;
 ./configure --disable-sse&lt;br /&gt;
 make&lt;br /&gt;
 sudo make install&lt;br /&gt;
 cd ..&lt;br /&gt;
 # Note: above, I am using the flag --disable-sse to avoid crashes on some machines&lt;br /&gt;
 # It is possible that mb will run faster if you omit this flag, so you may want to experiment&lt;br /&gt;
 # with using just &amp;quot;./configure&amp;quot; instead (without the quotes)&lt;br /&gt;
&lt;br /&gt;
 # PAUP&lt;br /&gt;
 wget http://phylosolutions.com/paup-test/paup4a168_ubuntu64.gz&lt;br /&gt;
 gunzip paup4a168_ubuntu64.gz&lt;br /&gt;
 chmod 755 paup4a168_ubuntu64&lt;br /&gt;
 sudo mv paup4a168_ubuntu64 /usr/local/bin/paup&lt;br /&gt;
 sudo apt -y install libpython2.7&lt;br /&gt;
&lt;br /&gt;
 # PAML&lt;br /&gt;
 sudo apt -y install paml&lt;br /&gt;
&lt;br /&gt;
 # jmodeltest&lt;br /&gt;
 wget https://github.com/ddarriba/jmodeltest2/files/157117/jmodeltest-2.1.10.tar.gz&lt;br /&gt;
 sudo tar -xvf jmodeltest-2.1.10.tar.gz --directory /usr/local/src&lt;br /&gt;
 rm jmodeltest-2.1.10.tar.gz&lt;br /&gt;
 echo &amp;quot;alias jmodeltest=&#039;java -jar /usr/local/src/jmodeltest-2.1.10/jModelTest.jar&#039;&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # BEAST2&lt;br /&gt;
 wget https://github.com/CompEvol/beast2/releases/download/v2.7.6/BEAST.v2.7.6.Linux.x86.tgz&lt;br /&gt;
 sudo tar -zxvf BEAST.v2.7.6.Linux.x86.tgz --directory /usr/local/src&lt;br /&gt;
 echo &amp;quot;alias beauti=&#039;/usr/local/src/beast/bin/beauti &amp;gt; /dev/null 2&amp;gt; /dev/null&#039;&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
 echo &#039;PATH=&amp;quot;/usr/local/src/beast/bin${PATH:+:${PATH}}&amp;quot;&#039; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # FigTree&lt;br /&gt;
 sudo apt -y install figtree&lt;br /&gt;
&lt;br /&gt;
 # Tracer&lt;br /&gt;
 wget https://github.com/beast-dev/tracer/releases/download/v1.7.2/Tracer_v1.7.2.tgz&lt;br /&gt;
 sudo mkdir /usr/local/src/Tracer&lt;br /&gt;
 sudo tar -zxf Tracer_v1.7.2.tgz --directory /usr/local/src/Tracer&lt;br /&gt;
 sudo ln -s /usr/local/src/Tracer/bin/tracer /usr/local/bin/&lt;br /&gt;
 rm Tracer_v1.7.2.tgz&lt;br /&gt;
&lt;br /&gt;
 # MAFFT&lt;br /&gt;
 sudo apt -y install mafft&lt;br /&gt;
&lt;br /&gt;
 # Aliview&lt;br /&gt;
 wget https://ormbunkar.se/aliview/downloads/linux/linux-version-1.28/aliview.install.run&lt;br /&gt;
 chmod 755 aliview.install.run&lt;br /&gt;
 sudo ./aliview.install.run&lt;br /&gt;
 rm aliview.install.run&lt;br /&gt;
&lt;br /&gt;
 # seqconverter, sequencelib, phylotreelib&lt;br /&gt;
 # Anders Gorm scripts and libraries:&lt;br /&gt;
 # https://github.com/agormp/phylotreelib&lt;br /&gt;
 # https://github.com/agormp/seqconverter&lt;br /&gt;
 sudo apt -y install python3-numpy python3-pip&lt;br /&gt;
 pip3 install seqconverter&lt;br /&gt;
 pip3 install phylotreelib&lt;br /&gt;
 echo &#039;PATH=&amp;quot;$HOME/.local/bin${PATH:+:${PATH}}&amp;quot;&#039; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # maxalign tool (see: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-312)&lt;br /&gt;
 wget http://teaching.healthtech.dtu.dk/material/22115/maxalign.pl&lt;br /&gt;
 chmod 755 maxalign.pl&lt;br /&gt;
 sudo mv maxalign.pl /usr/local/bin&lt;br /&gt;
&lt;br /&gt;
 # Clean up&lt;br /&gt;
 sudo apt autoremove --purge&lt;br /&gt;
 sudo apt clean&lt;br /&gt;
&lt;br /&gt;
 # Activate changes to .bashrc in current shell&lt;br /&gt;
 source ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # Set up molevol directory for course exercises&lt;br /&gt;
 # You can place this directory anywhere you prefer: &lt;br /&gt;
 # Just replace tilde (~) in the command below with path to preferred base directory&lt;br /&gt;
 # (The tilde symbol is short for the user&#039;s home directory)&lt;br /&gt;
 cd ~&lt;br /&gt;
 mkdir molevol&lt;br /&gt;
 wget http://teaching.healthtech.dtu.dk/material/22115/data.tar.gz&lt;br /&gt;
 tar -xvf data.tar.gz --directory molevol&lt;br /&gt;
 rm data.tar.gz&lt;br /&gt;
&lt;br /&gt;
 # R packages (run these commands inside RStudio)&lt;br /&gt;
 install.packages(&amp;quot;tidyverse&amp;quot;)&lt;br /&gt;
 install.packages(&amp;quot;bayesplot&amp;quot;)&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Linux_software_installation&amp;diff=12</id>
		<title>Linux software installation</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Linux_software_installation&amp;diff=12"/>
		<updated>2024-03-19T13:08:38Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
These are instructions for how to install software and data used on the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]] when using the Linux operating system.&lt;br /&gt;
&lt;br /&gt;
The commands assume you are using the [https://ubuntu.com/server/docs/package-management apt package manager] used on e.g. Ubuntu Linux.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: If your computer uses a CPU based on the ARM architecture, rather than the more common Intel or AMD (x86_64) architectures, some commands may need to be adjusted. Please let me know if this applies to you, so I can provide additional instructions tailored for ARM-based systems.&lt;br /&gt;
&lt;br /&gt;
 # Uncomment the commands below if you want to copy my premade .bashrc file for customising bash&lt;br /&gt;
 # WARNING: do not overwrite a pre-existing .bashrc unless you are sure it contains nothing you want to keep&lt;br /&gt;
 # NOTE: if you are using a different shell, then you should use the corresponding .rc file (e.g., .zshrc for zsh)&lt;br /&gt;
 # wget http://teaching.bioinformatics.dtu.dk/material/22115/bashrc.txt&lt;br /&gt;
 # mv bashrc.txt ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # Nedit&lt;br /&gt;
 sudo apt update&lt;br /&gt;
 sudo apt -y install nedit&lt;br /&gt;
&lt;br /&gt;
 # R, Rstudio&lt;br /&gt;
 sudo apt -y install r-base r-base-dev gdebi-core&lt;br /&gt;
 wget https://download1.rstudio.org/electron/focal/amd64/rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
 sudo gdebi -n ./rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
 rm rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
&lt;br /&gt;
 # Dependencies for R-packages&lt;br /&gt;
 sudo apt -y install libcurl4-openssl-dev libxml2-dev libgit2-dev libopenblas-base&lt;br /&gt;
&lt;br /&gt;
 # MrBayes&lt;br /&gt;
 sudo apt -y install git&lt;br /&gt;
 git clone --depth=1 https://github.com/NBISweden/MrBayes.git ~/MrBayes&lt;br /&gt;
 cd ~/MrBayes&lt;br /&gt;
 ./configure --disable-sse&lt;br /&gt;
 make&lt;br /&gt;
 sudo make install&lt;br /&gt;
 cd ..&lt;br /&gt;
 # Note: above, I am using the flag --disable-sse to avoid crashes on some machines&lt;br /&gt;
 # It is possible that mb will run faster if you omit this flag, so you may want to experiment&lt;br /&gt;
 # with using just &amp;quot;./configure&amp;quot; instead (without the quotes)&lt;br /&gt;
&lt;br /&gt;
 # PAUP&lt;br /&gt;
 wget http://phylosolutions.com/paup-test/paup4a168_ubuntu64.gz&lt;br /&gt;
 gunzip paup4a168_ubuntu64.gz&lt;br /&gt;
 chmod 755 paup4a168_ubuntu64&lt;br /&gt;
 sudo mv paup4a168_ubuntu64 /usr/local/bin/paup&lt;br /&gt;
 sudo apt -y install libpython2.7&lt;br /&gt;
&lt;br /&gt;
 # PAML&lt;br /&gt;
 sudo apt -y install paml&lt;br /&gt;
&lt;br /&gt;
 # jmodeltest&lt;br /&gt;
 wget https://github.com/ddarriba/jmodeltest2/files/157117/jmodeltest-2.1.10.tar.gz&lt;br /&gt;
 sudo tar -xvf jmodeltest-2.1.10.tar.gz --directory /usr/local/src&lt;br /&gt;
 rm jmodeltest-2.1.10.tar.gz&lt;br /&gt;
 echo &amp;quot;alias jmodeltest=&#039;java -jar /usr/local/src/jmodeltest-2.1.10/jModelTest.jar&#039;&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # BEAST2&lt;br /&gt;
 wget https://github.com/CompEvol/beast2/releases/download/v2.7.6/BEAST.v2.7.6.Linux.x86.tgz&lt;br /&gt;
 sudo tar -zxvf BEAST.v2.7.6.Linux.x86.tgz --directory /usr/local/src&lt;br /&gt;
 echo &amp;quot;alias beauti=&#039;/usr/local/src/beast/bin/beauti &amp;gt; /dev/null 2&amp;gt; /dev/null&#039;&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
 echo &#039;PATH=&amp;quot;/usr/local/src/beast/bin${PATH:+:${PATH}}&amp;quot;&#039; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
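The PATH lines in this script use the bash expansion ${PATH:+:${PATH}}, which appends a colon plus the old value only when PATH is already set and non-empty. A minimal sketch of that behaviour, using a hypothetical variable named demo in place of PATH (the /opt/bin and /usr/bin values are illustrative only, not part of the course setup):&lt;br /&gt;

```shell
# Hypothetical stand-in for PATH; starts out empty
demo=""
# Empty variable: the ":..." part is dropped, so there is no trailing colon
empty_case="/opt/bin${demo:+:${demo}}"
echo "$empty_case"

# Non-empty variable: a colon and the old value are appended
demo="/usr/bin"
set_case="/opt/bin${demo:+:${demo}}"
echo "$set_case"
```

This avoids ending up with an empty PATH entry, which the shell would otherwise treat as the current directory.&lt;br /&gt;
&lt;br /&gt;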
 # FigTree&lt;br /&gt;
 sudo apt -y install figtree&lt;br /&gt;
&lt;br /&gt;
 # Tracer&lt;br /&gt;
 wget https://github.com/beast-dev/tracer/releases/download/v1.7.2/Tracer_v1.7.2.tgz&lt;br /&gt;
 sudo mkdir /usr/local/src/Tracer&lt;br /&gt;
 sudo tar -zxf Tracer_v1.7.2.tgz --directory /usr/local/src/Tracer&lt;br /&gt;
 sudo ln -s /usr/local/src/Tracer/bin/tracer /usr/local/bin/&lt;br /&gt;
 rm Tracer_v1.7.2.tgz&lt;br /&gt;
&lt;br /&gt;
 # MAFFT&lt;br /&gt;
 sudo apt -y install mafft&lt;br /&gt;
&lt;br /&gt;
 # Aliview&lt;br /&gt;
 wget https://ormbunkar.se/aliview/downloads/linux/linux-version-1.28/aliview.install.run&lt;br /&gt;
 chmod 755 aliview.install.run&lt;br /&gt;
 sudo ./aliview.install.run&lt;br /&gt;
 rm aliview.install.run&lt;br /&gt;
&lt;br /&gt;
 # seqconverter, sequencelib, phylotreelib&lt;br /&gt;
 # Anders Gorm scripts and libraries:&lt;br /&gt;
 # https://github.com/agormp/phylotreelib&lt;br /&gt;
 # https://github.com/agormp/seqconverter&lt;br /&gt;
 sudo apt -y install python3-numpy python3-pip&lt;br /&gt;
 pip3 install seqconverter&lt;br /&gt;
 pip3 install phylotreelib&lt;br /&gt;
 echo &#039;PATH=&amp;quot;$HOME/.local/bin${PATH:+:${PATH}}&amp;quot;&#039; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # maxalign tool (see: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-312)&lt;br /&gt;
 wget http://teaching.healthtech.dtu.dk/material/22115/maxalign.pl&lt;br /&gt;
 chmod 755 maxalign.pl&lt;br /&gt;
 sudo mv maxalign.pl /usr/local/bin&lt;br /&gt;
&lt;br /&gt;
 # Clean up&lt;br /&gt;
 sudo apt autoremove --purge&lt;br /&gt;
 sudo apt clean&lt;br /&gt;
&lt;br /&gt;
 # Activate changes to .bashrc in current shell&lt;br /&gt;
 source ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
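Once .bashrc has been sourced, it can be useful to confirm that the installed programs are found on the PATH. The following is only a sketch: the function name check_tools is hypothetical, and the list of command names assumes the default install locations used in the steps above.&lt;br /&gt;

```shell
# Print the location of each command, or a "not found" message if it is missing.
# Returns 1 if at least one command could not be found, 0 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        command -v "$tool" || { echo "not found: $tool"; missing=1; }
    done
    return "$missing"
}

# Command names assumed from the installation steps above
check_tools mb paup mafft beast tracer maxalign.pl || echo "Some tools are missing: re-check the corresponding steps"
```

Aliases such as jmodeltest and beauti are defined in .bashrc rather than installed as files, so they are left out of the list here.&lt;br /&gt;
&lt;br /&gt;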
 # Set up molevol directory for course exercises&lt;br /&gt;
 # You can place this directory anywhere you prefer: &lt;br /&gt;
 # Just replace tilde (~) in the command below with path to preferred base directory&lt;br /&gt;
 # (The tilde symbol is short for the user&#039;s home directory)&lt;br /&gt;
 cd ~&lt;br /&gt;
 mkdir molevol&lt;br /&gt;
 wget http://teaching.healthtech.dtu.dk/material/22115/data.tar.gz&lt;br /&gt;
 tar -xvf data.tar.gz --directory molevol&lt;br /&gt;
 rm data.tar.gz&lt;br /&gt;
&lt;br /&gt;
 # R packages (do this inside Rstudio)&lt;br /&gt;
 install.packages(&amp;quot;tidyverse&amp;quot;)&lt;br /&gt;
 install.packages(&amp;quot;bayesplot&amp;quot;)&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Linux_software_installation&amp;diff=11</id>
		<title>Linux software installation</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Linux_software_installation&amp;diff=11"/>
		<updated>2024-03-19T13:07:45Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
These are instructions for how to install software and data used on the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]] when using the Linux operating system.&lt;br /&gt;
&lt;br /&gt;
The commands assume you are using the [https://ubuntu.com/server/docs/package-management apt package manager] used on e.g. Ubuntu Linux.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: If your computer uses a CPU based on the ARM architecture, rather than the more common Intel or AMD (x86_64) architectures, some commands may need to be adjusted. Please let me know if this applies to you, so I can provide additional instructions tailored for ARM-based systems.&lt;br /&gt;
&lt;br /&gt;
 # Uncomment the commands below if you want to copy my premade .bashrc file for customising bash&lt;br /&gt;
 # WARNING: do not overwrite a pre-existing .bashrc unless you are sure it contains nothing you want to keep&lt;br /&gt;
 # NOTE: if you are using a different shell, then you should use the corresponding .rc file (e.g., .zshrc for zsh)&lt;br /&gt;
 # wget http://teaching.bioinformatics.dtu.dk/material/22115/bashrc.txt&lt;br /&gt;
 # mv bashrc.txt ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # Nedit&lt;br /&gt;
 sudo apt update&lt;br /&gt;
 sudo apt -y install nedit&lt;br /&gt;
&lt;br /&gt;
 # R, Rstudio&lt;br /&gt;
 sudo apt -y install r-base r-base-dev gdebi-core&lt;br /&gt;
 wget https://download1.rstudio.org/electron/focal/amd64/rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
 sudo gdebi -n ./rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
 rm rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
&lt;br /&gt;
 # Dependencies for R-packages&lt;br /&gt;
 sudo apt -y install libcurl4-openssl-dev libxml2-dev libgit2-dev libopenblas-base&lt;br /&gt;
&lt;br /&gt;
 # MrBayes&lt;br /&gt;
 sudo apt -y install git&lt;br /&gt;
 git clone --depth=1 https://github.com/NBISweden/MrBayes.git ~/MrBayes&lt;br /&gt;
 cd ~/MrBayes&lt;br /&gt;
 ./configure --disable-sse&lt;br /&gt;
 make&lt;br /&gt;
 sudo make install&lt;br /&gt;
 cd ..&lt;br /&gt;
 # Note: above, I am using the flag --disable-sse to avoid crashes on some machines&lt;br /&gt;
 # It is possible that mb will run faster if you omit this flag, so you may want to experiment&lt;br /&gt;
 # with using just &amp;quot;./configure&amp;quot; instead (without the quotes)&lt;br /&gt;
&lt;br /&gt;
 # PAUP&lt;br /&gt;
 wget http://phylosolutions.com/paup-test/paup4a168_ubuntu64.gz&lt;br /&gt;
 gunzip paup4a168_ubuntu64.gz&lt;br /&gt;
 chmod 755 paup4a168_ubuntu64&lt;br /&gt;
 sudo mv paup4a168_ubuntu64 /usr/local/bin/paup&lt;br /&gt;
 sudo apt -y install libpython2.7&lt;br /&gt;
&lt;br /&gt;
 # PAML&lt;br /&gt;
 sudo apt -y install paml&lt;br /&gt;
&lt;br /&gt;
 # jmodeltest&lt;br /&gt;
 wget https://github.com/ddarriba/jmodeltest2/files/157117/jmodeltest-2.1.10.tar.gz&lt;br /&gt;
 sudo tar -xvf jmodeltest-2.1.10.tar.gz --directory /usr/local/src&lt;br /&gt;
 rm jmodeltest-2.1.10.tar.gz&lt;br /&gt;
 echo &amp;quot;alias jmodeltest=&#039;java -jar /usr/local/src/jmodeltest-2.1.10/jModelTest.jar&#039;&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # BEAST2&lt;br /&gt;
 wget https://github.com/CompEvol/beast2/releases/download/v2.7.6/BEAST.v2.7.6.Linux.x86.tgz&lt;br /&gt;
 sudo tar -zxvf BEAST.v2.7.6.Linux.x86.tgz --directory /usr/local/src&lt;br /&gt;
 echo &amp;quot;alias beauti=&#039;/usr/local/src/beast/bin/beauti &amp;gt; /dev/null 2&amp;gt; /dev/null&#039;&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
 echo &#039;PATH=&amp;quot;/usr/local/src/beast/bin${PATH:+:${PATH}}&amp;quot;&#039; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # FigTree&lt;br /&gt;
 sudo apt -y install figtree&lt;br /&gt;
&lt;br /&gt;
 # Tracer&lt;br /&gt;
 wget https://github.com/beast-dev/tracer/releases/download/v1.7.2/Tracer_v1.7.2.tgz&lt;br /&gt;
 sudo mkdir /usr/local/src/Tracer&lt;br /&gt;
 sudo tar -zxf Tracer_v1.7.2.tgz --directory /usr/local/src/Tracer&lt;br /&gt;
 sudo ln -s /usr/local/src/Tracer/bin/tracer /usr/local/bin/&lt;br /&gt;
 rm Tracer_v1.7.2.tgz&lt;br /&gt;
&lt;br /&gt;
 # MAFFT&lt;br /&gt;
 sudo apt -y install mafft&lt;br /&gt;
&lt;br /&gt;
 # Aliview&lt;br /&gt;
 wget https://ormbunkar.se/aliview/downloads/linux/linux-version-1.28/aliview.install.run&lt;br /&gt;
 chmod 755 aliview.install.run&lt;br /&gt;
 sudo ./aliview.install.run&lt;br /&gt;
 rm aliview.install.run&lt;br /&gt;
&lt;br /&gt;
 # seqconverter, sequencelib, phylotreelib&lt;br /&gt;
 # Anders Gorm scripts and libraries:&lt;br /&gt;
 # https://github.com/agormp/phylotreelib&lt;br /&gt;
 # https://github.com/agormp/seqconverter&lt;br /&gt;
 sudo apt -y install python3-numpy python3-pip&lt;br /&gt;
 pip3 install seqconverter&lt;br /&gt;
 pip3 install phylotreelib&lt;br /&gt;
 echo &#039;PATH=&amp;quot;$HOME/.local/bin${PATH:+:${PATH}}&amp;quot;&#039; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # maxalign tool (see: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-312)&lt;br /&gt;
 wget http://teaching.bioinformatics.dtu.dk/material/22115/maxalign.pl&lt;br /&gt;
 chmod 755 maxalign.pl&lt;br /&gt;
 sudo mv maxalign.pl /usr/local/bin&lt;br /&gt;
&lt;br /&gt;
 # Clean up&lt;br /&gt;
 sudo apt autoremove --purge&lt;br /&gt;
 sudo apt clean&lt;br /&gt;
&lt;br /&gt;
 # Activate changes to .bashrc in current shell&lt;br /&gt;
 source ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # Set up molevol directory for course exercises&lt;br /&gt;
 # You can place this directory anywhere you prefer: &lt;br /&gt;
 # Just replace tilde (~) in the command below with path to preferred base directory&lt;br /&gt;
 # (The tilde symbol is short for the user&#039;s home directory)&lt;br /&gt;
 cd ~&lt;br /&gt;
 mkdir molevol&lt;br /&gt;
 wget http://teaching.bioinformatics.dtu.dk/material/22115/data.tar.gz&lt;br /&gt;
 tar -xvf data.tar.gz --directory molevol&lt;br /&gt;
 rm data.tar.gz&lt;br /&gt;
&lt;br /&gt;
 # R packages (do this inside Rstudio)&lt;br /&gt;
 install.packages(&amp;quot;tidyverse&amp;quot;)&lt;br /&gt;
 install.packages(&amp;quot;bayesplot&amp;quot;)&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=Linux_software_installation&amp;diff=10</id>
		<title>Linux software installation</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=Linux_software_installation&amp;diff=10"/>
		<updated>2024-03-19T13:04:17Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot; These are instructions for how to install software and data used on the course  Computational Molecular Evolution (22115) when using the Linux operating system.  The commands assume you are using the [https://ubuntu.com/server/docs/package-management apt package manager] used on e.g. Ubuntu Linux.  &amp;#039;&amp;#039;&amp;#039;Note&amp;#039;&amp;#039;&amp;#039;: If your computer uses a CPU based on the ARM architecture, rather than the more common Intel or AMD (x86_64) archite...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
These are instructions for how to install software and data used on the course  [[22115_-_Computational_Molecular_Evolution|Computational Molecular Evolution (22115)]] when using the Linux operating system.&lt;br /&gt;
&lt;br /&gt;
The commands assume you are using the [https://ubuntu.com/server/docs/package-management apt package manager] used on e.g. Ubuntu Linux.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note&#039;&#039;&#039;: If your computer uses a CPU based on the ARM architecture, rather than the more common Intel or AMD (x86_64) architectures, some commands may need to be adjusted. Please let me know if this applies to you, so I can provide additional instructions tailored for ARM-based systems.&lt;br /&gt;
&lt;br /&gt;
 # Uncomment the commands below if you want to copy my premade .bashrc file for customising bash&lt;br /&gt;
 # WARNING: do not overwrite a pre-existing .bashrc unless you are sure it contains nothing you want to keep&lt;br /&gt;
 # NOTE: if you are using a different shell, then you should use the corresponding .rc file (e.g., .zshrc for zsh)&lt;br /&gt;
 # wget http://teaching.bioinformatics.dtu.dk/material/36615/bashrc.txt&lt;br /&gt;
 # mv bashrc.txt ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # Nedit&lt;br /&gt;
 sudo apt update&lt;br /&gt;
 sudo apt -y install nedit&lt;br /&gt;
&lt;br /&gt;
 # R, Rstudio&lt;br /&gt;
 sudo apt -y install r-base r-base-dev gdebi-core&lt;br /&gt;
 wget https://download1.rstudio.org/electron/focal/amd64/rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
 sudo gdebi -n ./rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
 rm rstudio-2023.12.1-402-amd64.deb&lt;br /&gt;
&lt;br /&gt;
 # Dependencies for R-packages&lt;br /&gt;
 sudo apt -y install libcurl4-openssl-dev libxml2-dev libgit2-dev libopenblas-base&lt;br /&gt;
&lt;br /&gt;
 # MrBayes&lt;br /&gt;
 sudo apt -y install git&lt;br /&gt;
 git clone --depth=1 https://github.com/NBISweden/MrBayes.git ~/MrBayes&lt;br /&gt;
 cd ~/MrBayes&lt;br /&gt;
 ./configure --disable-sse&lt;br /&gt;
 make&lt;br /&gt;
 sudo make install&lt;br /&gt;
 cd ..&lt;br /&gt;
 # Note: above, I am using the flag --disable-sse to avoid crashes on some machines&lt;br /&gt;
 # It is possible that mb will run faster if you omit this flag, so you may want to experiment&lt;br /&gt;
 # with using just &amp;quot;./configure&amp;quot; instead (without the quotes)&lt;br /&gt;
&lt;br /&gt;
 # PAUP&lt;br /&gt;
 wget http://phylosolutions.com/paup-test/paup4a168_ubuntu64.gz&lt;br /&gt;
 gunzip paup4a168_ubuntu64.gz&lt;br /&gt;
 chmod 755 paup4a168_ubuntu64&lt;br /&gt;
 sudo mv paup4a168_ubuntu64 /usr/local/bin/paup&lt;br /&gt;
 sudo apt -y install libpython2.7&lt;br /&gt;
&lt;br /&gt;
 # PAML&lt;br /&gt;
 sudo apt -y install paml&lt;br /&gt;
&lt;br /&gt;
 # jmodeltest&lt;br /&gt;
 wget https://github.com/ddarriba/jmodeltest2/files/157117/jmodeltest-2.1.10.tar.gz&lt;br /&gt;
 sudo tar -xvf jmodeltest-2.1.10.tar.gz --directory /usr/local/src&lt;br /&gt;
 rm jmodeltest-2.1.10.tar.gz&lt;br /&gt;
 echo &amp;quot;alias jmodeltest=&#039;java -jar /usr/local/src/jmodeltest-2.1.10/jModelTest.jar&#039;&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # BEAST2&lt;br /&gt;
 wget https://github.com/CompEvol/beast2/releases/download/v2.7.6/BEAST.v2.7.6.Linux.x86.tgz&lt;br /&gt;
 sudo tar -zxvf BEAST.v2.7.6.Linux.x86.tgz --directory /usr/local/src&lt;br /&gt;
 echo &amp;quot;alias beauti=&#039;/usr/local/src/beast/bin/beauti &amp;gt; /dev/null 2&amp;gt; /dev/null&#039;&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
 echo &#039;PATH=&amp;quot;/usr/local/src/beast/bin${PATH:+:${PATH}}&amp;quot;&#039; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # FigTree&lt;br /&gt;
 sudo apt -y install figtree&lt;br /&gt;
&lt;br /&gt;
 # Tracer&lt;br /&gt;
 wget https://github.com/beast-dev/tracer/releases/download/v1.7.2/Tracer_v1.7.2.tgz&lt;br /&gt;
 sudo mkdir /usr/local/src/Tracer&lt;br /&gt;
 sudo tar -zxf Tracer_v1.7.2.tgz --directory /usr/local/src/Tracer&lt;br /&gt;
 sudo ln -s /usr/local/src/Tracer/bin/tracer /usr/local/bin/&lt;br /&gt;
 rm Tracer_v1.7.2.tgz&lt;br /&gt;
&lt;br /&gt;
 # MAFFT&lt;br /&gt;
 sudo apt -y install mafft&lt;br /&gt;
&lt;br /&gt;
 # Aliview&lt;br /&gt;
 wget https://ormbunkar.se/aliview/downloads/linux/linux-version-1.28/aliview.install.run&lt;br /&gt;
 chmod 755 aliview.install.run&lt;br /&gt;
 sudo ./aliview.install.run&lt;br /&gt;
 rm aliview.install.run&lt;br /&gt;
&lt;br /&gt;
 # seqconverter, sequencelib, phylotreelib&lt;br /&gt;
 # Anders Gorm scripts and libraries:&lt;br /&gt;
 # https://github.com/agormp/phylotreelib&lt;br /&gt;
 # https://github.com/agormp/seqconverter&lt;br /&gt;
 sudo apt -y install python3-numpy python3-pip&lt;br /&gt;
 pip3 install seqconverter&lt;br /&gt;
 pip3 install phylotreelib&lt;br /&gt;
 echo &#039;PATH=&amp;quot;$HOME/.local/bin${PATH:+:${PATH}}&amp;quot;&#039; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # maxalign tool (see: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-312)&lt;br /&gt;
 wget http://teaching.bioinformatics.dtu.dk/material/36615/maxalign.pl&lt;br /&gt;
 chmod 755 maxalign.pl&lt;br /&gt;
 sudo mv maxalign.pl /usr/local/bin&lt;br /&gt;
&lt;br /&gt;
 # Clean up&lt;br /&gt;
 sudo apt autoremove --purge&lt;br /&gt;
 sudo apt clean&lt;br /&gt;
&lt;br /&gt;
 # Activate changes to .bashrc in current shell&lt;br /&gt;
 source ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
 # Set up molevol directory for course exercises&lt;br /&gt;
 # You can place this directory anywhere you prefer: &lt;br /&gt;
 # Just replace tilde (~) in the command below with path to preferred base directory&lt;br /&gt;
 # (The tilde symbol is short for the user&#039;s home directory)&lt;br /&gt;
 cd ~&lt;br /&gt;
 mkdir molevol&lt;br /&gt;
 wget http://teaching.bioinformatics.dtu.dk/material/36615/data.tar.gz&lt;br /&gt;
 tar -xvf data.tar.gz --directory molevol&lt;br /&gt;
 rm data.tar.gz&lt;br /&gt;
&lt;br /&gt;
 # R packages (do this inside Rstudio)&lt;br /&gt;
 install.packages(&amp;quot;tidyverse&amp;quot;)&lt;br /&gt;
 install.packages(&amp;quot;bayesplot&amp;quot;)&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Darwin_logo2_medium.png&amp;diff=9</id>
		<title>File:Darwin logo2 medium.png</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=File:Darwin_logo2_medium.png&amp;diff=9"/>
		<updated>2024-03-19T13:03:32Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=22115_-_Computational_Molecular_Evolution&amp;diff=8</id>
		<title>22115 - Computational Molecular Evolution</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=22115_-_Computational_Molecular_Evolution&amp;diff=8"/>
		<updated>2024-03-19T12:36:12Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;; Overview  550px : This page contains links to video lectures, computer exercises, and other material for the course [https://kurser.dtu.dk/course/22115 22115 - Computational Molecular Evolution], which is part of the [https://www.dtu.dk/english/education/msc/programmes/systems_biology MSc in Bioinformatics and Systems Biology] at the [https://www.dtu.dk/english Technical University of Denmark]. The course is taught by Prof...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;; Overview  [[File:Darwin logo2 medium.png |right|border|550px]]&lt;br /&gt;
: This page contains links to video lectures, computer exercises, and other material for the course [https://kurser.dtu.dk/course/22115 22115 - Computational Molecular Evolution], which is part of the [https://www.dtu.dk/english/education/msc/programmes/systems_biology MSc in Bioinformatics and Systems Biology] at the [https://www.dtu.dk/english Technical University of Denmark]. The course is taught by Professor Anders Gorm Pedersen, [https://www.healthtech.dtu.dk/english/Research/Research-Sections/Section-Bioinformatics Section for Bioinformatics], [https://www.healthtech.dtu.dk/english Department of Health Technology].&lt;br /&gt;
&lt;br /&gt;
: The main goal of this course is to give an introduction to theory and algorithms in the field of computational molecular evolution. We will cover basic evolutionary theory (common descent, natural selection, genetic drift, models of growth and selection), and the main types of algorithms used for constructing and analyzing phylogenetic trees (parsimony, distance based methods, maximum likelihood methods, and Bayesian inference). We will also discuss the role of statistical modeling in science more generally.&lt;br /&gt;
&lt;br /&gt;
:The course will consist of lectures, computer exercises, and mini-projects. The student will acquire practical experience in the use of a range of computational methods by analyzing sequences from the scientific literature.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
==&#039;&#039;&#039;Computer setup&#039;&#039;&#039;==&lt;br /&gt;
&lt;br /&gt;
===Linux===&lt;br /&gt;
:* [[Linux software installation]]&lt;br /&gt;
&amp;lt;!-- :* [[Notes on using Linux for exercises]] &#039;&#039;&#039;UNDER CONSTRUCTION&#039;&#039;&#039; --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Windows===&lt;br /&gt;
:* [[Windows software installation]]&lt;br /&gt;
&amp;lt;!--:* [[Notes on using Windows for exercises]] &#039;&#039;&#039;UNDER CONSTRUCTION&#039;&#039;&#039; --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===MacOS===&lt;br /&gt;
:* [[MacOS software installation]]&lt;br /&gt;
&amp;lt;!-- :* [[Notes on using MacOS for exercises]] &#039;&#039;&#039;UNDER CONSTRUCTION&#039;&#039;&#039; --&amp;gt;&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
===VirtualBox===&lt;br /&gt;
:* Use this only if you can&#039;t install natively on MacOS, Windows, or Linux. Runs a virtual Linux on top of your own OS.&lt;br /&gt;
:* [[VirtualBox installation]]&lt;br /&gt;
:* [[Notes on using VirtualBox for exercises]]&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== &#039;&#039;&#039;Lecture Schedule&#039;&#039;&#039; ==&lt;br /&gt;
&lt;br /&gt;
:([[27615 Previous course programs|Course programs, previous years]])&lt;br /&gt;
&lt;br /&gt;
===Week 1 (January 31): Introduction to evolutionary theory and population genetics. Models of growth, selection and mutation===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/okjVaLA5S38 Common descent (11:52)]&lt;br /&gt;
:* [https://youtu.be/VkkIu1ZtaIE Natural selection (14:57)]&lt;br /&gt;
:* [https://youtu.be/wqa6W3_WW7s Evidence for evolution (part 1) (9:34)] &lt;br /&gt;
:* [http://y2u.be/_-a-F8egAis Evidence for evolution (part 2) (20:54)]&lt;br /&gt;
:* [http://y2u.be/AUGbSMWPILE Population growth and selection (18:13)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [https://github.com/agormp/evolintro/blob/main/evolintro.pdf Lecture notes on evolutionary theory and population genetics]&lt;br /&gt;
:* [http://teaching.healthtech.dtu.dk/material/36615/slides_week1.pdf Slides, week 1]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Population Growth, Fitness, and Selection]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 2 (February 7): Neutral mutations and genetic drift. Tree reconstruction by parsimony===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/cQVjL50kK0k Neutral Theory of Molecular Evolution (11:28)]&lt;br /&gt;
:* [https://youtu.be/J8LDUFm4ttA Genetic Drift (9:35)]&lt;br /&gt;
:* [https://youtu.be/AZkHWdl2oAw Trees: Terminology and Representation (9:41)]&lt;br /&gt;
:* [https://youtu.be/zCj1s9fmaKs Homology and Homoplasy (9:07)]&lt;br /&gt;
:* [https://youtu.be/gXb_WuLCD8g Maximum Parsimony (7:48)]&lt;br /&gt;
:* [https://youtu.be/Q7ZpdPCx0uQ The Fitch Algorithm (10:31)]&lt;br /&gt;
:* [https://youtu.be/deywW9wJXmw Searching Tree Space (14:01)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [http://teaching.healthtech.dtu.dk/material/36615/slides_week2.pdf Slides, week 2]&lt;br /&gt;
:* [http://teaching.healthtech.dtu.dk/material/36615/Paup_Doc_31.pdf PAUP 3.1 manual (note: for older version - contains explanations of parsimony and tree moves)]&lt;br /&gt;
:* [http://teaching.healthtech.dtu.dk/material/36615/PAUP4-manual.pdf PAUP 4beta command reference]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Phylogenetic Analysis using Parsimony]]&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 3 (February 14): Consensus trees. Distance matrix methods===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://www.youtube.com/watch?v=YXZZyu9OAcg Consensus Trees (16:27)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=MhjSSxcGjaY Distance Matrix Methods, part 1 (6:07)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=PNoUcQTCxiM Distance Matrix Methods, part 2 (22:28)]&lt;br /&gt;
:* [https://www.youtube.com/watch?v=Dj24mCLQYUE Neighbour Joining (15:28)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Consensus.pdf|Handout exercise: Consensus Trees]]&lt;br /&gt;
:* [[Media:Distance handout.pdf|Handout exercise: Distance Matrix Methods]]&lt;br /&gt;
:* [[Media:Slides week3.pdf|Slides, week 3]]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercises&lt;br /&gt;
:* [[Consensus Trees]] &lt;br /&gt;
:* [[Distance Matrix Methods]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 4+5 (February 21 + 28): Mini project 1===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
Project description: [[Media:Miniproject1 whales.pdf|Building a tree from scratch: What are the closest relatives of whales?]]&lt;br /&gt;
&lt;br /&gt;
The mini project should be submitted and assessed via a peer assessment module that will become available on the course DTU Learn page.&lt;br /&gt;
&lt;br /&gt;
Take this tree quiz to  test yourself on your ability to accurately interpret evolutionary trees: &lt;br /&gt;
* [[Media:Treequiz1.pdf|Tree quiz]]&lt;br /&gt;
Check your replies here:  &lt;br /&gt;
* [[Media:Treequiz1 answers.pdf|Tree quiz with answers]] &lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 6 (March 6): Models of sequence evolution. Likelihood methods===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/ro2MFmVZypQ Models of evolution (28:48)]&lt;br /&gt;
:* [https://youtu.be/xDKUIegYpWM Maximum likelihood (22:06)]&lt;br /&gt;
:* [https://youtu.be/Siau2o_egGI Ancestral reconstruction (10:45)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Handout real exp change.pdf|Handout exercise: Real, Observed, and Expected Change]]&lt;br /&gt;
:* [[Media:Handout likelihood.pdf|Handout exercise: Computation of Likelihood]]&lt;br /&gt;
:* [[Media:Slides week4.pdf|Slides, week 6]]&lt;br /&gt;
:* [http://teaching.bioinformatics.dtu.dk/material/36615/substitutionmodels.pdf Lecture notes: Substitution models]&lt;br /&gt;
:* [http://teaching.bioinformatics.dtu.dk/material/36615/main.pdf Optional lecture notes: Matrix exponentials for Markov chains]&lt;br /&gt;
:; Computer exercises&lt;br /&gt;
:* [[Models of Evolution]]&lt;br /&gt;
:* [[Maximum Likelihood]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 7 (March 13): Bayesian inference of phylogeny===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://www.youtube.com/watch?v=DI3TIx78NqM&amp;amp;t=12s Bayesian Inference (23:48)]&lt;br /&gt;
:* [https://youtu.be/uyG5DVigEyM?list=PLXwjzs_mabFrlRF7uALEomfGGckG0sG5y Markov chain Monte Carlo (19:54)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Handout.class08.pdf|Handout exercise: Bayesian estimation of model parameter value]]&lt;br /&gt;
:* [[Media:Slides week5.pdf|Slides, week 7]]&lt;br /&gt;
:* [[Media:MTN122.pdf| An Introduction to Bayesian Statistics Without Using Equations]]&lt;br /&gt;
:* [http://www.nature.com/nbt/journal/v22/n9/pdf/nbt0904-1177.pdf Background reading: &amp;quot;What is Bayesian statistics?&amp;quot;]&lt;br /&gt;
:* [http://rsta.royalsocietypublishing.org/content/roypta/361/1813/2681.full.pdf Background reading: &amp;quot;Bayesian computation: a statistical revolution&amp;quot;]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Bayesian Phylogeny]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 8+9 (March 20 + April 3): Mini project 2===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&#039;&#039;&#039;Maximum likelihood and R-based phylogenetics - The origin of the Lassa and SARS-CoV-2 virus epidemics&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Project description and data sets&#039;&#039;&#039;: See DTU Learn page &lt;br /&gt;
&lt;br /&gt;
The mini project should be submitted and assessed via the peergrade interface at DTU Learn.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 10 (April 10): Model Selection===&lt;br /&gt;
&lt;br /&gt;
:; Online lectures&lt;br /&gt;
:* [https://youtu.be/sJB2LmppZj8?list=PLXwjzs_mabFrlRF7uALEomfGGckG0sG5y Model selection, part 1 (15:19)]&lt;br /&gt;
:* [https://youtu.be/qSoDZ_33GbM Model selection, part 2 (17:20)]&lt;br /&gt;
:* [https://youtu.be/YYoo1vUO4ME Introduction to computer exercise: detection of selection (15:24)]&lt;br /&gt;
&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [[Media:Slides week6.pdf|Slides, week 10]]&lt;br /&gt;
:* [https://github.com/ddarriba/jmodeltest2/files/157130/manual.pdf jmodeltest manual]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercise&lt;br /&gt;
:* [[Model selection]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 11 (April 17): Bayesian Phylogenetics, Part 2 ===&lt;br /&gt;
:; Course material&lt;br /&gt;
:* [https://www.researchgate.net/publication/319965471_A_biologist%27s_guide_to_Bayesian_phylogenetic_analysis A biologist’s guide to Bayesian phylogenetic analysis]&lt;br /&gt;
:* [https://beast.community/analysing_beast_output Analysing BEAST output using Tracer]&lt;br /&gt;
:* [https://beast.community/tracer_convergence Identifying convergence problems using Tracer]&lt;br /&gt;
:* [https://taming-the-beast.org/tutorials/Troubleshooting/ Post-processing and improving performance]&lt;br /&gt;
&lt;br /&gt;
:; Computer exercises&lt;br /&gt;
:* [[Bayesian phylogenetics: checking convergence]] &lt;br /&gt;
:* [[Bayesian phylogenetics: clock models]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
===Week 12 + 13 (April 24 + May 1): Mini project 3: Final exam===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Details will follow&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
----&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=MediaWiki:Mainpage&amp;diff=7</id>
		<title>MediaWiki:Mainpage</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=MediaWiki:Mainpage&amp;diff=7"/>
		<updated>2024-03-19T12:35:43Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot;22115 - Computational Molecular Evolution&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;22115 - Computational Molecular Evolution&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=MediaWiki:Sidebar&amp;diff=6</id>
		<title>MediaWiki:Sidebar</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=MediaWiki:Sidebar&amp;diff=6"/>
		<updated>2024-03-19T12:33:14Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
* navigation&lt;br /&gt;
** https://teaching.healthtech.dtu.dk/|Course List&lt;br /&gt;
** https://teaching.healthtech.dtu.dk/22115/|Course 22115&lt;br /&gt;
* TOOLBOX&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=MediaWiki:Sidebar&amp;diff=5</id>
		<title>MediaWiki:Sidebar</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=MediaWiki:Sidebar&amp;diff=5"/>
		<updated>2024-03-19T12:33:02Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created page with &amp;quot; * navigation ** https://teaching.healthtech.dtu.dk/|Course List ** https://teaching.healthtech.dtu.dk/22115/|Course 22115 ** Programme|Programme * TOOLBOX&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
* navigation&lt;br /&gt;
** https://teaching.healthtech.dtu.dk/|Course List&lt;br /&gt;
** https://teaching.healthtech.dtu.dk/22115/|Course 22115&lt;br /&gt;
** Programme|Programme&lt;br /&gt;
* TOOLBOX&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=MediaWiki:Disclaimers&amp;diff=4</id>
		<title>MediaWiki:Disclaimers</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=MediaWiki:Disclaimers&amp;diff=4"/>
		<updated>2024-03-19T12:32:15Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created blank page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=MediaWiki:Aboutsite&amp;diff=3</id>
		<title>MediaWiki:Aboutsite</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=MediaWiki:Aboutsite&amp;diff=3"/>
		<updated>2024-03-19T12:31:45Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created blank page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22115/index.php?title=MediaWiki:Privacy&amp;diff=2</id>
		<title>MediaWiki:Privacy</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22115/index.php?title=MediaWiki:Privacy&amp;diff=2"/>
		<updated>2024-03-19T12:31:02Z</updated>

		<summary type="html">&lt;p&gt;WikiSysop: Created blank page&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
</feed>