Computational Molecular Evolution 22115 - 2021
- Overview
- The main goal of this course is to give an introduction to theory and algorithms in the field of computational molecular evolution. We will cover basic evolutionary theory (common descent, natural selection, genetic drift, models of growth and selection), and the main types of algorithms used for constructing and analyzing phylogenetic trees (parsimony, distance-based methods, maximum likelihood methods, and Bayesian inference). We will also discuss the role of statistical modeling in science more generally.
- The course will consist of lectures, computer exercises, and micro-projects. The student will acquire practical experience in the use of a range of computational methods by analyzing sequences from the scientific literature.
- Computer setup
- In this course we will use software running on a Linux platform. The simplest way to get this is to install Oracle VirtualBox and then use the virtual disk image we have prepared for you (see links below). The virtual disk image contains both a pre-installed Linux operating system and all the software you will need for the weekly computer exercises. If you are already running Linux (or a Linux-like operating system such as macOS), you may want to try installing the software directly on your own computer (see instructions below), but you will then have to sort out any installation issues yourself. Alternatively, you can run the virtual Linux machine on top of your own Linux, which also works.
- Quick-start (video): How to install and use the virtual machine (shown on Mac OSX, but other platforms will be very similar). (Note: the instruction video is for the Coursera version of the course, but it should be simple to extrapolate to your own situation.)
- MolEvol_2021.zip: Virtual Disk Image file (compressed) containing pre-installed Linux (Linux Lite) operating system and all software needed for this course.
- Compressed file size: 4.5 GB
- Full file size: 12 GB (maximum size: 30 GB, dynamically allocated, so size only increases as needed.)
- The Linux distribution used here is Linux Lite. It should be fairly simple to use (note the app start window in the lower left corner, which works much like on Windows).
- List of software used (in case you want to install it on your own operating system): Software installation instructions
- Oracle VirtualBox: Download and install the version for your operating system. It allows you to run a guest operating system (Linux) on top of your main operating system (typically Windows, Mac OSX, or Linux).
- The default user-ID and password on the virtual machine: user-ID = student, password = 1234
Lecture Schedule
Week 1 (February 3): Introduction to evolutionary theory and population genetics. Models of growth, selection and mutation
- Online lectures
- Course material
- Computer exercise
Week 2 (February 10): Neutral mutations and genetic drift. Tree reconstruction by parsimony
- Online lectures
- Course material
- Computer exercise
Week 3+4 (February 17+24): Mini project 1
Project description: Building a tree from scratch: What are the closest relatives of whales?
The mini project should be submitted and assessed via a peer assessment module that will become available on the course DTU Learn page.
Take this tree quiz to test yourself on your ability to accurately interpret evolutionary trees:
Check your replies here:
Week 5 (March 3): Consensus trees. Distance matrix methods
- Online lectures
- Course material
- Computer exercises
Week 6 (March 10): Models of sequence evolution. Likelihood methods
- Online lectures
- Course material
- Handout exercise: Real, Observed, and Expected Change
- Handout exercise: Computation of Likelihood
- Slides, week 6
- Lecture notes: Substitution models
- Optional lecture notes: Matrix exponentials for Markov chains
- Computer exercises
Week 7+8 (March 17+24): Mini project 2
Maximum likelihood and R-based phylogenetics - The origin of the Lassa and SARS-CoV-2 virus epidemics
Lassa data set: lassa.nexus
- Alignment of Lassa virus sequences encoding the transmembrane glycoprotein complex (GPC). GPC is important in the initial contact of the virus with the cells it infects, and also in its diffusion into the host cell. The 35 sequences include both rodent and human sequences, from a range of years and a range of geographic locations. Names indicate location (Nig = Nigeria, Sier = Sierra Leone, Ivory = Ivory Coast, Lib = Liberia), year sampled, and host species (homo = human, nat = Mastomys natalensis, a rodent). You should use the "pinneo" strain from 1969 to root the tree. The "Pinneo" or "LP" strain of Lassa virus was isolated from the blood of Penny Pinneo, a pioneer in combating Lassa fever, after a severe hemorrhagic illness acquired in Nigeria in 1969.
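The naming convention described above can be decoded with a few lines of Python. This is only a minimal sketch: the abbreviation tables are taken from the description, but the helper `describe`, the example names, and the assumption that the year appears as four digits are hypothetical and may not match the exact layout of the names in lassa.nexus.

```python
import re

# Abbreviations as listed in the data set description.
LOCATIONS = {"Nig": "Nigeria", "Sier": "Sierra Leone",
             "Ivory": "Ivory Coast", "Lib": "Liberia"}
HOSTS = {"homo": "human", "nat": "Mastomys natalensis"}

def describe(name):
    """Return (location, year, host) found in a sequence name.

    Hypothetical helper: parts that cannot be found are returned as None.
    Assumes the sampling year is written with four digits (an assumption,
    not something stated in the data set description).
    """
    lower = name.lower()
    # Case-insensitive substring search for each known abbreviation.
    location = next((full for abbr, full in LOCATIONS.items()
                     if abbr.lower() in lower), None)
    host = next((full for abbr, full in HOSTS.items()
                 if abbr in lower), None)
    year_match = re.search(r"(19|20)\d{2}", name)
    year = int(year_match.group(0)) if year_match else None
    return location, year, host

# Made-up example names, for illustration only.
for example in ["Nig_2008_homo", "Sier_1996_nat"]:
    print(example, describe(example))
```

If the names in the file use a different layout (for example two-digit years), the regular expression would need to be adapted accordingly.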
SARS-CoV-2 data set: See instructions in project description.
Project description: Miniproject2_corona.pdf
The mini project should be submitted and assessed via the peergrade interface at DTU Learn.
Week 9 (April 7): Bayesian inference of phylogeny
- Online lectures
- Course material
- Computer exercise
Week 10 (April 14): Model Selection
- Online lectures
- Course material
- Computer exercise
Week 11 (April 21): Bayesian Phylogenetics, Part 2
- Course material
Week 12+13 (April 28 + May 5): Mini project 3
Bayesian and likelihood-based phylogenetics. SARS-CoV-2: selection and clock models
Project description: Miniproject3_corona.pdf
The mini project should be submitted and assessed via the peergrade interface.