Computational Molecular Evolution 22115 - 2021

Overview
The main goal of this course is to give an introduction to theory and algorithms in the field of computational molecular evolution. We will cover basic evolutionary theory (common descent, natural selection, genetic drift, models of growth and selection), and the main types of algorithms used for constructing and analyzing phylogenetic trees (parsimony, distance-based methods, maximum likelihood methods, and Bayesian inference). We will also discuss the role of statistical modeling in science more generally.
The course will consist of lectures, computer exercises, and micro-projects. The student will acquire practical experience in the use of a range of computational methods by analyzing sequences from the scientific literature.
Computer setup
In this course we will use software running on a Linux platform. You can do this by installing Oracle VirtualBox and then using the virtual disk image we have prepared for you (see links below). The virtual disk image contains both a pre-installed Linux operating system and all the software you will need to do the weekly computer exercises. If you are already running Linux (or a Unix-like operating system such as macOS), you may want to experiment with installing the software directly on your own computer (see instructions below), but then you will have to sort out any installation issues yourself. The alternative is to run the virtual Linux machine on top of your own Linux, which also works.
  • Quick-start (video): How to install and use the virtual machine (shown on Mac OSX, but other platforms will be very similar). (Note: the instruction video is for the Coursera version of the course, but it should be simple to extrapolate to your own situation.)
  • MolEvol_2021.zip: Virtual Disk Image file (compressed) containing pre-installed Linux (Linux Lite) operating system and all software needed for this course.
    • Compressed file size: 4.5 GB
    • Full file size: 12 GB (maximum size: 30 GB, dynamically allocated, so size only increases as needed.)
    • The Linux distribution used here is Linux Lite, which should be fairly simple to use (note the app start window in the lower left corner, which works much like on Windows).
    • List of software used (in case you want to install it on your own operating system): Software installation instructions
  • Oracle VirtualBox: Download and install the version for your operating system. VirtualBox lets you run a guest operating system (Linux) on top of your main operating system (typically Windows, Mac OSX, or Linux).
The default user-ID and password on the virtual machine: user-ID = student, password = 1234

Lecture Schedule

(Course programs, previous years)

Week 1 (February 3): Introduction to evolutionary theory and population genetics. Models of growth, selection and mutation

Online lectures
Course material
Computer exercise

Week 2 (February 10): Neutral mutations and genetic drift. Tree reconstruction by parsimony

Online lectures
Course material
Computer exercise

Week 3 + 4 (February 17 + 24): Mini project 1

Project description: Building a tree from scratch: What are the closest relatives of whales?

The mini project should be submitted and assessed via a peer assessment module that will become available on the course DTU Learn page.

Take this tree quiz to test yourself on your ability to accurately interpret evolutionary trees:

Check your replies here:


Week 5 (March 3): Consensus trees. Distance matrix methods

Online lectures
Course material
Computer exercises

Week 6 (March 10): Models of sequence evolution. Likelihood methods

Online lectures
Course material
Computer exercises

Week 7+8 (March 17 + 24): Mini project 2

Maximum likelihood and R-based phylogenetics - The origin of the Lassa and SARS-CoV-2 virus epidemics

Lassa data set: lassa.nexus

Alignment of Lassa virus sequences encoding the transmembrane glycoprotein complex (GPC). GPC is important in the initial contact of the virus with the cells it infects and also in its diffusion into the host cell. The 35 sequences include both rodent and human sequences, from a range of years and a range of geographic locations. Names indicate location (Nig = Nigeria, Sier = Sierra Leone, Ivory = Ivory Coast, Lib = Liberia), year sampled, and host species (homo = human, nat = Mastomys natalensis, a rodent). You should use the "pinneo" strain from 1969 to root the tree. The "Pinneo" or "LP" strain of Lassa virus was isolated from the blood of Penny Pinneo, a pioneer in combating Lassa fever, after a severe hemorrhagic illness acquired in Nigeria in 1969.
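
The naming convention above can be decoded programmatically, which may help when annotating or colouring tips on a tree. The following is only a minimal sketch: the abbreviation tables come from the description above, but the function name `describe_sequence`, the example name `Nig_2008_homo`, and the assumption that the year appears as four digits are hypothetical. Check them against the actual names in lassa.nexus before relying on this.

```python
import re

# Abbreviations as listed in the data set description.
LOCATIONS = {
    "nig": "Nigeria",
    "sier": "Sierra Leone",
    "ivory": "Ivory Coast",
    "lib": "Liberia",
}
HOSTS = {
    "homo": "human",
    "nat": "Mastomys natalensis",
}


def describe_sequence(name):
    """Decode location, year and host from a sequence name.

    Hypothetical helper: it searches for the known abbreviations
    anywhere in the name (case-insensitive) and ASSUMES the sampling
    year is written as a four-digit number starting with 19 or 20.
    Fields that cannot be found are returned as None.
    """
    lower = name.lower()
    location = next(
        (full for abbr, full in LOCATIONS.items() if abbr in lower), None
    )
    host = next((full for abbr, full in HOSTS.items() if abbr in lower), None)
    year_match = re.search(r"(?:19|20)\d{2}", name)
    year = int(year_match.group()) if year_match else None
    return {"location": location, "year": year, "host": host}


# Example with a made-up name (not taken from the real alignment):
info = describe_sequence("Nig_2008_homo")
print(info)
```

Substring matching is used here because the exact layout of the names is not specified above. If the real names use a fixed delimiter, splitting on it would be stricter and less prone to accidental matches.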

SARS-CoV-2 data set: See instructions in project description.

Project description: Miniproject2_corona.pdf

The mini project should be submitted and assessed via the peergrade interface at DTU Learn.


Week 9 (April 7): Bayesian inference of phylogeny

Online lectures
Course material
Computer exercise

Week 10 (April 14): Model Selection

Online lectures
Course material
Computer exercise

Week 11 (April 21): Bayesian Phylogenetics, Part 2

Course material
Computer exercise

Week 12 + 13 (April 28 + May 5): Mini project 3

Bayesian and likelihood-based phylogenetics. SARS-CoV-2: selection and clock models

Project description: Miniproject3_corona.pdf

The mini project should be submitted and assessed via the peergrade interface.


Old exam sets