Bayesian phylogenetics: clock models

Revision as of 12:50, 22 April 2026

This exercise is part of the course Computational Molecular Evolution (22115).

Overview

In this exercise we will use the software package BEASTX to infer phylogenies under molecular-clock models.

In previous exercises, branch lengths were measured only in expected numbers of substitutions per site. In a clock-based analysis, genetic change is instead related to calendar time through a model of evolutionary rates. If temporal information is available, for example in the form of known sampling times for rapidly evolving sequences, this can be used to estimate both the rate of evolution and the times of internal nodes in the tree.
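As a rough illustration of how a rate links genetic change to calendar time, the sketch below multiplies a substitution rate by a branch duration to get an expected branch length in substitutions per site. The function name and the numerical values are assumptions made up for this example and are not taken from the tutorial data.

```python
def expected_branch_length(rate_per_site_per_year: float, duration_years: float) -> float:
    """Expected number of substitutions per site on a branch,
    given a substitution rate and the branch duration in years."""
    return rate_per_site_per_year * duration_years


# Hypothetical values, chosen only for illustration.
rate = 2e-4        # substitutions per site per year
duration = 50.0    # branch duration in years

length = expected_branch_length(rate, duration)
print(f"{length:.4f} substitutions per site")
```

Read in the other direction, a branch length in substitutions per site can only be converted to years if the rate is known or can be estimated, which is why temporal information is needed.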

In this exercise we will focus on so-called heterochronous data, i.e., sequence data where the individual sequences were sampled at different known times. When evolution is sufficiently rapid, the amount of sequence change observed over these sampling times contains information about the evolutionary rate and about the timing of common ancestors.
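The intuition can be sketched with a simple root-to-tip regression: if later samples have accumulated more change, the slope of genetic distance against sampling year gives a crude rate estimate, and the x-intercept gives a crude date for the root. This is not how BEASTX works (it estimates rates and node times jointly in a Bayesian framework), and the data below are invented toy values, so treat it only as a minimal illustration of where the signal comes from.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: sampling year of each sequence and its
# root-to-tip distance (substitutions per site) in some rooted tree.
sampling_years = [1960.0, 1975.0, 1990.0, 2005.0]
root_to_tip = [0.012, 0.015, 0.018, 0.021]

slope, intercept = fit_line(sampling_years, root_to_tip)
root_year = -intercept / slope  # year at which the fitted distance is zero

print(f"crude rate estimate: {slope:.2e} substitutions per site per year")
print(f"crude root date:     {root_year:.0f}")
```

If all sequences had been sampled at the same time, every point would share the same x-value and no slope could be fitted, which is the reason heterochronous sampling matters.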

The main purpose of the exercise is:

  • to become familiar with the BEASTX workflow
  • to set up and run a clock-based Bayesian phylogenetic analysis
  • to inspect MCMC output in Tracer
  • to summarize posterior trees using TreeAnnotator
  • to visualize and interpret a dated tree in FigTree
  • to compare a strict-clock analysis with a relaxed-clock analysis

In the exercise below, you should follow the instructions on the tutorial page.

Depending on your operating system and how you installed the software, you can start the relevant programs either from the command line or by double clicking an app. The executables that you may need are:

  • beauti
  • beast
  • tracer
  • treeannotator
  • figtree
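If you plan to work from the command line, a small check like the one below can tell you which of these executables are currently found on your PATH. It assumes the programs are installed under exactly the names listed above; on some systems they may be named differently or may only be available as apps, in which case they will be reported as not found even though they are installed.

```python
import shutil

# Executable names as listed in this exercise; actual names on your
# system may differ depending on how the software was installed.
TOOLS = ["beauti", "beast", "tracer", "treeannotator", "figtree"]


def find_tools(names):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


found = find_tools(TOOLS)
for name, path in found.items():
    status = path if path is not None else "not found on PATH"
    print(f"{name:15s} {status}")
```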

BEASTX tutorial

Answer the questions below and hand in the report. Include a small number of screendumps showing relevant output from the tools you are using.

Questions

Question 1

Explain what the temporal information is in this analysis. How does BEAST obtain information about the sampling times of the sequences, and why is that information needed in order to estimate dates in calendar time?

Question 2

In the first analysis, the tutorial uses a strict molecular clock. What is the assumption behind this model? Explain what is being assumed about evolutionary rates on different branches, and why this means that expected branch length depends on branch duration and a single shared substitution rate. Also describe a pattern in the data that would suggest this assumption may be unrealistic.

Question 3

After the first BEAST run, inspect the output in Tracer. What indications are there that the initial run is not yet satisfactory? In your answer, mention burn-in, trace behaviour, and ESS, and include at least one relevant screendump from Tracer.

Question 4

Why does increasing the MCMC chain length help in this case? Explain the difference between increasing chain length and discarding a larger burn-in.

Question 5

TreeAnnotator is used to summarize the posterior sample of trees into a single representative tree. Compared with an ordinary phylogram or a simple consensus tree, what additional information does this summary tree contain? Mention at least two specific annotations visible in this tutorial, and explain briefly why each is useful.

Question 6

Inspect the summarized tree in FigTree. How do the virus samples from the Americas cluster relative to the African samples? What does the inferred timescale suggest about the origin and history of yellow fever virus in the Americas?

Question 7

The tutorial then repeats the analysis using a relaxed lognormal clock. What is the difference between a strict clock and this relaxed-clock model? What extra biological possibility is the relaxed-clock model allowing for?

Question 8

Based on the relaxed-clock analysis, is there evidence for substantial rate variation among lineages? In your answer, state which parameter in Tracer you inspected to assess this, and explain what kind of result would indicate little versus substantial rate variation. Also comment on whether the main biological conclusion about introduction of yellow fever virus into the Americas changes or remains similar under the relaxed-clock model.