Solution

Answers

Marsupials Newick

library(ape)  # provides read.tree(), used to parse the Newick string
myTree <- read.tree(text = "((Echidna,Platypus),(American_Opossums,(Monito_del_Monte,((((Koalas,Wombats),(Possums,(Kangaroos,Wallabies))),(((Bandicoots,Bilby),Marsupial_Mole),((Tasmanian_Devil,Tasmanian_Tiger),Numbats)))))));")

Q1. The two trees contain the same information about relationships, but the rooted tree additionally identifies the most recent common ancestor of the whole set (the root node), which in turn gives the relative order of divergence events in time.

Q2. Branch lengths were not provided, so no information about amount of evolution or time can be drawn from these data.

Q3. The rooted tree gives information about the relative timing of events. For instance, we know that the American Opossums were the first lineage to diverge from all other marsupials.
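The reading of the rooted tree in Q3 can be sketched in code. The following is a minimal illustration in Python (chosen only to stay dependency-free; the practical itself uses R), assuming a reduced version of the marsupial tree with fewer tips but the same branching order. The function names `parse_newick` and `tips` are hypothetical helpers, not part of any library.

```python
def parse_newick(text):
    """Parse a Newick string without branch lengths into nested lists of tip names."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
            return children
        # Otherwise read a tip name up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return node()


def tips(clade):
    """Return all tip names below a clade, in left-to-right order."""
    if isinstance(clade, str):
        return [clade]
    return [name for child in clade for name in tips(child)]


# Assumption: a reduced tree for illustration, not the full data set.
reduced = "((Echidna,Platypus),(American_Opossums,(Monito_del_Monte,(Koalas,Kangaroos))));"
outgroup, marsupials = parse_newick(reduced)

# The two children of the marsupial node are the first split within marsupials.
first_split = [tips(child) for child in marsupials]
print(first_split)
```

Because the root sits between the monotremes and the marsupials, the first split inside the marsupial clade separates the American Opossums from everything else. An unrooted version of the same tree would not license that statement.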

Q4. We should prefer an unrooted tree unless we have strong confidence in the root placement in our data set. The root is often placed by trusting the signal from a distantly related “outgroup” taxon, but even this method can be misleading, for example when the outgroup is so distantly related that it carries little signal about where it attaches among the ingroup branches.

Q5. The unaligned data have different sequence lengths and do not contain gaps/indels. This means that we have not made an inference about the homology of sites in the data. Since homology is a fundamental assumption in phylogenetics, we cannot use the unaligned data for phylogenetic inference.
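A quick check that mirrors the point in Q5 is to compare sequence lengths. The sketch below uses made-up toy sequences and a hypothetical helper `has_equal_lengths`; equal lengths are only a necessary condition for an alignment, not proof that the columns are homologous.

```python
def has_equal_lengths(seqs):
    """True if every sequence has the same length (necessary for an alignment)."""
    return len({len(s) for s in seqs.values()}) == 1


# Toy data, invented for illustration only.
unaligned = {"taxonA": "ATGCCGTA", "taxonB": "ATGCGTA", "taxonC": "ATGCCGTTA"}
aligned = {"taxonA": "ATGCCGT-A", "taxonB": "ATGC-GT-A", "taxonC": "ATGCCGTTA"}

print(has_equal_lengths(unaligned))
print(has_equal_lengths(aligned))
```

In the aligned version, the gap characters are what encode the hypothesis about which sites are homologous.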

Q6. Regions that are difficult to align might have excess missing data, such that removing them can be beneficial. That is, we remove regions with excess uncertainty in alignment. Conversely, a small amount of missing data might indicate a genuine insertion/deletion, such that these regions can be highly informative and should not always be removed.
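The trade-off in Q6 can be made concrete by filtering alignment columns on their gap fraction. This is a minimal sketch with a toy alignment; the helper names and the 0.5 threshold are assumptions for illustration, not the settings of any particular trimming tool.

```python
def gap_fraction_per_column(seqs):
    """Fraction of gap characters ('-') in each alignment column."""
    return [col.count("-") / len(col) for col in zip(*seqs)]


def drop_gappy_columns(seqs, max_gap_fraction=0.5):
    """Keep only columns whose gap fraction does not exceed the threshold."""
    fractions = gap_fraction_per_column(seqs)
    keep = [i for i, f in enumerate(fractions) if f <= max_gap_fraction]
    return ["".join(s[i] for i in keep) for s in seqs]


# Toy alignment: column 3 is mostly gaps, column 4 has a single gap.
alignment = ["ATG-CA", "ATG-CA", "AT--CA", "ATGTCA"]

print(gap_fraction_per_column(alignment))
print(drop_gappy_columns(alignment))
```

With this threshold the mostly empty column is removed, while the column with a single gap is kept, since it may reflect a genuine insertion or deletion.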

Q7. These lengths measure the summed amount of molecular change across the history of all samples in the data, that is, the sum of all branch lengths in the tree.
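As an illustration of Q7, a tree length can be recomputed by adding up the branch lengths in a Newick string. The sketch below assumes a toy four-taxon tree with invented branch lengths and a hypothetical helper `tree_length`; it also assumes taxon names contain no colons.

```python
import re


def tree_length(newick):
    """Sum every branch length (the number after each ':') in a Newick string."""
    lengths = re.findall(r":([0-9.]+(?:[eE][-+]?[0-9]+)?)", newick)
    return sum(float(x) for x in lengths)


# Toy tree, invented for illustration only.
toy = "((A:0.1,B:0.2):0.05,(C:0.3,D:0.4):0.05);"
print(tree_length(toy))
```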

Q8. Gappy regions are often fast evolving or poorly aligned, so they lead to a greater amount of inferred evolutionary change than regions with complete data.

Q9. These methods do not lead to substantially different results. This can occur because the data are highly informative (or extremely uninformative). Another possible reason is that the parameter of interest is not difficult to infer. Tree topology appears to be such a parameter here, whereas branch lengths are harder to infer, which shows up as modest differences among methods.

Q10. The GTR+R6 model has more parameters and is therefore more complex. It also has a lower BIC score, suggesting that there is substantial complexity in these data that requires multiple processes to be accounted for. Examining an even richer range of models could be beneficial.
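The comparison in Q10 rests on how BIC trades fit against complexity: BIC = -2 ln L + k ln n, where k is the number of free parameters and n the number of alignment sites. The sketch below uses invented log-likelihoods and parameter counts (they are not the values from this analysis) to show how a more complex model can still obtain the lower, preferred BIC.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: lower values indicate a preferred model."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


# Hypothetical numbers, for illustration only.
n_sites = 1000
bic_simple = bic(log_likelihood=-12000.0, n_params=29, n_sites=n_sites)
bic_complex = bic(log_likelihood=-11500.0, n_params=47, n_sites=n_sites)

print(round(bic_simple, 2), round(bic_complex, 2))
print("complex model preferred:", bic_complex < bic_simple)
```

Here the gain in likelihood outweighs the penalty for the extra parameters. Had the likelihood improved only slightly, the penalty term would have favoured the simpler model.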

Q11. The GTR+R6 model leads to a longer tree, suggesting that it can identify a greater number of evolutionary changes. Its lower BIC indicates that the simple JC model is failing to identify real change, probably because it does not incorporate realistic forms of variation such as rates across sites.

Q12. A–G and C–T changes are far more common than others. These are the two types of transition, which are comparatively energetically cheap and therefore expected to be far more common than transversions. The data set therefore follows a biochemical expectation.
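The distinction in Q12 can be expressed as a small classifier: a change within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any change between the two groups is a transversion. The sketch below counts both kinds between two made-up sequences; the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a, b):
    """Classify a pair of nucleotides as 'none', 'transition' or 'transversion'."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "none"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_substitutions(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous characters
        kind = substitution_type(a, b)
        if kind != "none":
            counts[kind] += 1
    return counts


# Toy sequences, invented for illustration only.
print(count_substitutions("ACGTACGT", "GCATATGA"))
```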

Q13. There is a portion (~13%) that is evolving 4 times faster than the mean in the data. These could be sites with poor alignment or with limited biological importance and therefore low selection constraints.

Q14. The analysis using GTR+R6 took twice as long, suggesting that in a large data set it could impose a substantial computational and energetic burden. If this cost is judged excessive, it might be necessary to find a compromise with a simpler model that nonetheless captures important forms of variation in the data (e.g., transitions and transversions).

Q15. The branch supports suggest that these data offer limited confidence regarding some deep relationships among marsupial taxa. It seems particularly unclear whether the two types of possums are actually sisters, or whether the Monito del Monte is the sister to all other Australasian marsupials (versus embedded within them).

Q16. The signal across gene trees is largely consistent with that of our inferences directly from nucleotide data. However, the gene trees have substantial uncertainty regarding one of the deepest marsupial nodes, suggesting either the data are insufficient or there was a near-simultaneous diversification event among multiple groups.

Q17. The dates suggest that the split between American and Australasian marsupials occurred around the time of the final breakup of Gondwana. The Eocene and Oligocene saw the diversification of most of the major groupings of marsupials sampled.

Q18. The figure does not show any uncertainty in the tree topology or in the timing of divergence events. Uncertainty in the tree topology could be shown via bootstrap values or a “cloud” of trees, while uncertainty in divergence times could be shown as bars spanning the plausible time interval at each node.