Exercise: Phylogeny - Answers (Seaview version): Difference between revisions

Revision as of 12:30, 15 March 2024

Step 1

Here is a PDF with the aligned sequences.

Step 2

Here is the text file with the pairwise distances. HTLV clearly stands out: its distance to every other sequence is above 0.7, which is larger than any of the distances among the other sequences.
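The same check can be done programmatically. The following is a minimal sketch, assuming the distances have already been read into a square matrix; the sequence names and values are made up for illustration and are not the values from the exercise file. A sequence is flagged when even its nearest neighbour is further away than the threshold.

```python
def min_distance_to_others(names, matrix):
    """Return {name: smallest distance from that sequence to any other sequence}."""
    result = {}
    for i, name in enumerate(names):
        others = [matrix[i][j] for j in range(len(names)) if j != i]
        result[name] = min(others)
    return result


# Made-up distances for illustration only (NOT the values from the exercise file).
names = ["seqA", "seqB", "seqC", "outlier"]
matrix = [
    [0.00, 0.10, 0.25, 0.80],
    [0.10, 0.00, 0.20, 0.75],
    [0.25, 0.20, 0.00, 0.90],
    [0.80, 0.75, 0.90, 0.00],
]

nearest = min_distance_to_others(names, matrix)
# A sequence whose nearest neighbour is still more than 0.7 away is distant from all others.
outliers = [name for name, d in nearest.items() if d > 0.7]
print(outliers)
```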

Step 3

Here is a picture of the NJ tree:

The longest branch is the one leading to HTLV, which is in good agreement with the observation in the previous question.

Step 4

Here is an unrooted tree:

Step 5

Here is a rearranged (swapped) tree:

Step 6

  • The sister group to the HIV1 sequences is SIVCZ (Chimpanzee SIV).
  • The sister group to the HIV2 sequences is Smanga (Sooty Mangabey SIV).
  • Further answers to "The Phylogeny of HIV" can be found here.
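Reading a sister group off a rooted tree can be sketched in code. The example below assumes a rooted tree written as nested tuples, and uses a deliberately simplified toy topology that only contains the relationships stated above (it is not the full tree from the exercise, and the tip names `HIV1_a`, `HIV1_b`, `HIV2_a` and `HIV2_b` are placeholders).

```python
def leaves(node):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def sister_group(tree, clade):
    """Return the tips of the sister group of `clade` (a set of tip names), or None."""
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if leaves(child) == clade:
            # The sister group consists of everything else hanging off the same node.
            siblings = [c for j, c in enumerate(tree) if j != i]
            return set().union(*(leaves(s) for s in siblings))
    for child in tree:
        found = sister_group(child, clade)
        if found is not None:
            return found
    return None


# Simplified toy topology with placeholder tip names (an assumption for illustration).
toy_tree = ((("HIV1_a", "HIV1_b"), "SIVCZ"), (("HIV2_a", "HIV2_b"), "Smanga"))

print(sister_group(toy_tree, {"HIV1_a", "HIV1_b"}))
print(sister_group(toy_tree, {"HIV2_a", "HIV2_b"}))
```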

Step 7

There are several correct ways of doing this, since you can choose between several alignment methods. RevTrans is arguably the most appropriate option: because we have coding DNA, RevTrans gives us the "best of both worlds". It takes amino acid similarities into account when aligning, while the aligned DNA still retains the nucleotide differences that do not change the encoded amino acid. The trees below were constructed using RevTrans. However, aligning the DNA directly with Clustal Omega in Seaview produces almost identical results and leads to the same conclusion.

Here is the tree made ignoring gap positions:

And here is the tree made taking gap positions into account:

There is one difference in topology between the two trees: in the tree made without the gap positions, Rice is placed together with Fruit fly within the animal subtree, while in the other tree, Rice is placed together with the two other plants. Since Rice is a plant, the tree that takes gap positions into account is the more correct one. Note: This is not always the case!
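Why the treatment of gap positions can change a tree at all is easier to see with a toy calculation. The sketch below assumes that "ignoring gap positions" means removing every alignment column that contains a gap in any sequence before distances are computed, and that the alternative only skips gapped sites within each pair being compared. The distance measure (a plain proportion of differing sites) and the three short sequences are invented for illustration; they are not necessarily what Seaview uses internally.

```python
def p_distance(a, b):
    """Proportion of differing sites, skipping sites where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no sites without gaps to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def drop_gap_columns(alignment):
    """Remove every column in which at least one sequence has a gap."""
    seqs = list(alignment.values())
    keep = [i for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Invented toy alignment: only s3 contains gaps.
alignment = {"s1": "ACGTACGT", "s2": "ACGTTCGA", "s3": "AC--ACGT"}

# Gap columns kept: s1 and s2 are compared over all 8 columns.
with_gap_columns = p_distance(alignment["s1"], alignment["s2"])

# Gap columns removed: the gaps in s3 also remove two columns from the s1/s2 comparison.
trimmed = drop_gap_columns(alignment)
without_gap_columns = p_distance(trimmed["s1"], trimmed["s2"])

print(with_gap_columns, without_gap_columns)
```

The distance between s1 and s2 changes even though neither of them contains a gap, because columns are discarded on account of a third sequence. With enough such columns, the resulting tree topology can change as well.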

Step 8

On the whole, the structure of this tree is as we would expect from the known phylogeny. However, the placement of salmon and frog together in a monophyletic group is not correct. The correct species phylogeny would have salmon branching out before frog, which in turn would branch out before the group of mammals (see illustration below). Mammals and frogs belong together in the group Tetrapoda.

There are two additional errors, which are not as easy to detect but can be seen if all the taxa are compared using NCBI Taxonomy's "Common Tree" function (see illustration below).

First, the group of Human+Macaque is placed as a sister group to Pig+Whale, which is not correct. Human+Macaque should have been a sister group to Rat+Mouse, since primates and rodents belong together in the group Euarchontoglires.

Second, yeast is placed further from the animals than the plants are, which is also not correct. Yeast (and indeed all Fungi) actually belongs together with the animals in the group Opisthokonta.

It is common for a phylogeny based on a single gene to differ from the real phylogeny of the species. There are a number of reasons why this happens, but an important one is simply the stochastic nature of mutations: occasionally a gene will, purely by chance, be most similar to the gene from a non-sister species. This phenomenon tends to disappear as more sequence data are included in the analysis (the law of large numbers).
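The effect of adding more data can be made concrete with a small calculation. As a rough sketch, assume that each aligned site differs between two sequences independently with the same probability p (a simplification that real sequences do not strictly satisfy). The observed proportion of differing sites then has the binomial standard error sqrt(p(1 - p)/n), which shrinks as the number of sites n grows. The value p = 0.1 is chosen purely for illustration.

```python
import math


def p_distance_standard_error(p, n_sites):
    """Binomial standard error of an observed proportion of differing sites.

    Assumes sites are independent and identically distributed (a simplification).
    """
    return math.sqrt(p * (1 - p) / n_sites)


# Illustrative value only: a true proportion of 10% differing sites.
for n in (100, 1000, 10000):
    print(n, round(p_distance_standard_error(0.1, n), 4))
```

A hundred times more sites reduces the random error by a factor of ten, so two distances that are hard to tell apart with one short gene become distinguishable with more data.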

Step 9

  1. 53 results.
    Search string: (protein_name:"ribosomal protein l3") AND (taxonomy_id:2759) AND (fragment:false) AND (reviewed:true)
  2. 8 and 26 results, respectively.
    Search strings: (protein_name:"ribosomal protein l3") AND (taxonomy_id:2759) AND (fragment:false) AND (reviewed:true) AND (cc_scl_term:SL-0173)
    and (protein_name:"ribosomal protein l3") AND (taxonomy_id:2759) AND (fragment:false) AND (reviewed:true) AND (cc_scl_term:SL-0086)
  3. For each of the two searches, go to the Download tab in UniProt and select "Download all", "FASTA (canonical)" and "Uncompressed". Then use a plain text editor to combine the two files. The combined FASTA file is here: Media:Ribosomal_proteins_34.fasta.txt
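Instead of a text editor, the two downloads can also be combined with a few lines of code. The sketch below works on FASTA text held in strings; the helper names and the tiny example records are hypothetical. Counting the header lines afterwards is a quick sanity check: for the two searches above, the combined file should contain 8 + 26 = 34 records.

```python
def combine_fasta(*fasta_texts):
    """Concatenate FASTA texts so that every record starts on its own line."""
    parts = [text.strip("\n") for text in fasta_texts if text.strip()]
    return "\n".join(parts) + "\n"


def count_records(fasta_text):
    """Count FASTA records by counting header lines (lines starting with '>')."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


# Hypothetical mini-files standing in for the two UniProt downloads.
first = ">protA\nMKV\n>protB\nMRT"      # note: no trailing newline
second = ">protC\nMSS\n"

combined = combine_fasta(first, second)
print(count_records(combined))
print(combined)
```

To use this on real files, read each download with `open(path).read()` and write `combined` to a new file; the file names are up to you.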

Step 10

Open the FASTA file with the 34 ribosomal protein sequences in Seaview, make sure Alignment options is set to "clustalo", and align all sequences. Then make an NJ tree (with Ignore all gap sites unchecked) and change the view to "circular". Here is the result:

And here is the unrooted Newick tree file.


Step 11

Here is the rerooted tree made by Seaview:

Step 12

Here is the rerooted tree made by iTOL:

Yes, there is a difference: the tree from iTOL has the mitochondrial tips further to the right, while the tree from Seaview has the mitochondrial tips approximately aligned with the cytoplasmic ones. Note that when you select a branch for rerooting, the exact placement of the root on that branch is arbitrary. iTOL chooses the midpoint of the selected branch, while Seaview chooses a point that is closer to the midpoint of the entire tree. Without external information, it is not possible to say which of the two placements is more correct.
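The effect of the root position can be illustrated with a little arithmetic. The sketch below is not the actual algorithm of either program; it only assumes a single selected branch of length L, with a representative tip at depth a beyond one end of the branch and at depth b beyond the other. Placing the root a fraction f along the branch gives root-to-tip distances f·L + a and (1 - f)·L + b, and these are equal when f = 0.5 + (b - a)/(2L). All numbers are invented.

```python
def root_on_branch(branch_length, depth_a, depth_b, fraction):
    """Root-to-tip distances when the root sits `fraction` of the way from side A to side B.

    depth_a / depth_b: distance from each end of the branch to a tip on that side.
    """
    to_a = fraction * branch_length + depth_a
    to_b = (1 - fraction) * branch_length + depth_b
    return to_a, to_b


def balancing_fraction(branch_length, depth_a, depth_b):
    """Fraction at which tips on both sides are equally far from the root (kept on the branch)."""
    f = 0.5 + (depth_b - depth_a) / (2 * branch_length)
    return min(1.0, max(0.0, f))


# Invented numbers: side B (think "mitochondrial") has deeper tips than side A.
L, a, b = 1.0, 0.2, 0.6

# Root at the midpoint of the selected branch: side B tips end up further from the root.
print(root_on_branch(L, a, b, 0.5))

# Root shifted so that both sides are balanced: tips end up aligned.
f = balancing_fraction(L, a, b)
print(f, root_on_branch(L, a, b, f))
```

Both placements describe the same unrooted tree; only the drawing changes, which is why the choice cannot be settled without external information.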

Step 13

Here is the annotated tree, with blue circles marking the most recent common ancestor of human and yeast, and green circles marking the most recent common ancestor of human and mouse:

Step 14

  1. The mitochondrial proteins are more closely related to each other than to their respective cytoplasmic counterparts. This could indicate that mitochondria arose only once in evolution.
  2. There are two differences: In the mitochondria, Bovine (cow) is the sister group to Human, while in the cytoplasmic proteins, Mouse+Rat comprise the sister group to Human+Macaque. Also, in the mitochondria, Yeast branches out before Arabidopsis on the way to Human, while in the cytoplasmic proteins, the plants including Arabidopsis branch out (slightly) before the fungi including Yeast. In both respects, the cytoplasmic tree is more correct.
  3. There are more mutations per time unit in the mitochondrial part of the tree. This is evident from the fact that the horizontal distance between the blue and the green circle is larger in the mitochondrial subtree (by approximately a factor of 2). Note that the two blue circles represent the same time point in evolutionary history, as do the two green circles. Note also that the branch lengths are proportional to the number of substitutions (accepted mutations).
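The rate comparison in point 3 amounts to dividing two path lengths. As a minimal sketch, assume the branch lengths along the path from the blue circle to the green circle have been read off the tree for each subtree; the numbers below are invented and only chosen to give a ratio of about 2, they are not measured from the actual tree. Because both paths span the same stretch of time, the ratio of their lengths estimates the ratio of substitution rates.

```python
def relative_rate(path_a, path_b):
    """Ratio of total path lengths (substitutions per site) over the same time span."""
    return sum(path_a) / sum(path_b)


# Invented branch lengths along the blue-to-green path in each subtree.
mitochondrial_path = [0.30, 0.10]
cytoplasmic_path = [0.15, 0.05]

ratio = relative_rate(mitochondrial_path, cytoplasmic_path)
print(round(ratio, 2))
```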