Acquired Immune Deficiency Syndrome (AIDS) is caused by two divergent viruses, Human Immunodeficiency Virus type 1 (HIV-1) and Human Immunodeficiency Virus type 2 (HIV-2). HIV-1 is responsible for the global pandemic, while HIV-2 has, until recently, been restricted to West Africa and appears to be less virulent. Viruses related to HIV have been found in many species of non-human primates (monkeys, apes, ...) and are collectively named Simian Immunodeficiency Viruses (SIV). HTLV-1 (human T-lymphotropic virus type 1) is another, more distantly related, member of the retrovirus family to which HIV and SIV belong.

The pol gene, which is present in the genomes of all these viruses, encodes three polypeptides important for the viral life cycle: integrase, reverse transcriptase, and protease. It is translated as a single polyprotein that is subsequently cleaved by the viral protease into its three separate parts. In this exercise you will use a data set consisting of 21 different Pol polyprotein sequences from HIV-1, HIV-2, chimpanzee SIV, sooty mangabey SIV, and HTLV-1: