<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://teaching.healthtech.dtu.dk/22111/index.php?action=history&amp;feed=atom&amp;title=ExMulAlign-Answers-English</id>
	<title>ExMulAlign-Answers-English - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://teaching.healthtech.dtu.dk/22111/index.php?action=history&amp;feed=atom&amp;title=ExMulAlign-Answers-English"/>
	<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22111/index.php?title=ExMulAlign-Answers-English&amp;action=history"/>
	<updated>2026-05-15T11:53:53Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22111/index.php?title=ExMulAlign-Answers-English&amp;diff=194&amp;oldid=prev</id>
		<title>WikiSysop: /* Question 1 */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22111/index.php?title=ExMulAlign-Answers-English&amp;diff=194&amp;oldid=prev"/>
		<updated>2024-03-15T10:15:59Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Question 1&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 12:15, 15 March 2024&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l90&quot;&gt;Line 90:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 90:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;NOTICE&amp;#039;&amp;#039;&amp;#039;:  &lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;NOTICE&amp;#039;&amp;#039;&amp;#039;:  &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* It is essential to use SHORT descriptive names. In the ClustalW format alignment, only the first 15 characters of the names are shown, so if you have very long names the output can be hard to read (see also [&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[Media&lt;/del&gt;:GenBank+FASTA_handout_revised.pdf&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;|&lt;/del&gt;the FASTA handout from week 2&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;]&lt;/del&gt;]). &#039;&#039;Notice that JalView fails&#039;&#039; in a very opaque way &#039;&#039;if names are not &amp;lt;u&amp;gt;unique within the first 15 characters&amp;lt;/u&amp;gt;&#039;&#039; — it simply appends sequences into one long sequence, if it &quot;thinks&quot; they are named identically!&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* It is essential to use SHORT descriptive names. In the ClustalW format alignment, only the first 15 characters of the names are shown, so if you have very long names the output can be hard to read (see also [&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;https&lt;/ins&gt;:&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;//teaching.healthtech.dtu.dk/material/22111/&lt;/ins&gt;GenBank+FASTA_handout_revised.pdf the FASTA handout from week 2]). 
&#039;&#039;Notice that JalView fails&#039;&#039; in a very opaque way &#039;&#039;if names are not &amp;lt;u&amp;gt;unique within the first 15 characters&amp;lt;/u&amp;gt;&#039;&#039; — it simply appends sequences into one long sequence, if it &quot;thinks&quot; they are named identically!&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Spaces cannot be part of the names in a FASTA file. If there are spaces, only the first word after &amp;quot;&amp;lt;tt&amp;gt;&amp;amp;gt;&amp;lt;/tt&amp;gt;&amp;quot; counts as the name, subsequent words will be comments. If I had used spaces instead of underscore (&amp;quot;&amp;lt;tt&amp;gt;_&amp;lt;/tt&amp;gt;&amp;quot;) in the file above, the names would not have been unique (&amp;quot;duck&amp;quot; would have been used twice, etc.).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Spaces cannot be part of the names in a FASTA file. If there are spaces, only the first word after &amp;quot;&amp;lt;tt&amp;gt;&amp;amp;gt;&amp;lt;/tt&amp;gt;&amp;quot; counts as the name, subsequent words will be comments. If I had used spaces instead of underscore (&amp;quot;&amp;lt;tt&amp;gt;_&amp;lt;/tt&amp;gt;&amp;quot;) in the file above, the names would not have been unique (&amp;quot;duck&amp;quot; would have been used twice, etc.).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Be aware that in GenBank entries containing several genes (see [&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[Media&lt;/del&gt;:GenBank+FASTA_handout_revised.pdf&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;|&lt;/del&gt;the GenBank handout from week 2&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;]&lt;/del&gt;]), the name of the individual gene (CDS) is found within the feature table. When you click on a CDS containing &quot;&amp;lt;tt&amp;gt;/gene_name=XYZ&amp;lt;/tt&amp;gt;&quot; or similar, it is therefore XYZ you need to use as name in your FASTA file, not the collective title for the entire GenBank entry (e.g. &quot;&amp;lt;tt&amp;gt;Alpha-A and Alpha-D genes ...&amp;lt;/tt&amp;gt;&quot; or &quot;&amp;lt;tt&amp;gt;Yeast Chromosome 2&amp;lt;/tt&amp;gt;&quot;). 
See also [&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[Media&lt;/del&gt;:MultiGeneScreenshot-en.pdf&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;| &lt;/del&gt;the screenshot/handout from the exercise&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;]&lt;/del&gt;].&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Be aware that in GenBank entries containing several genes (see [&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;https&lt;/ins&gt;:&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;//teaching.healthtech.dtu.dk/material/22111/&lt;/ins&gt;GenBank+FASTA_handout_revised.pdf the GenBank handout from week 2]), the name of the individual gene (CDS) is found within the feature table. When you click on a CDS containing &quot;&amp;lt;tt&amp;gt;/gene_name=XYZ&amp;lt;/tt&amp;gt;&quot; or similar, it is therefore XYZ you need to use as name in your FASTA file, not the collective title for the entire GenBank entry (e.g. &quot;&amp;lt;tt&amp;gt;Alpha-A and Alpha-D genes ...&amp;lt;/tt&amp;gt;&quot; or &quot;&amp;lt;tt&amp;gt;Yeast Chromosome 2&amp;lt;/tt&amp;gt;&quot;). See also [&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;https&lt;/ins&gt;:&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;//teaching.healthtech.dtu.dk/material/22111/&lt;/ins&gt;MultiGeneScreenshot-en.pdf the screenshot/handout from the exercise].&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;!-- * The last GenBank entry (&amp;quot;&amp;lt;tt&amp;gt;AF098919&amp;lt;/tt&amp;gt;&amp;quot; - chicken) contains three genes: &amp;quot;&amp;lt;tt&amp;gt;embryonic alpha-type globin pi&amp;lt;/tt&amp;gt;&amp;quot;, &amp;quot;&amp;lt;tt&amp;gt;adult alpha D globin&amp;lt;/tt&amp;gt;&amp;quot; and &amp;quot;&amp;lt;tt&amp;gt;adult alpha A globin&amp;lt;/tt&amp;gt;&amp;quot;. Here, I have chosen to include only the two last ones, since the first one is described as &amp;quot;alpha-type&amp;quot; instead of &amp;quot;alpha&amp;quot;. It is OK to include &amp;quot;embryonic alpha-type globin pi&amp;quot; to avoid discarding too much — if you do, you will see that it stands out as a separate group in the distance tree produced by MAFFT. This is a good indicator that it is something different. You could then optionally go back and discard it, or write a remark about it being separate. --&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;!-- * The last GenBank entry (&amp;quot;&amp;lt;tt&amp;gt;AF098919&amp;lt;/tt&amp;gt;&amp;quot; - chicken) contains three genes: &amp;quot;&amp;lt;tt&amp;gt;embryonic alpha-type globin pi&amp;lt;/tt&amp;gt;&amp;quot;, &amp;quot;&amp;lt;tt&amp;gt;adult alpha D globin&amp;lt;/tt&amp;gt;&amp;quot; and &amp;quot;&amp;lt;tt&amp;gt;adult alpha A globin&amp;lt;/tt&amp;gt;&amp;quot;. 
Here, I have chosen to include only the two last ones, since the first one is described as &amp;quot;alpha-type&amp;quot; instead of &amp;quot;alpha&amp;quot;. It is OK to include &amp;quot;embryonic alpha-type globin pi&amp;quot; to avoid discarding too much — if you do, you will see that it stands out as a separate group in the distance tree produced by MAFFT. This is a good indicator that it is something different. You could then optionally go back and discard it, or write a remark about it being separate. --&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;When you build a &amp;quot;real&amp;quot; dataset for a research project, it is often an iterative process, where you 1) collect your data, 2) weed out outliers, 3) run an analysis, and repeat 2) and 3) until you are satisfied with the results.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;When you build a &amp;quot;real&amp;quot; dataset for a research project, it is often an iterative process, where you 1) collect your data, 2) weed out outliers, 3) run an analysis, and repeat 2) and 3) until you are satisfied with the results.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22111/index.php?title=ExMulAlign-Answers-English&amp;diff=191&amp;oldid=prev</id>
		<title>WikiSysop: Created page with &quot;Click here for Danish version.  =Answers to the Multiple Alignment exercise=  By: [http://www.dtu.dk/service/telefonbog/person?id=18103&amp;cpid=214039&amp;tab=2&amp;qt=dtupublicationquery Rasmus Wernersson]  ==Question 1== FASTA file:   &gt;pigeon_alpha-D-globin  ATGCTGACCGACTCTGACAAGAAGCTGGTCCTGCAGGTGTGGGAGAAGGTGATCCGCCACCCAGACTGTG  GAGCCGAGGCCCTGGAGAGGCTGTTCACCACCTACCCCCAGACCAAGACCTACTTCCCCCACTTCGACTT  GCACCATGGCTCCGACCAGGTCCGCAACCACGGCAAGAAGGTGTTGGCCGCCTTGGGC...&quot;</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22111/index.php?title=ExMulAlign-Answers-English&amp;diff=191&amp;oldid=prev"/>
		<updated>2024-03-15T10:12:15Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;Click &lt;a href=&quot;/22111/index.php/ExMulAlign-Answers&quot; title=&quot;ExMulAlign-Answers&quot;&gt;here&lt;/a&gt; for Danish version.  =Answers to the Multiple Alignment exercise=  By: [http://www.dtu.dk/service/telefonbog/person?id=18103&amp;amp;cpid=214039&amp;amp;tab=2&amp;amp;qt=dtupublicationquery Rasmus Wernersson]  ==Question 1== FASTA file:   &amp;gt;pigeon_alpha-D-globin  ATGCTGACCGACTCTGACAAGAAGCTGGTCCTGCAGGTGTGGGAGAAGGTGATCCGCCACCCAGACTGTG  GAGCCGAGGCCCTGGAGAGGCTGTTCACCACCTACCCCCAGACCAAGACCTACTTCCCCCACTTCGACTT  GCACCATGGCTCCGACCAGGTCCGCAACCACGGCAAGAAGGTGTTGGCCGCCTTGGGC...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Click [[ExMulAlign-Answers|here]] for Danish version.&lt;br /&gt;
&lt;br /&gt;
=Answers to the Multiple Alignment exercise=&lt;br /&gt;
&lt;br /&gt;
By: [http://www.dtu.dk/service/telefonbog/person?id=18103&amp;amp;cpid=214039&amp;amp;tab=2&amp;amp;qt=dtupublicationquery Rasmus Wernersson]&lt;br /&gt;
&lt;br /&gt;
==Question 1==&lt;br /&gt;
FASTA file:&lt;br /&gt;
&lt;br /&gt;
 &amp;gt;pigeon_alpha-D-globin&lt;br /&gt;
 ATGCTGACCGACTCTGACAAGAAGCTGGTCCTGCAGGTGTGGGAGAAGGTGATCCGCCACCCAGACTGTG&lt;br /&gt;
 GAGCCGAGGCCCTGGAGAGGCTGTTCACCACCTACCCCCAGACCAAGACCTACTTCCCCCACTTCGACTT&lt;br /&gt;
 GCACCATGGCTCCGACCAGGTCCGCAACCACGGCAAGAAGGTGTTGGCCGCCTTGGGCAACGCTGTCAAG&lt;br /&gt;
 AGCCTGGGCAACCTCAGCCAAGCCCTGTCTGACCTCAGCGACCTGCATGCCTACAACCTGCGTGTCGACC&lt;br /&gt;
 CTGTCAACTTCAAGCTGCTGGCGCAGTGCTTCCACGTGGTGCTGGCCACACACCTGGGCAACGACTACAC&lt;br /&gt;
 CCCGGAGGCACATGCTGCCTTCGACAAGTTCCTGTCGGCTGTGTGCACCGTGCTGGCCGAGAAGTACAGA&lt;br /&gt;
 TAA&lt;br /&gt;
 &amp;gt;pigeon_alpha-A-globin&lt;br /&gt;
 ATGGTGCTGTCTGCCAACGACAAGAGCAACGTGAAGGCCGTCTTCGGCAAAATCGGCGGCCAGGCCGGTG&lt;br /&gt;
 ACTTGGGTGGTGAAGCCCTGGAGAGGTTGTTCATCACCTACCCCCAGACCAAGACCTACTTCCCCCACTT&lt;br /&gt;
 CGACCTGTCACATGGCTCCGCTCAGATCAAGGGGCACGGCAAGAAGGTGGCGGAGGCACTGGTTGAGGCT&lt;br /&gt;
 GCCAACCACATCGATGACATCGCTGGTGCCCTCTCCAAGCTGAGCGACCTCCACGCCCAAAAGCTCCGTG&lt;br /&gt;
 TGGACCCCGTCAACTTCAAACTGCTGGGTCACTGCTTCCTGGTGGTCGTGGCCGTCCACTTCCCCTCTCT&lt;br /&gt;
 CCTGACCCCGGAGGTCCATGCTTCCCTGGACAAGTTCGTGTGTGCCGTGGGCACCGTCCTTACTGCCAAG&lt;br /&gt;
 TACCGTTAA&lt;br /&gt;
 &amp;gt;duck_alpha-D-globin&lt;br /&gt;
 ATGCTGACCGCCGAGGACAAGAAGCTCATCGTGCAGGTGTGGGAGAAGGTGGCTGGCCACCAGGAGGAAT&lt;br /&gt;
 TCGGAAGTGAAGCTCTGCAGAGGATGTTCCTCGCCTACCCCCAGACCAAGACCTACTTCCCCCACTTCGA&lt;br /&gt;
 CCTGCATCCCGGCTCTGAACAGGTCCGTGGCCATGGCAAGAAAGTGGCGGCTGCCCTGGGCAATGCCGTG&lt;br /&gt;
 AAGAGCCTGGACAACCTCAGCCAGGCCCTGTCTGAGCTCAGCAACCTGCATGCCTACAACCTGCGTGTTG&lt;br /&gt;
 ACCCTGTCAACTTCAAGCTGCTGGCACAGTGCTTCCAGGTGGTGCTGGCCGCACACCTGGGCAAAGACTA&lt;br /&gt;
 CAGCCCCGAGATGCATGCTGCCTTTGACAAGTTCTTGTCCGCCGTGGCTGCCGTGCTGGCTGAAAAGTAC&lt;br /&gt;
 AGATGA&lt;br /&gt;
 &amp;gt;duck_alpha-A-globin&lt;br /&gt;
 ATGGTGCTGTCTGCGGCTGACAAGACCAACGTCAAGGGTGTCTTCTCCAAAATCGGTGGCCATGCTGAGG&lt;br /&gt;
 AGTATGGCGCCGAGACCCTGGAGAGGATGTTCATCGCCTACCCCCAGACCAAGACCTACTTCCCCCACTT&lt;br /&gt;
 TGACCTGCAGCACGGCTCTGCTCAGATCAAGGCCCATGGCAAGAAGGTGGCGGCTGCCCTAGTTGAAGCT&lt;br /&gt;
 GTCAACCACATCGATGACATTGCGGGTGCTCTCTCCAAGCTCAGTGACCTCCACGCCCAAAAGCTCCGTG&lt;br /&gt;
 TGGACCCTGTCAACTTCAAATTCCTGGGCCACTGCTTCCTGGTGGTGGTTGCCATCCACCACCCCGCTGC&lt;br /&gt;
 CCTGACCCCAGAGGTCCACGCTTCCCTGGACAAGTTCATGTGCGCCGTGGGTGCTGTGCTGACTGCCAAG&lt;br /&gt;
 TACCGTTAG&lt;br /&gt;
 &amp;gt;Goat_alpha-i-globin&lt;br /&gt;
 ATGGTGCTGTCTGCCGCCGACAAGTCCAATGTCAAGGCCGCCTGGGGCAAGGTTGGCGGCAACGCTGGAG&lt;br /&gt;
 CTTATGGCGCAGAGGCTCTGGAGAGGATGTTCCTGAGCTTCCCCACCACCAAGACCTACTTCCCCCACTT&lt;br /&gt;
 CGACCTGAGCCACGGCTCGGCCCAGGTCAAGGGCCACGGCGAGAAGGTGGCCGCCGCGCTGACCAAAGCG&lt;br /&gt;
 GTGGGCCACCTGGACGACCTGCCCGGTACTCTGTCTGATCTGAGTGACCTGCACGCCCACAAGCTGCGTG&lt;br /&gt;
 TGGACCCGGTCAACTTTAAGCTTCTGAGCCACTCCCTGCTGGTGACCCTGGCCTGCCACCTCCCCAATGA&lt;br /&gt;
 TTTCACCCCCGCGGTCCACGCCTCCCTGGACAAGTTCTTGGCCAACGTGAGCACCGTGCTGACCTCCAAA&lt;br /&gt;
 TACCGTTAA&lt;br /&gt;
 &amp;gt;Goat_alpha-ii-globin&lt;br /&gt;
 ATGGTGCTGTCTGCCGCCGACAAGTCCAATGTCAAGGCCGCCTGGGGCAAGGTTGGCAGCAACGCTGGAG&lt;br /&gt;
 CTTATGGCGCAGAGGCTCTGGAGAGGATGTTCCTGAGCTTCCCCACCACCAAGACCTACTTCCCCCACTT&lt;br /&gt;
 CGACCTGAGCCACGGCTCGGCCCAGGTCAAGGGCCACGGCGAGAAGGTGGCCGCCGCGCTGACCAAAGCG&lt;br /&gt;
 GTGGGCCACCTGGACGACCTGCCCGGTACTCTGTCTGATCTGAGTGACCTGCACGCCCACAAGCTGCGTG&lt;br /&gt;
 TGGACCCGGTCAACTTTAAGCTTCTGAGCCACTCCCTGCTGGTGACCCTGGCCTGCCACCACCCCAGTGA&lt;br /&gt;
 TTTCACCCCCGCGGTCCACGCCTCCCTGGACAAGTTCTTGGCCAACGTGAGCACCGTGCTGACCTCCAAA&lt;br /&gt;
 TACCGTTAA&lt;br /&gt;
 &amp;gt;Horse_alpha-1_globin&lt;br /&gt;
 ATGGTGCTGTCTGCCGCCGACAAGACCAACGTCAAGGCCGCCTGGAGTAAGGTTGGCGGCCACGCTGGCG&lt;br /&gt;
 AGTTTGGCGCAGAGGCCCTAGAGAGGATGTTCCTGGGCTTCCCCACCACCAAGACCTACTTCCCCCACTT&lt;br /&gt;
 CGATCTGAGCCACGGCTCCGCCCAGGTCAAGGCCCACGGCAAGAAGGTGGGCGACGCGCTGACTCTCGCC&lt;br /&gt;
 GTGGGCCACCTGGACGACCTGCCTGGCGCCCTGTCGAATCTGAGCGACCTGCACGCACACAAGCTGCGCG&lt;br /&gt;
 TGGACCCCGTCAACTTCAAGCTTCTGAGTCATTGCCTGCTGTCCACCTTGGCCGTCCACCTCCCCAACGA&lt;br /&gt;
 TTTCACCCCTGCCGTCCACGCCTCCCTGGACAAGTTCTTGAGCAGTGTGAGCACCGTGCTGACCTCCAAA&lt;br /&gt;
 TACCGTTAA&lt;br /&gt;
 &amp;gt;Horse_alpha-2_globin&lt;br /&gt;
 ATGGTGCTGTCTGCCGCCGACAAGACCAACGTCAAGGCCGCCTGGAGTAAGGTTGGCGGCCACGCTGGCG&lt;br /&gt;
 AGTATGGCGCAGAGGCCCTAGAGAGGATGTTCCTGGGCTTCCCCACCACCAAGACCTACTTCCCCCACTT&lt;br /&gt;
 CGATCTGAGCCACGGCTCCGCCCAGGTCAAGGCCCACGGCCAGAAGGTGGGCGACGCGCTGACTCTCGCC&lt;br /&gt;
 GTGGGCCACCTGGACGACCTGCCTGGCGCCCTGTCGAATCTGAGCGACCTGCACGCACACAAGCTGCGCG&lt;br /&gt;
 TGGACCCCGTCAACTTCAAGCTCCTGAGTCATTGCCTGCTGTCCACCTTGGCCGTCCACCTCCCCAACGA&lt;br /&gt;
 TTTCACCCCTGCCGTCCACGCCTCCCTGGACAAGTTCTTGAGCAGTGTGAGCACCGTGCTGACCTCCAAA&lt;br /&gt;
 TACCGTTAA&lt;br /&gt;
 &amp;gt;Chicken_alpha-D&lt;br /&gt;
 ATGCTGACTGCCGAGGACAAGAAGCTCATCCAGCAGGCCTGGGAGAGGGCCGCTTCCCACCAGGAGGAGT&lt;br /&gt;
 TTGGAGCTGAGGCTCTGACTAGGATGTTCACCACCTATCCCCAGACCAAGACCTACTTCCCCCACTTCGA&lt;br /&gt;
 CCTTTCGCCTGGCTCTGACCAGGTCCGTGGCCATGGCAAGAAGGTGTTGGGTGCCCTGGGCAACGCCGTG&lt;br /&gt;
 AAGAACGTGGACAACCTCAGCCAGGCCATGGCTGAGCTGAGCAACCTGCATGCCTACAACCTGCGTGTTG&lt;br /&gt;
 ACCCCGTCAATTTCAAGCTGTTGTCGCAGTGCATCCAGGTGGTGCTGGCTGTACACATGGGCAAAGACTA&lt;br /&gt;
 CACCCCTGAAGTGCATGCTGCCTTCGACAAGTTCCTGTCTGCCGTGTCTGCTGTGCTGGCTGAGAAGTAC&lt;br /&gt;
 AGATAA&lt;br /&gt;
 &amp;gt;Chicken_alpha-A&lt;br /&gt;
 ATGGTGCTGTCCGCTGCTGACAAGAACAACGTCAAGGGCATCTTCACCAAAATCGCCGGCCATGCTGAGG&lt;br /&gt;
 AGTATGGCGCCGAGACCCTGGAAAGGATGTTCACCACCTACCCCCCAACCAAGACCTACTTCCCCCACTT&lt;br /&gt;
 CGATCTGTCACACGGCTCCGCTCAGATCAAGGGGCACGGCAAGAAGGTAGTGGCTGCCTTGATCGAGGCT&lt;br /&gt;
 GCCAACCACATTGATGACATCGCCGGCACCCTCTCCAAGCTCAGCGACCTCCATGCCCACAAGCTCCGCG&lt;br /&gt;
 TGGACCCTGTCAACTTCAAACTCCTGGGCCAATGCTTCCTGGTGGTGGTGGCCATCCACCACCCTGCTGC&lt;br /&gt;
 CCTGACCCCGGAGGTCCATGCTTCCCTGGACAAGTTCTTGTGCGCCGTGGGCACTGTGCTGACCGCCAAG&lt;br /&gt;
 TACCGTTAA&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;NOTICE&amp;#039;&amp;#039;&amp;#039;: &lt;br /&gt;
* It is essential to use SHORT descriptive names. In the ClustalW format alignment, only the first 15 characters of the names are shown, so if you have very long names the output can be hard to read (see also [[Media:GenBank+FASTA_handout_revised.pdf|the FASTA handout from week 2]]). &amp;#039;&amp;#039;Notice that JalView fails&amp;#039;&amp;#039; in a very opaque way &amp;#039;&amp;#039;if names are not &amp;lt;u&amp;gt;unique within the first 15 characters&amp;lt;/u&amp;gt;&amp;#039;&amp;#039; — it simply appends sequences into one long sequence, if it &amp;quot;thinks&amp;quot; they are named identically!&lt;br /&gt;
* Spaces cannot be part of the names in a FASTA file. If there are spaces, only the first word after &amp;quot;&amp;lt;tt&amp;gt;&amp;amp;gt;&amp;lt;/tt&amp;gt;&amp;quot; counts as the name; any subsequent words are treated as comments. If I had used spaces instead of underscores (&amp;quot;&amp;lt;tt&amp;gt;_&amp;lt;/tt&amp;gt;&amp;quot;) in the file above, the names would not have been unique (&amp;quot;duck&amp;quot; would have been used twice, etc.).&lt;br /&gt;
* Be aware that in GenBank entries containing several genes (see [[Media:GenBank+FASTA_handout_revised.pdf|the GenBank handout from week 2]]), the name of the individual gene (CDS) is found within the feature table. When you click on a CDS containing &amp;quot;&amp;lt;tt&amp;gt;/gene_name=XYZ&amp;lt;/tt&amp;gt;&amp;quot; or similar, it is therefore XYZ you need to use as name in your FASTA file, not the collective title for the entire GenBank entry (e.g. &amp;quot;&amp;lt;tt&amp;gt;Alpha-A and Alpha-D genes ...&amp;lt;/tt&amp;gt;&amp;quot; or &amp;quot;&amp;lt;tt&amp;gt;Yeast Chromosome 2&amp;lt;/tt&amp;gt;&amp;quot;). See also [[Media:MultiGeneScreenshot-en.pdf| the screenshot/handout from the exercise]].&lt;br /&gt;
&amp;lt;!-- * The last GenBank entry (&amp;quot;&amp;lt;tt&amp;gt;AF098919&amp;lt;/tt&amp;gt;&amp;quot; - chicken) contains three genes: &amp;quot;&amp;lt;tt&amp;gt;embryonic alpha-type globin pi&amp;lt;/tt&amp;gt;&amp;quot;, &amp;quot;&amp;lt;tt&amp;gt;adult alpha D globin&amp;lt;/tt&amp;gt;&amp;quot; and &amp;quot;&amp;lt;tt&amp;gt;adult alpha A globin&amp;lt;/tt&amp;gt;&amp;quot;. Here, I have chosen to include only the two last ones, since the first one is described as &amp;quot;alpha-type&amp;quot; instead of &amp;quot;alpha&amp;quot;. It is OK to include &amp;quot;embryonic alpha-type globin pi&amp;quot; to avoid discarding too much — if you do, you will see that it stands out as a separate group in the distance tree produced by MAFFT. This is a good indicator that it is something different. You could then optionally go back and discard it, or write a remark about it being separate. --&amp;gt;&lt;br /&gt;
When you build a &amp;quot;real&amp;quot; dataset for a research project, it is often an iterative process, where you 1) collect your data, 2) weed out outliers, 3) run an analysis, and repeat 2) and 3) until you are satisfied with the results.&lt;br /&gt;
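&lt;br /&gt;
The naming rules above can be checked before an alignment is run. The following is a minimal Python sketch and not part of the original exercise: the function name find_name_clashes and the default width of 15 are illustrative assumptions based on the ClustalW/JalView behaviour described above. It keeps only the first word of each header and reports names that collide within the first 15 characters.&lt;br /&gt;

```python
from collections import defaultdict


def find_name_clashes(headers, width=15):
    """Group FASTA header texts that end up with the same visible name.

    Only the first whitespace-separated word counts as the name, and
    only its first `width` characters are compared (hypothetical helper).
    """
    groups = defaultdict(list)
    for header in headers:
        words = header.split()
        name = words[0] if words else ""
        groups[name[:width]].append(header)
    # Keep only the truncated names that are claimed by more than one header.
    return {key: members for key, members in groups.items() if len(members) != 1}


# Header texts without the leading FASTA marker character.
with_underscores = ["pigeon_alpha-D-globin", "pigeon_alpha-A-globin",
                    "duck_alpha-D-globin", "duck_alpha-A-globin"]
with_spaces = ["duck alpha-D-globin", "duck alpha-A-globin"]

print(find_name_clashes(with_underscores))  # {}
print(find_name_clashes(with_spaces))
```

With underscores the four names stay unique; with spaces both duck entries collapse to the single name "duck", which is the situation the second bullet warns about.&lt;br /&gt;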
&lt;br /&gt;
==Question 2==&lt;br /&gt;
* &amp;quot;&amp;lt;tt&amp;gt;*&amp;lt;/tt&amp;gt;&amp;quot; means that the nucleotides are completely identical in a given position (perfectly conserved).&lt;br /&gt;
* &amp;lt;!-- If the &amp;quot;&amp;lt;tt&amp;gt;alpha-type&amp;lt;/tt&amp;gt;&amp;quot; sequence is not included, --&amp;gt; There is a single stretch of &amp;gt;10 nucleotides (23 to be precise) which is perfectly conserved. Its sequence is &amp;lt;tt&amp;gt;ACCAAGACCTACTTCCCCCACTT&amp;lt;/tt&amp;gt;. &amp;lt;!--If the &amp;quot;&amp;lt;tt&amp;gt;alpha-type&amp;lt;/tt&amp;gt;&amp;quot; sequence is included, the perfectly conserved stretch is only 11 bases long.--&amp;gt;&lt;br /&gt;
* Concerning &amp;quot;guide tree&amp;quot;:&lt;br /&gt;
** 3 clusters&amp;lt;!-- (+ an &amp;quot;outgroup&amp;quot; if the &amp;quot;&amp;lt;tt&amp;gt;alpha-type&amp;lt;/tt&amp;gt;&amp;quot; sequence is included) --&amp;gt;: one for Alpha-A (birds only), one for Alpha-D (birds only), and one for Alpha-1 + Alpha-2 (mammals).&lt;br /&gt;
** The idea here is that birds and mammals are not intermixed, so the sequences are placed &amp;quot;naturally&amp;quot; in a taxonomic sense.&lt;br /&gt;
** Alpha-A and Alpha-D are clearly in two different clusters, which must mean that the split between them is old. Since both Alpha-A and Alpha-D exist in all three birds we included, the split must be older than the last common ancestor of the birds.&lt;br /&gt;
** Alpha-1 and Alpha-2 seem to be much more closely related. Remember that a guide tree is only a raw estimate of the phylogeny, so if we want to dig deeper into the time of the split between Alpha-1 and Alpha-2, we need to perform a proper phylogenetic analysis.&lt;br /&gt;
Your screenshot of the 3&amp;#039; part of the alignment should look something like this:&lt;br /&gt;
[[File:Jalview_Q2c.png]]&lt;br /&gt;
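&lt;br /&gt;
The search for perfectly conserved stretches (the columns marked with a star in the ClustalW output) can also be done programmatically. The sketch below is an illustration only: the function names and the three-sequence toy alignment are made up for the example and are not the globin alignment from the exercise. A column counts as conserved when every sequence has the same character and that character is not a gap.&lt;br /&gt;

```python
def conserved_runs(aligned):
    """Return (start, length) for every run of perfectly conserved columns.

    `aligned` is a list of equal-length rows from a multiple alignment;
    a column that contains a gap ("-") is never treated as conserved.
    """
    runs = []
    in_run = False
    for pos, column in enumerate(zip(*aligned)):
        identical = len(set(column)) == 1 and column[0] != "-"
        if identical and in_run:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)  # extend the current run
        elif identical:
            runs.append((pos, 1))           # open a new run
        in_run = identical
    return runs


def longest_conserved_run(aligned):
    """Return the longest run as (start, length), or None if there is none."""
    runs = conserved_runs(aligned)
    return max(runs, key=lambda run: run[1]) if runs else None


# Hypothetical toy alignment, not data from the exercise.
toy = [
    "ATGCTGACC-GAT",
    "ATGGTGACCAGAT",
    "ATGCTGACCAGAA",
]

start, length = longest_conserved_run(toy)
print(start, length, toy[0][start:start + length])  # 4 5 TGACC
```

Run on the real DNA alignment exported from MAFFT, the same idea should locate the 23-nucleotide stretch reported above, provided the rows are read in as equal-length strings.&lt;br /&gt;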
&lt;br /&gt;
==Question 3==&lt;br /&gt;
The sequences are translated using Virtual Ribosome, giving rise to the following FASTA file:&lt;br /&gt;
&lt;br /&gt;
 &amp;gt;pigeon_alpha-D-globin&lt;br /&gt;
 MLTDSDKKLVLQVWEKVIRHPDCGAEALERLFTTYPQTKTYFPHFDLHHGSDQVRNHGKK&lt;br /&gt;
 VLAALGNAVKSLGNLSQALSDLSDLHAYNLRVDPVNFKLLAQCFHVVLATHLGNDYTPEA&lt;br /&gt;
 HAAFDKFLSAVCTVLAEKYR*&lt;br /&gt;
 &amp;gt;pigeon_alpha-A-globin&lt;br /&gt;
 MVLSANDKSNVKAVFGKIGGQAGDLGGEALERLFITYPQTKTYFPHFDLSHGSAQIKGHG&lt;br /&gt;
 KKVAEALVEAANHIDDIAGALSKLSDLHAQKLRVDPVNFKLLGHCFLVVVAVHFPSLLTP&lt;br /&gt;
 EVHASLDKFVCAVGTVLTAKYR*&lt;br /&gt;
 &amp;gt;duck_alpha-D-globin&lt;br /&gt;
 MLTAEDKKLIVQVWEKVAGHQEEFGSEALQRMFLAYPQTKTYFPHFDLHPGSEQVRGHGK&lt;br /&gt;
 KVAAALGNAVKSLDNLSQALSELSNLHAYNLRVDPVNFKLLAQCFQVVLAAHLGKDYSPE&lt;br /&gt;
 MHAAFDKFLSAVAAVLAEKYR*&lt;br /&gt;
 &amp;gt;duck_alpha-A-globin&lt;br /&gt;
 MVLSAADKTNVKGVFSKIGGHAEEYGAETLERMFIAYPQTKTYFPHFDLQHGSAQIKAHG&lt;br /&gt;
 KKVAAALVEAVNHIDDIAGALSKLSDLHAQKLRVDPVNFKFLGHCFLVVVAIHHPAALTP&lt;br /&gt;
 EVHASLDKFMCAVGAVLTAKYR*&lt;br /&gt;
 &amp;gt;Goat_alpha-i-globin&lt;br /&gt;
 MVLSAADKSNVKAAWGKVGGNAGAYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG&lt;br /&gt;
 EKVAAALTKAVGHLDDLPGTLSDLSDLHAHKLRVDPVNFKLLSHSLLVTLACHLPNDFTP&lt;br /&gt;
 AVHASLDKFLANVSTVLTSKYR*&lt;br /&gt;
 &amp;gt;Goat_alpha-ii-globin&lt;br /&gt;
 MVLSAADKSNVKAAWGKVGSNAGAYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG&lt;br /&gt;
 EKVAAALTKAVGHLDDLPGTLSDLSDLHAHKLRVDPVNFKLLSHSLLVTLACHHPSDFTP&lt;br /&gt;
 AVHASLDKFLANVSTVLTSKYR*&lt;br /&gt;
 &amp;gt;Horse_alpha-1_globin&lt;br /&gt;
 MVLSAADKTNVKAAWSKVGGHAGEFGAEALERMFLGFPTTKTYFPHFDLSHGSAQVKAHG&lt;br /&gt;
 KKVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKLLSHCLLSTLAVHLPNDFTP&lt;br /&gt;
 AVHASLDKFLSSVSTVLTSKYR*&lt;br /&gt;
 &amp;gt;Horse_alpha-2_globin&lt;br /&gt;
 MVLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLSHGSAQVKAHG&lt;br /&gt;
 QKVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKLLSHCLLSTLAVHLPNDFTP&lt;br /&gt;
 AVHASLDKFLSSVSTVLTSKYR*&lt;br /&gt;
 &amp;gt;Chicken_alpha-D&lt;br /&gt;
 MLTAEDKKLIQQAWERAASHQEEFGAEALTRMFTTYPQTKTYFPHFDLSPGSDQVRGHGK&lt;br /&gt;
 KVLGALGNAVKNVDNLSQAMAELSNLHAYNLRVDPVNFKLLSQCIQVVLAVHMGKDYTPE&lt;br /&gt;
 VHAAFDKFLSAVSAVLAEKYR*&lt;br /&gt;
 &amp;gt;Chicken_alpha-A&lt;br /&gt;
 MVLSAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHFDLSHGSAQIKGHG&lt;br /&gt;
 KKVVAALIEAANHIDDIAGTLSKLSDLHAHKLRVDPVNFKLLGQCFLVVVAIHHPAALTP&lt;br /&gt;
 EVHASLDKFLCAVGTVLTAKYR*&lt;br /&gt;
&lt;br /&gt;
Subsequently, they are aligned with MAFFT.&lt;br /&gt;
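&lt;br /&gt;
The translation step can be reproduced in a few lines. This is a minimal sketch that assumes the standard genetic code and reading frame 1 and writes stop codons as "*"; it is not the Virtual Ribosome implementation, and the names CODON_TABLE and translate are chosen for this example only.&lt;br /&gt;

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in reading frame 1; stop codons become '*'.

    Codons that are not in the table (for example ones containing N)
    are written as 'X'. A trailing partial codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 24 nucleotides of pigeon_alpha-D-globin from Question 1.
print(translate("ATGCTGACCGACTCTGACAAGAAG"))  # MLTDSDKK
# Last two codons of the same sequence, including the stop codon.
print(translate("AGATAA"))                    # R*
```

The two printed fragments match the start (MLTDSDKK) and the end (R*) of the pigeon_alpha-D-globin protein sequence listed above.&lt;br /&gt;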
&lt;br /&gt;
Observations:&lt;br /&gt;
* By and large, the tree at the protein level is the same as at the DNA level (with small differences in the branch lengths).&lt;br /&gt;
* Now, two completely conserved regions of &amp;gt;5 amino acids are seen. Their sequences are &amp;lt;tt&amp;gt;TKTYFPHFDL&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;LRVDPVNFK&amp;lt;/tt&amp;gt;.&lt;br /&gt;
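&lt;br /&gt;
As a quick sanity check on the two conserved regions, one can test that each motif occurs in every protein sequence. The sketch below is an assumption-laden illustration: the helper motif_in_all is hypothetical, and for brevity only three of the ten sequences from the FASTA file above are included. It tests for presence as a substring, which is weaker than showing that the motifs sit in the same alignment columns.&lt;br /&gt;

```python
def motif_in_all(sequences, motif):
    """Return True if `motif` occurs in every sequence (hypothetical helper)."""
    return all(motif in seq for seq in sequences.values())


# Three of the ten translated sequences from the FASTA file above.
proteins = {
    "pigeon_alpha-D-globin": (
        "MLTDSDKKLVLQVWEKVIRHPDCGAEALERLFTTYPQTKTYFPHFDLHHGSDQVRNHGKK"
        "VLAALGNAVKSLGNLSQALSDLSDLHAYNLRVDPVNFKLLAQCFHVVLATHLGNDYTPEA"
        "HAAFDKFLSAVCTVLAEKYR*"
    ),
    "Goat_alpha-i-globin": (
        "MVLSAADKSNVKAAWGKVGGNAGAYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG"
        "EKVAAALTKAVGHLDDLPGTLSDLSDLHAHKLRVDPVNFKLLSHSLLVTLACHLPNDFTP"
        "AVHASLDKFLANVSTVLTSKYR*"
    ),
    "Chicken_alpha-A": (
        "MVLSAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHFDLSHGSAQIKGHG"
        "KKVVAALIEAANHIDDIAGTLSKLSDLHAHKLRVDPVNFKLLGQCFLVVVAIHHPAALTP"
        "EVHASLDKFLCAVGTVLTAKYR*"
    ),
}

for motif in ("TKTYFPHFDL", "LRVDPVNFK"):
    print(motif, motif_in_all(proteins, motif))
```

Both motifs are found in all three sequences shown here; extending the dictionary to the remaining seven sequences is a matter of pasting them in.&lt;br /&gt;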
&lt;br /&gt;
==Question 4==&lt;br /&gt;
FASTA file:&lt;br /&gt;
&lt;br /&gt;
 &amp;gt;Sheep_U00659&lt;br /&gt;
 ATGGCCCTGTGGACACGCCTGGTGCCCCTGCTGGCCCTGCTGGCACTCTGGGCCCCCGCC&lt;br /&gt;
 CCGGCCCACGCCTTCGTCAACCAGCACCTGTGCGGCTCCCACCTGGTGGAGGCGCTGTAC&lt;br /&gt;
 CTGGTGTGCGGAGAGCGCGGCTTCTTCTACACGCCCAAGGCCCGCCGGGAGGTGGAGGGC&lt;br /&gt;
 CCCCAGGTGGGGGCGCTGGAGCTGGCCGGAGGCCCCGGCGCGGGTGGCCTGGAGGGGCCC&lt;br /&gt;
 CCGCAGAAGCGTGGCATCGTGGAGCAGTGCTGCGCCGGCGTCTGCTCTCTCTACCAGCTG&lt;br /&gt;
 GAGAACTACTGTAACTAG&lt;br /&gt;
 &amp;gt;Pig_AY044828&lt;br /&gt;
 ATGGCCCTGTGGACGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCCCTCTGGGCGCCCGCC&lt;br /&gt;
 CCGGCCCAGGCCTTCGTGAACCAGCACCTGTGCGGCTCCCACCTGGTGGAGGCGCTGTAC&lt;br /&gt;
 CTGGTGTGCGGGGAGCGCGGCTTCTTCTACACGCCCAAGGCCCGTCGGGAGGCGGAGAAC&lt;br /&gt;
 CCTCAGGCAGGTGCCGTGGAGCTGGGCGGAGGCCTGGGCGGCCTGCAGGCCCTGGCGCTG&lt;br /&gt;
 GAGGGGCCCCCGCAGAAGCGTGGCATCGTGGAGCAGTGCTGCACCAGCATCTGTTCCCTC&lt;br /&gt;
 TACCAGCTGGAGAACTACTGCAACTAG&lt;br /&gt;
 &amp;gt;Pig_AY242098&lt;br /&gt;
 ATGGCCCTGTGGACGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCCCTCTGGGCGCCCGCC&lt;br /&gt;
 CCGGCCCAGGCCTTCGTGAACCAGCACCTGTGCGGCTCCCACCTGGTGGAGGCGCTGTAC&lt;br /&gt;
 CTGGTGTGCGGGGAGCGCGGCTTCTTCTACACGCCCAAGGCCCGTCGGGAGGCGGAGAAC&lt;br /&gt;
 CCTCAGGCAGGTGCCGTGGAGCTGGGCGGAGGCCTGGGCGGCCTGCAGGCCCTGGCGCTG&lt;br /&gt;
 GAGGGGCCCCCGCAGAAGCGTGGCATCGTGGAGCAGTGCTGCACCAGCATCTGTTCCCTC&lt;br /&gt;
 TACCAGCTGGAGAACTACTGCAACTAG&lt;br /&gt;
 &amp;gt;Pig_AY242100&lt;br /&gt;
 ATGGCCCTGTGGACGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCGCTCTGGGCGCCCGCC&lt;br /&gt;
 CCGGCCCAGGCCTTCGTGAACCAGCACCTGTGCGGCTCCCACCTGGTGGAGGCGCTGTAC&lt;br /&gt;
 CTGGTGTGCGGGGAGCGCGGCTTCTTCTACACGCCCAAGGCCCGTCGGGAGGCGGAGAAC&lt;br /&gt;
 CCTCAGGCAGGTGCCGTGGAGCTGGGCGGAGGCCTGGGCGGCCTGCAGGCCCTGGCGCTG&lt;br /&gt;
 GAGGGGCCCCCGCAGAAGCGTGGCATCGTGGAGCAGTGCTGCACCAGCATCTGTTCCCTC&lt;br /&gt;
 TACCAGCTGGAGAACTACTGCAACTAG&lt;br /&gt;
 &amp;gt;Pig_AY242101&lt;br /&gt;
 ATGGCCCTGTGGACGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCGCTCTGGGCGCCCGCC&lt;br /&gt;
 CCGGCCCAGGCCTTCGTGAACCAGCACCTGTGCGGCTCCCACCTGGTGGAGGCGCTGTAC&lt;br /&gt;
 CTGGTGTGCGGGGAGCGCGGCTTCTTCTACACGCCCAAGGCCCGTCGGGAGGCGGAGAAC&lt;br /&gt;
 CCTCAGGCAGGTGCCGTGGAGCTGGGCGGAGGCCTGGGCGGCCTGCAGGCCCTGGCGCTG&lt;br /&gt;
 GAGGGGCCCCCGCAGAAGCGTGGCATCGTGGAGCAGTGCTGCACCAGCATCTGTTCCCTC&lt;br /&gt;
 TACCAGCTGGAGAACTACTGCAACTAG&lt;br /&gt;
 &amp;gt;Pig_AY242109&lt;br /&gt;
 ATGGCCCTGTGGACGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCGCTCTGGGCGCCCGCC&lt;br /&gt;
 CCGGCCCAGGCCTTCGTGAACCAGCACCTGTGCGGCTCCCACCTGGTGGAGGCGCTGTAC&lt;br /&gt;
 CTGGTGTGCGGGGAGCGCGGCTTCTTCTACACGCCCAAGGCCCGTCGGGAGGCGGAGAAC&lt;br /&gt;
 CCTCAGGCAGGTGCCGTGGAGCTGGGCGGAGGCCTGGGCGGCCTGCAGGCCCTGGCGCTG&lt;br /&gt;
 GAGGGGCCCCCGCAGAAGCGTGGCATCGTAGAGCAGTGCTGCACCAGCATCTGTTCCCTC&lt;br /&gt;
 TACCAGCTGGAGAACTACTGCAACTAG&lt;br /&gt;
 &amp;gt;Dog_V00179&lt;br /&gt;
 ATGGCCCTCTGGATGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCCCTCTGGGCGCCCGCG&lt;br /&gt;
 CCCACCCGAGCCTTCGTTAACCAGCACCTGTGTGGCTCCCACCTGGTAGAGGCTCTGTAC&lt;br /&gt;
 CTGGTGTGCGGGGAGCGCGGCTTCTTCTACACGCCTAAGGCCCGCAGGGAGGTGGAGGAC&lt;br /&gt;
 CTGCAGGTGAGGGACGTGGAGCTGGCCGGGGCGCCTGGCGAGGGCGGCCTGCAGCCCCTG&lt;br /&gt;
 GCCCTGGAGGGGGCCCTGCAGAAGCGAGGCATCGTGGAGCAGTGCTGCACCAGCATCTGC&lt;br /&gt;
 TCCCTCTACCAGCTGGAGAATTACTGCAACTAG&lt;br /&gt;
 &amp;gt;OwlMonkey_J02989&lt;br /&gt;
 ATGGCCCTGTGGATGCACCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCCGAG&lt;br /&gt;
 CCAGCCCCGGCCTTTGTGAACCAGCACCTGTGCGGCCCCCACCTGGTGGAAGCCCTCTAC&lt;br /&gt;
 CTGGTGTGCGGGGAGCGAGGTTTCTTCTACGCACCCAAGACCCGCCGGGAGGCGGAGGAC&lt;br /&gt;
 CTGCAGGTGGGGCAGGTGGAGCTGGGTGGGGGCTCTATCACGGGCAGCCTGCCACCCTTG&lt;br /&gt;
 GAGGGTCCCATGCAGAAGCGTGGCGTCGTGGATCAGTGCTGCACCAGCATCTGCTCCCTC&lt;br /&gt;
 TACCAGCTGCAGAACTACTGCAACTAG&lt;br /&gt;
 &amp;gt;Human_AY138590&lt;br /&gt;
 ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGAC&lt;br /&gt;
 CCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTAC&lt;br /&gt;
 CTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGAC&lt;br /&gt;
 CTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTG&lt;br /&gt;
 GCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGC&lt;br /&gt;
 TCCCTCTACCAGCTGGAGAACTACTGCAACTAG&lt;br /&gt;
 &amp;gt;GreenMonkey_X61092&lt;br /&gt;
 ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGAC&lt;br /&gt;
 CCGGTCCCGGCCTTTGTGAACCAGCACCTGTGCGGCTCCCACCTGGTGGAAGCCCTCTAC&lt;br /&gt;
 CTGGTGTGCGGGGAGCGAGGCTTCTTCTACACGCCCAAGACCCGCCGGGAGGCAGAGGAC&lt;br /&gt;
 CCGCAGGTGGGGCAGGTAGAGCTGGGCGGGGGCCCTGGCGCAGGCAGCCTGCAGCCCTTG&lt;br /&gt;
 GCGCTGGAGGGGTCCCTGCAGAAGCGCGGCATCGTGGAGCAGTGCTGTACCAGCATCTGC&lt;br /&gt;
 TCCCTCTACCAGCTGGAGAACTACTGCAACTAG&lt;br /&gt;
 &amp;gt;Human_J00265&lt;br /&gt;
 ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGAC&lt;br /&gt;
 CCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTAC&lt;br /&gt;
 CTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGAC&lt;br /&gt;
 CTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTG&lt;br /&gt;
 GCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGC&lt;br /&gt;
 TCCCTCTACCAGCTGGAGAACTACTGCAACTAG&lt;br /&gt;
 &amp;gt;Chimp_X61089&lt;br /&gt;
 ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGTGCTGCTGGCCCTCTGGGGACCTGAC&lt;br /&gt;
 CCAGCCTCGGCCTTTGTGAACCAACACCTGTGCGGCTCCCACCTGGTGGAAGCTCTCTAC&lt;br /&gt;
 CTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGAC&lt;br /&gt;
 CTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTG&lt;br /&gt;
 GCCCTGGAGGGGTCCCTGCAGAAGCGTGGTATCGTGGAACAATGCTGTACCAGCATCTGC&lt;br /&gt;
 TCCCTCTACCAGCTGGAGAACTACTGCAACTAG&lt;br /&gt;
 &amp;gt;GuineaPig_K02233&lt;br /&gt;
 ATGGCTCTGTGGATGCATCTCCTCACCGTGCTGGCCCTGCTGGCCCTCTGGGGGCCCAAC&lt;br /&gt;
 ACTAATCAGGCCTTTGTCAGCCGGCATCTGTGCGGCTCCAACTTAGTGGAGACATTGTAT&lt;br /&gt;
 TCAGTGTGTCAGGATGATGGCTTCTTCTATATACCCAAGGACCGTCGGGAGCTAGAGGAC&lt;br /&gt;
 CCACAGGTGGAGCAGACAGAACTGGGCATGGGCCTGGGGGCAGGTGGACTACAGCCCTTG&lt;br /&gt;
 GCACTGGAGATGGCACTACAGAAGCGTGGCATTGTGGATCAGTGCTGTACTGGCACCTGC&lt;br /&gt;
 ACACGCCACCAGCTGCAGAGCTACTGCAACTAG&lt;br /&gt;
 &amp;gt;Mouse_X04725&lt;br /&gt;
 ATGGCCCTGTTGGTGCACTTCCTACCCCTGCTGGCCCTGCTTGCCCTCTGGGAGCCCAAA&lt;br /&gt;
 CCCACCCAGGCTTTTGTCAAACAGCATCTTTGTGGTCCCCACCTGGTAGAGGCTCTCTAC&lt;br /&gt;
 CTGGTGTGTGGGGAGCGTGGCTTCTTCTACACACCCAAGTCCCGCCGTGAAGTGGAGGAC&lt;br /&gt;
 CCACAAGTGGAACAACTGGAGCTGGGAGGAAGCCCCGGGGACCTTCAGACCTTGGCGTTG&lt;br /&gt;
 GAGGTGGCCCGGCAGAAGCGTGGCATTGTGGATCAGTGCTGCACCAGCATCTGCTCCCTC&lt;br /&gt;
 TACCAGCTGGAGAACTACTGCAACTAA&lt;br /&gt;
 &amp;gt;Chicken_AY438372&lt;br /&gt;
 ATGGCTCTCTGGATCCGATCACTGCCTCTTCTGGCTCTCCTTGTCTTTTCTGGCCCTGGA&lt;br /&gt;
 ACCAGCTATGCAGCTGCCAACCAGCACCTCTGTGGCTCCCACTTGGTGGAGGCTCTCTAC&lt;br /&gt;
 CTGGTGTGTGGAGAGCGTGGCTTCTTCTACTCCCCCAAAGCCCGACGGGATGTCGAGCAG&lt;br /&gt;
 CCCCTAGTGAGCAGTCCCTTGCGTGGCGAGGCAGGAGTGCTGCCTTTCCAGCAGGAGGAA&lt;br /&gt;
 TACGAGAAAGTCAAGCGAGGGATTGTTGAGCAATGCTGCCATAACACGTGTTCCCTCTAC&lt;br /&gt;
 CAACTGGAGAACTACTGCAACTAG&lt;br /&gt;
 &amp;gt;SeaHare_AF160192&lt;br /&gt;
 ATGAGCAAGTTCCTCCTCCAGAGCCACTCCGCCAACGCCTGCCTGCTCACCCTTCTGCTCACGCTGGCCT&lt;br /&gt;
 CCAACCTCGACATATCCCTGGCCAACTTCGAGCACTCGTGCAACGGCTACATGCGGCCCCACCCGCGGGG&lt;br /&gt;
 TCTGTGCGGCGAAGACCTGCACGTCATCATTTCCAACCTGTGCAGCTCTCTGGGGGGCAACAGGAGGTTC&lt;br /&gt;
 CTGGCCAAGTACATGGTCAAAAGAGACACGGAAAATGTGAACGACAAGTTACGAGGGATCCTGCTCAATA&lt;br /&gt;
 AGAAAGAAGCTTTCTCCTACTTGACCAAGAGAGAGGCCTCAGGCTCCATCACATGCGAATGTTGCTTCAA&lt;br /&gt;
 CCAGTGTCGGATATTTGAGCTGGCTCAGTACTGCCGTCTGCCAGACCATTTCTTCTCCAGAATATCCAGA&lt;br /&gt;
 ACCGGAAGGAGCAACAGTGGACATGCGCAGTTGGAGGACAACTTTAGTTA&lt;br /&gt;
&lt;br /&gt;
==Question 5==&lt;br /&gt;
* Yes, many gaps have lengths that are not multiples of 3. The most obvious example is a gap just 1 position long (present in all sequences except the Sea Hare, see below). Nor do the gaps consistently follow codon boundaries; &amp;#039;&amp;#039;e.g.&amp;#039;&amp;#039; the first gap starts after four nucleotides, not three. The alignment algorithm is &amp;#039;&amp;#039;not&amp;#039;&amp;#039; aware that the sequences are protein-coding; it only considers the DNA.&lt;br /&gt;
&lt;br /&gt;
 Sheep_U00659    ATCGTGGAGC-AGTGCTGCGCCGGCGTCTGC--------TCTCTCTAC------------&lt;br /&gt;
 Pig_AY044828    ATCGTGGAGC-AGTGCTGCACCAGCATCTGT--------TCCCTCTAC------------&lt;br /&gt;
 Pig_AY242098    ATCGTGGAGC-AGTGCTGCACCAGCATCTGT--------TCCCTCTAC------------&lt;br /&gt;
 Pig_AY242100    ATCGTGGAGC-AGTGCTGCACCAGCATCTGT--------TCCCTCTAC------------&lt;br /&gt;
 Pig_AY242101    ATCGTGGAGC-AGTGCTGCACCAGCATCTGT--------TCCCTCTAC------------&lt;br /&gt;
 Pig_AY242109    ATCGTAGAGC-AGTGCTGCACCAGCATCTGT--------TCCCTCTAC------------&lt;br /&gt;
 OwlMonkey_J0298 GTCGTGGATC-AGTGCTGCACCAGCATCTGC--------TCCCTCTAC------------&lt;br /&gt;
 Human_AY138590  ATTGTGGAAC-AATGCTGTACCAGCATCTGC--------TCCCTCTAC------------&lt;br /&gt;
 Human_J00265    ATTGTGGAAC-AATGCTGTACCAGCATCTGC--------TCCCTCTAC------------&lt;br /&gt;
 Chimp_X61089    ATCGTGGAAC-AATGCTGTACCAGCATCTGC--------TCCCTCTAC------------&lt;br /&gt;
 GreenMonkey_X61 ATCGTGGAGC-AGTGCTGTACCAGCATCTGC--------TCCCTCTAC------------&lt;br /&gt;
 Dog_V00179      ATCGTGGAGC-AGTGCTGCACCAGCATCTGC--------TCCCTCTAC------------&lt;br /&gt;
 Mouse_X04725    ATTGTGGATC-AGTGCTGCACCAGCATCTGC--------TCCCTCTAC------------&lt;br /&gt;
 GuineaPig_K0223 ATTGTGGATC-AGTGCTGTACTGGCACCTGC--------ACACGCCAC------------&lt;br /&gt;
 Chicken_AY43837 ATTGTTGAGC-AATGCTGCCATAACACGTGT--------TCCCTCTAC------------&lt;br /&gt;
 SeaHare_AF16019 ATATTTGAGCTGGCTCAGTACTGCCGTCTGCCAGACCATTTCTTCTCCAGAATATCCAGA&lt;br /&gt;
                 .*  * ** * ... * *.  .. *.. **.         . . *. *            &lt;br /&gt;
&lt;br /&gt;
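The gap lengths can also be checked programmatically. The following is a minimal Python sketch, assuming each aligned row is available as a plain string with "-" as the gap character; the helper names gap_run_lengths and has_frame_breaking_gap are hypothetical and not part of any alignment tool:&lt;br /&gt;
&lt;br /&gt;
```python
import re


def gap_run_lengths(aligned_row):
    """Return the length of every run of consecutive gap characters."""
    return [len(match.group()) for match in re.finditer(r"-+", aligned_row)]


def has_frame_breaking_gap(aligned_row):
    """Return True if any gap run has a length that is not a multiple of 3."""
    return any(length % 3 != 0 for length in gap_run_lengths(aligned_row))


# Toy row for illustration only (not taken from the real alignment).
toy_row = "ATG-CCA---GGT"
print(gap_run_lengths(toy_row))         # [1, 3]
print(has_frame_breaking_gap(toy_row))  # True
```
&lt;br /&gt;
Applied to a row of the excerpt above, a single-position gap would be reported as frame-breaking, which is the pattern described in the answer.&lt;br /&gt;
&lt;br /&gt;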
* Sea Hare (a marine snail) stands out — this makes sense, since it is the only invertebrate.&lt;br /&gt;
&lt;br /&gt;
* The two human sequences are 100% identical (the distance is 0), so one of them can be discarded. Among the pig sequences, the following pairs are identical:&lt;br /&gt;
&lt;br /&gt;
 &amp;gt;Pig_AY044828&lt;br /&gt;
 &amp;gt;Pig_AY242098&lt;br /&gt;
&lt;br /&gt;
and&lt;br /&gt;
&lt;br /&gt;
 &amp;gt;Pig_AY242100&lt;br /&gt;
 &amp;gt;Pig_AY242101&lt;br /&gt;
&lt;br /&gt;
Two pig sequences can therefore be discarded (one from each pair).&lt;br /&gt;
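&lt;br /&gt;
Identical entries such as the human and pig duplicates above can also be found by grouping the records on their exact sequence. This is a minimal Python sketch, assuming the sequences have already been read into a dictionary that maps each name to its sequence string; the function name group_identical and the toy records are hypothetical:&lt;br /&gt;
&lt;br /&gt;
```python
from collections import defaultdict


def group_identical(sequences):
    """Group sequence names by exact (case-insensitive) sequence identity.

    sequences: dict mapping a name to its sequence string.
    Returns one list of names per group that has two or more members.
    """
    groups = defaultdict(list)
    for name, seq in sequences.items():
        groups[seq.upper()].append(name)
    # Every group holds at least one name, so "not 1" means two or more.
    return [names for names in groups.values() if len(names) != 1]


# Toy records for illustration only (not the real insulin sequences).
toy = {"a1": "ATGGCC", "a2": "ATGGCC", "b1": "ATGGCG"}
print(group_identical(toy))  # [['a1', 'a2']]
```
&lt;br /&gt;
From each reported group, all but one entry can be dropped before further analysis.&lt;br /&gt;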
&lt;br /&gt;
==Question 6==&lt;br /&gt;
The sequences are translated using Virtual Ribosome, yielding the following sequences:&lt;br /&gt;
&lt;br /&gt;
 &amp;gt;Sheep_U00659&lt;br /&gt;
 MALWTRLVPLLALLALWAPAPAHAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEG&lt;br /&gt;
 PQVGALELAGGPGAGGLEGPPQKRGIVEQCCAGVCSLYQLENYCN*&lt;br /&gt;
 &amp;gt;Pig_AY044828&lt;br /&gt;
 MALWTRLLPLLALLALWAPAPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREAEN&lt;br /&gt;
 PQAGAVELGGGLGGLQALALEGPPQKRGIVEQCCTSICSLYQLENYCN*&lt;br /&gt;
 &amp;gt;Pig_AY242098&lt;br /&gt;
 MALWTRLLPLLALLALWAPAPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREAEN&lt;br /&gt;
 PQAGAVELGGGLGGLQALALEGPPQKRGIVEQCCTSICSLYQLENYCN*&lt;br /&gt;
 &amp;gt;Pig_AY242100&lt;br /&gt;
 MALWTRLLPLLALLALWAPAPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREAEN&lt;br /&gt;
 PQAGAVELGGGLGGLQALALEGPPQKRGIVEQCCTSICSLYQLENYCN*&lt;br /&gt;
 &amp;gt;Pig_AY242101&lt;br /&gt;
 MALWTRLLPLLALLALWAPAPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREAEN&lt;br /&gt;
 PQAGAVELGGGLGGLQALALEGPPQKRGIVEQCCTSICSLYQLENYCN*&lt;br /&gt;
 &amp;gt;Pig_AY242109&lt;br /&gt;
 MALWTRLLPLLALLALWAPAPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREAEN&lt;br /&gt;
 PQAGAVELGGGLGGLQALALEGPPQKRGIVEQCCTSICSLYQLENYCN*&lt;br /&gt;
 &amp;gt;Dog_V00179&lt;br /&gt;
 MALWMRLLPLLALLALWAPAPTRAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVED&lt;br /&gt;
 LQVRDVELAGAPGEGGLQPLALEGALQKRGIVEQCCTSICSLYQLENYCN*&lt;br /&gt;
 &amp;gt;OwlMonkey_J02989&lt;br /&gt;
 MALWMHLLPLLALLALWGPEPAPAFVNQHLCGPHLVEALYLVCGERGFFYAPKTRREAED&lt;br /&gt;
 LQVGQVELGGGSITGSLPPLEGPMQKRGVVDQCCTSICSLYQLQNYCN*&lt;br /&gt;
 &amp;gt;Human_AY138590&lt;br /&gt;
 MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED&lt;br /&gt;
 LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN*&lt;br /&gt;
 &amp;gt;GreenMonkey_X61092&lt;br /&gt;
 MALWMRLLPLLALLALWGPDPVPAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED&lt;br /&gt;
 PQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN*&lt;br /&gt;
 &amp;gt;Human_J00265&lt;br /&gt;
 MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED&lt;br /&gt;
 LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN*&lt;br /&gt;
 &amp;gt;Chimp_X61089&lt;br /&gt;
 MALWMRLLPLLVLLALWGPDPASAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED&lt;br /&gt;
 LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN*&lt;br /&gt;
 &amp;gt;GuineaPig_K02233&lt;br /&gt;
 MALWMHLLTVLALLALWGPNTNQAFVSRHLCGSNLVETLYSVCQDDGFFYIPKDRRELED&lt;br /&gt;
 PQVEQTELGMGLGAGGLQPLALEMALQKRGIVDQCCTGTCTRHQLQSYCN*&lt;br /&gt;
 &amp;gt;Mouse_X04725&lt;br /&gt;
 MALLVHFLPLLALLALWEPKPTQAFVKQHLCGPHLVEALYLVCGERGFFYTPKSRREVED&lt;br /&gt;
 PQVEQLELGGSPGDLQTLALEVARQKRGIVDQCCTSICSLYQLENYCN*&lt;br /&gt;
 &amp;gt;Chicken_AY438372&lt;br /&gt;
 MALWIRSLPLLALLVFSGPGTSYAAANQHLCGSHLVEALYLVCGERGFFYSPKARRDVEQ&lt;br /&gt;
 PLVSSPLRGEAGVLPFQQEEYEKVKRGIVEQCCHNTCSLYQLENYCN*&lt;br /&gt;
 &amp;gt;SeaHare_AF160192&lt;br /&gt;
 MSKFLLQSHSANACLLTLLLTLASNLDISLANFEHSCNGYMRPHPRGLCGEDLHVIISNL&lt;br /&gt;
 CSSLGGNRRFLAKYMVKRDTENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCR&lt;br /&gt;
 IFELAQYCRLPDHFFSRISRTGRSNSGHAQLEDNFS*&lt;br /&gt;
&lt;br /&gt;
Subsequently, the sequences are aligned using MAFFT.&lt;br /&gt;
&lt;br /&gt;
* At the protein level, all the Pig sequences are now completely identical. Four of them can therefore be discarded.&lt;br /&gt;
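&lt;br /&gt;
The pig sequences become identical because their DNA differences fall on synonymous codons. The sketch below illustrates this with the standard genetic code in plain Python. It is not the Virtual Ribosome implementation; the names CODON_TABLE and translate are hypothetical, and the two fragments are taken from the fifth line of the Pig_AY242101 and Pig_AY242109 records above:&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in reading frame 1; unknown codons become X."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# The fragments differ in one base (GTG versus GTA); both codons encode valine.
print(translate("GGCATCGTGGAGCAG"))  # GIVEQ
print(translate("GGCATCGTAGAGCAG"))  # GIVEQ
```
&lt;br /&gt;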
&lt;br /&gt;
==Question 7==&lt;br /&gt;
&lt;br /&gt;
Yes, the alignments are different. None of the three methods solves the problem perfectly, but MAFFT comes very close; it only places one letter (a Q) incorrectly (see below).&lt;br /&gt;
&lt;br /&gt;
[[File:Q7-MAFFT.png|Portion of the MAFFT alignment with Zappo colouring, note the three Q&amp;#039;s aligned with E&amp;#039;s at position 446.]]&lt;br /&gt;
&lt;br /&gt;
==Question 8==&lt;br /&gt;
* Yes: all gap lengths are multiples of 3.&lt;br /&gt;
* Yes: since the DNA alignment is generated using the protein alignment as a scaffold, every gap is inserted as whole codons.&lt;br /&gt;
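&lt;br /&gt;
The scaffold idea can be sketched in a few lines of Python. This is an illustration of the principle under simple assumptions (one gapped protein row, its ungapped coding DNA in frame 1, "-" as the gap character), not the code of the tool used in the exercise; the function name thread_dna_onto_protein is hypothetical:&lt;br /&gt;
&lt;br /&gt;
```python
def thread_dna_onto_protein(aligned_protein, dna, gap="-"):
    """Build a codon-aligned DNA row from a gapped protein row.

    Each residue in the protein row consumes the next codon of the DNA,
    and each protein gap becomes three DNA gap characters.
    """
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    residues = len(aligned_protein) - aligned_protein.count(gap)
    if residues != len(codons):
        raise ValueError("protein row and DNA do not cover the same number of codons")
    remaining = iter(codons)
    row = []
    for symbol in aligned_protein:
        row.append(gap * 3 if symbol == gap else next(remaining))
    return "".join(row)


# Toy example for illustration only: one protein gap becomes one codon-sized gap.
print(thread_dna_onto_protein("MA-L", "ATGGCCCTG"))  # ATGGCC---CTG
```
&lt;br /&gt;
Because gaps are only ever written three characters at a time, every gap in the resulting DNA alignment has a length that is a multiple of 3 and sits between codons.&lt;br /&gt;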
&amp;lt;!-- * Yes — there are some short stretches of bases in lower case. --&amp;gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
</feed>