<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://teaching.healthtech.dtu.dk/22111/index.php?action=history&amp;feed=atom&amp;title=ExBlast-Answers2</id>
	<title>ExBlast-Answers2 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://teaching.healthtech.dtu.dk/22111/index.php?action=history&amp;feed=atom&amp;title=ExBlast-Answers2"/>
	<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22111/index.php?title=ExBlast-Answers2&amp;action=history"/>
	<updated>2026-05-02T20:21:17Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22111/index.php?title=ExBlast-Answers2&amp;diff=444&amp;oldid=prev</id>
		<title>Henni: /* QUESTION 3.3 */</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22111/index.php?title=ExBlast-Answers2&amp;diff=444&amp;oldid=prev"/>
		<updated>2024-11-15T15:17:07Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;QUESTION 3.3&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 17:17, 15 November 2024&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l206&quot;&gt;Line 206:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 206:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;STEP 1 - cleaning up the sequence: &amp;#039;&amp;#039;&amp;#039;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;STEP 1 - cleaning up the sequence: &amp;#039;&amp;#039;&amp;#039;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*&#039;&#039;Subquestion: convert the sequence to FASTA format (manually, in &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;JEdit&lt;/del&gt;) and quote it in your report.&#039;&#039;  &lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*&#039;&#039;Subquestion: convert the sequence to FASTA format (manually, in &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Geany&lt;/ins&gt;) and quote it in your report.&#039;&#039;  &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;  &amp;gt;CLONE12&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;  &amp;gt;CLONE12&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Henni</name></author>
	</entry>
	<entry>
		<id>https://teaching.healthtech.dtu.dk/22111/index.php?title=ExBlast-Answers2&amp;diff=221&amp;oldid=prev</id>
		<title>WikiSysop: Created page with &quot;Answers to the BLAST exercise, by Henrik Nielsen. Values for database sizes etc. retrieved March 7, 2020   ==Part 1: Your first BLAST search==  ===QUESTION 1.1=== * &#039;&#039;what is the identifier (Accession)?&#039;&#039; :OL351605 or M57671 (Note that the latter was also part of the sequence name for your query sequence!) * &#039;&#039;what is the alignment score (&quot;Max score in bits&quot;)?&#039;&#039; :The max score is 780 bits (Raw score is 864) * &#039;&#039;what is the percent identity and query coverage?&#039;&#039; :100% * &#039;...&quot;</title>
		<link rel="alternate" type="text/html" href="https://teaching.healthtech.dtu.dk/22111/index.php?title=ExBlast-Answers2&amp;diff=221&amp;oldid=prev"/>
		<updated>2024-03-15T11:14:51Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;Answers to the BLAST exercise, by Henrik Nielsen. Values for database sizes etc. retrieved March 7, 2020   ==Part 1: Your first BLAST search==  ===QUESTION 1.1=== * &amp;#039;&amp;#039;what is the identifier (Accession)?&amp;#039;&amp;#039; :OL351605 or M57671 (Note that the latter was also part of the sequence name for your query sequence!) * &amp;#039;&amp;#039;what is the alignment score (&amp;quot;Max score in bits&amp;quot;)?&amp;#039;&amp;#039; :The max score is 780 bits (Raw score is 864) * &amp;#039;&amp;#039;what is the percent identity and query coverage?&amp;#039;&amp;#039; :100% * &amp;#039;...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Answers to the BLAST exercise, by Henrik Nielsen. Values for database sizes etc. retrieved March 7, 2020&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Part 1: Your first BLAST search==&lt;br /&gt;
&lt;br /&gt;
===QUESTION 1.1===&lt;br /&gt;
* &amp;#039;&amp;#039;what is the identifier (Accession)?&amp;#039;&amp;#039;&lt;br /&gt;
:OL351605 or M57671 (Note that the latter was also part of the sequence name for your query sequence!)&lt;br /&gt;
* &amp;#039;&amp;#039;what is the alignment score (&amp;quot;Max score in bits&amp;quot;)?&amp;#039;&amp;#039;&lt;br /&gt;
:The max score is 780 bits (Raw score is 864)&lt;br /&gt;
* &amp;#039;&amp;#039;what is the percent identity and query coverage?&amp;#039;&amp;#039;&lt;br /&gt;
:100%&lt;br /&gt;
* &amp;#039;&amp;#039;what is the E-value?&amp;#039;&amp;#039;&lt;br /&gt;
:0.0 (actually, a number so small that it is rounded off to 0.0)&lt;br /&gt;
* &amp;#039;&amp;#039;are there any gaps in the alignment?&amp;#039;&amp;#039;&lt;br /&gt;
:No, of course not, since the sequences are identical&lt;br /&gt;
&lt;br /&gt;
===QUESTION 1.2===&lt;br /&gt;
* &amp;#039;&amp;#039;what is the identifier (Accession)?&amp;#039;&amp;#039;&lt;br /&gt;
:NM_001185098 or NM_001185097 (or a handful more), they have the same score and are therefore equally good&lt;br /&gt;
* &amp;#039;&amp;#039;what is the alignment score (&amp;quot;max score&amp;quot;)?&amp;#039;&amp;#039;&lt;br /&gt;
:205&lt;br /&gt;
* &amp;#039;&amp;#039;what is the percent identity and query coverage?&amp;#039;&amp;#039;&lt;br /&gt;
:identity: 74.49% and query coverage: 76%&lt;br /&gt;
* &amp;#039;&amp;#039;what is the E-value?&amp;#039;&amp;#039;&lt;br /&gt;
:9.77E-48 (meaning 9.77&amp;amp;times;10&amp;lt;sup&amp;gt;-48&amp;lt;/sup&amp;gt;)&lt;br /&gt;
* &amp;#039;&amp;#039;are there any gaps in the alignment?&amp;#039;&amp;#039;&lt;br /&gt;
:Yes, there are five gaps in the query sequence and two gaps in the database sequence, totaling 15 positions.&lt;br /&gt;
&lt;br /&gt;
===QUESTION 1.3===&lt;br /&gt;
* &amp;#039;&amp;#039;what is the identifier (Accession)?&amp;#039;&amp;#039;&lt;br /&gt;
:NM_001185098 or NM_001185097 or NM_000207, they have the same score and are therefore equally good. Note that these are among the equally good hits found in the previous question.&lt;br /&gt;
* &amp;#039;&amp;#039;what is the alignment score (&amp;quot;max score&amp;quot;)?&amp;#039;&amp;#039;&lt;br /&gt;
:205&lt;br /&gt;
* &amp;#039;&amp;#039;what is the percent identity and query coverage?&amp;#039;&amp;#039;&lt;br /&gt;
:identity: 74.49% and query coverage: 76%&lt;br /&gt;
* &amp;#039;&amp;#039;what is the E-value?&amp;#039;&amp;#039;&lt;br /&gt;
:8.16E-51 (meaning 8.16&amp;amp;times;10&amp;lt;sup&amp;gt;-51&amp;lt;/sup&amp;gt;)&lt;br /&gt;
* &amp;#039;&amp;#039;are there any gaps in the alignment?&amp;#039;&amp;#039;&lt;br /&gt;
:Yes, there are exactly the same gaps as in the previous question.&lt;br /&gt;
&lt;br /&gt;
===QUESTION 1.4===&lt;br /&gt;
&amp;#039;&amp;#039;What are the sizes (in basepairs) of the databases we used for the two BLAST searches?&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
nt: 1,347,152,378,063 letters (= basepairs), RefSeq_rna: 1,096,131,797 letters (= basepairs).&lt;br /&gt;
&lt;br /&gt;
===QUESTION 1.5===&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;What is the ratio between the database sizes in the two BLAST searches?&amp;#039;&amp;#039;&lt;br /&gt;
:1347152378063 / 1096131797  = &amp;lt;u&amp;gt;1229&amp;lt;/u&amp;gt;&lt;br /&gt;
*&amp;#039;&amp;#039;What is the ratio between the E-values (for the best human hits) in the two BLAST searches? &amp;#039;&amp;#039;&lt;br /&gt;
:9.77E-48 / 8.16E-51 = &amp;lt;u&amp;gt;1197&amp;lt;/u&amp;gt;&lt;br /&gt;
:Note: since the E-values have only three significant digits, you cannot expect to get the exact same result.&lt;br /&gt;
:Also note, you can google &amp;quot;9.77E-48 / 8.16E-51&amp;quot; directly and the answer will show up in the results.&lt;br /&gt;
*&amp;#039;&amp;#039;What is the relationship between database size and E-value for hits with identical alignment score?&amp;#039;&amp;#039;&lt;br /&gt;
:The E-value is directly proportional to the database size. &lt;br /&gt;
:Note: Conceptually this is easy to understand - getting an alignment with the given score (205 bits) is more SIGNIFICANT in the smaller database. In a larger database there is a larger chance of randomly picking up matches.&lt;br /&gt;
*&amp;#039;&amp;#039;In conclusion: if the database size is doubled, what will happen to the E-value?&amp;#039;&amp;#039;&lt;br /&gt;
:Each time the database size doubles, the E-value doubles as well.&lt;br /&gt;
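&lt;br /&gt;
The proportionality can be checked numerically. The following is a minimal Python sketch that uses only the numbers quoted in the answers above; the variable names are illustrative and not part of the exercise.&lt;br /&gt;

```python
# Database sizes (letters = basepairs) quoted in QUESTION 1.4
nt_size = 1_347_152_378_063
refseq_rna_size = 1_096_131_797

# E-values of the best human hit (same score, 205) in the two searches
e_value_nt = 9.77e-48
e_value_refseq_rna = 8.16e-51

db_ratio = nt_size / refseq_rna_size
e_ratio = e_value_nt / e_value_refseq_rna

print(f"database size ratio: {db_ratio:.0f}")
print(f"E-value ratio:       {e_ratio:.0f}")
# If the E-value scales linearly with database size, the two ratios
# should agree up to the rounding of the reported E-values.
print(f"relative difference: {abs(db_ratio - e_ratio) / db_ratio:.1%}")
```

The small remaining difference is what one would expect from E-values that are reported with only three significant digits.&lt;br /&gt;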
&lt;br /&gt;
==Part 2: Assessing the statistical significance of BLAST hits==&lt;br /&gt;
&lt;br /&gt;
===QUESTION 2.1===&lt;br /&gt;
Report the sequence in &amp;#039;&amp;#039;&amp;#039;FASTA&amp;#039;&amp;#039;&amp;#039; format:&lt;br /&gt;
&lt;br /&gt;
 &amp;gt;random_d_sequence&lt;br /&gt;
 TTCTGAAAGGTCCTCTCGATACTCG &lt;br /&gt;
&lt;br /&gt;
(of course your particular sequences will not be identical to these)&lt;br /&gt;
&lt;br /&gt;
===QUESTION 2.2===&lt;br /&gt;
*&amp;#039;&amp;#039;Do you find any sequences that look like your input sequence (paste in a few example alignments in your report).&amp;#039;&amp;#039;&lt;br /&gt;
:There will typically be several 100% identity hits, &amp;#039;&amp;#039;e.g.&amp;#039;&amp;#039;:&lt;br /&gt;
&lt;br /&gt;
 ****Alignment**** 1&lt;br /&gt;
 Title: gi|2440392781|emb|OX421481.1| Eilema caniola genome assembly, chromosome: 20&lt;br /&gt;
 Accession: OX421481&lt;br /&gt;
 Length: 22119023&lt;br /&gt;
 Max Score: 44.0&lt;br /&gt;
 Bits: 40.9604&lt;br /&gt;
 Identities: 22&lt;br /&gt;
 Align_length: 22&lt;br /&gt;
 Gaps: 0&lt;br /&gt;
 %Ident: 100.00 %&lt;br /&gt;
 Query Cover: 88 %&lt;br /&gt;
 E value: 1.88e+00&lt;br /&gt;
 TTCTGAAAGGTCCTCTCGATAC&lt;br /&gt;
 ||||||||||||||||||||||&lt;br /&gt;
 TTCTGAAAGGTCCTCTCGATAC&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;What is the typical length of the hits (the alignment length)?&amp;#039;&amp;#039;:&lt;br /&gt;
:Typically around 17-22 base pairs.&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;What is the typical % identity?&amp;#039;&amp;#039;:&lt;br /&gt;
:90% - 100%&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;In what range are the bit-scores (&amp;quot;max score&amp;quot;)?&amp;#039;&amp;#039;:&lt;br /&gt;
:typically 30-40 bits.&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;What is the range of the E-values?&amp;#039;&amp;#039;:&lt;br /&gt;
:1.88e+00 - 2.29e+01&lt;br /&gt;
:usually varying from 1 to 50 (occasionally, you might find hits as &amp;quot;good&amp;quot; as 0.1).&lt;br /&gt;
:&amp;#039;&amp;#039;&amp;#039;Note&amp;#039;&amp;#039;&amp;#039;: we chose to use an E-value threshold of 50.0. The default is 0.05.&lt;br /&gt;
&lt;br /&gt;
===QUESTION 2.3===&lt;br /&gt;
&amp;#039;&amp;#039;What is the biological significance of the hits you found / is there any biological meaning?&amp;#039;&amp;#039;:&lt;br /&gt;
&lt;br /&gt;
This makes absolutely NO biological sense(!) The hits are real enough as such, they represent sequences that actually are in the database. But we know that our query sequences are completely random and therefore have no evolutionary relationship with the hits. The only reason we found our hits is that the database is so vast that we, for purely stochastic reasons, happen upon sequences that are similar.&lt;br /&gt;
&lt;br /&gt;
The E-values tell us precisely this: As described in the BLAST lecture, the alignment score will follow an extreme value distribution for those sequences that are not related to our query sequences, and the E-value is &amp;#039;&amp;#039;the expected number&amp;#039;&amp;#039; of spurious (unrelated) hits with the given alignment score or better, given the database size.&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Note:&amp;#039;&amp;#039;&amp;#039; Don&amp;#039;t be confused by the difference between alignment score and bit score; bit score is simply the alignment score normalized by a constant factor which gives a result expressible in bits.&lt;br /&gt;
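&lt;br /&gt;
The normalization can be illustrated with a short Python sketch of the standard conversion, bit score = (lambda * S - ln K) / ln 2. The lambda and K values below are ASSUMPTIONS (typical Karlin-Altschul parameters, not stated on this page); with them, the raw scores 44 and 67 from the example alignments in QUESTION 2.2 and QUESTION 2.5 come out at the bit scores reported there (40.9604 and 30.4166).&lt;br /&gt;

```python
import math

def bit_score(raw_score, lam, k):
    """Convert a raw alignment score S to bits: (lam * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# ASSUMED Karlin-Altschul parameters (not given in the exercise text):
BLASTN_LAMBDA, BLASTN_K = 0.625, 0.41    # nucleotide search
BLASTP_LAMBDA, BLASTP_K = 0.267, 0.041   # protein search

# Raw scores taken from the example alignments on this page
print(round(bit_score(44, BLASTN_LAMBDA, BLASTN_K), 4))
print(round(bit_score(67, BLASTP_LAMBDA, BLASTP_K), 4))
```

Because lambda and K depend on the scoring system, raw scores from different searches are not directly comparable, whereas bit scores are.&lt;br /&gt;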
&lt;br /&gt;
===QUESTION 2.4===&lt;br /&gt;
&amp;#039;&amp;#039;Report the sequence in FASTA format&amp;#039;&amp;#039;:&lt;br /&gt;
&lt;br /&gt;
 &amp;gt;seq_01&lt;br /&gt;
 LTNNVNMHWTLPYTVSHVYVNPYSC&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
(again, your particular sequence will of course differ from this).&lt;br /&gt;
&lt;br /&gt;
===QUESTION 2.5===&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
*&amp;#039;&amp;#039;How big is the database this time? &amp;#039;&amp;#039;:&lt;br /&gt;
:Number of letters (amino acids): 243,461,748,389&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
*&amp;#039;&amp;#039;What is the typical length of the alignment and do they contain gaps?&amp;#039;&amp;#039;:&lt;br /&gt;
:Typically 15-22. Rarely gaps, but several mismatches.&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;What is the range of E-values?&amp;#039;&amp;#039;:&lt;br /&gt;
:Typically 100-1000 &lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;Try to inspect a few of the alignments in detail (&amp;quot;+&amp;quot; means similar) - do you find any that look plausible, if we for a moment ignore the length/E-value?&amp;#039;&amp;#039;&lt;br /&gt;
:Yes, maybe. See &amp;#039;&amp;#039;e.g.&amp;#039;&amp;#039; the alignment below: only 36% of the positions are identical, but 16 of the 22 aligned positions (about 73%) are identical or similar. It is nevertheless way too short to be significant, as the E-value tells us.&lt;br /&gt;
&lt;br /&gt;
 ****Alignment**** 1&lt;br /&gt;
 Title: ref|WP_179589105.1| non-ribosomal peptide synthetase [Pigmentiphaga litoralis] &amp;gt;gb|NYE25977.1| amino acid adenylation domain-containing protein [Pigmentiphaga litoralis] &amp;gt;gb|NYE85097.1| amino acid adenylation domain- containing protein [Pigmentiphaga litoralis]&lt;br /&gt;
 Accession: WP_179589105&lt;br /&gt;
 Length: 1782&lt;br /&gt;
 Max Score: 67.0&lt;br /&gt;
 Bits: 30.4166&lt;br /&gt;
 Identities: 8&lt;br /&gt;
 Align_length: 22&lt;br /&gt;
 Gaps: 0&lt;br /&gt;
 %Ident: 36.36 %&lt;br /&gt;
 Query Cover: 88 %&lt;br /&gt;
 E value: 1.25e+02&lt;br /&gt;
 LTNNVNMHWTLPYTVSHVYVNP&lt;br /&gt;
 L   ++ HW +P+T+SH++ +P&lt;br /&gt;
 LAARISQHWCVPFTISHIFDHP&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;If we had used the default E-value cutoff of 10 would any hits have been found?&amp;#039;&amp;#039;:&lt;br /&gt;
:No (note: the default is actually 0.05 now). Note the difference from the nucleotide database searches (whose E-values were typically in the range 1-50): if we had run BLASTN with an E-value threshold of 1000, we would have had many pages of hits for each query sequence.&lt;br /&gt;
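&lt;br /&gt;
The identity, similarity and query coverage figures of the example BLASTP alignment above can be recomputed from its match line. This is a minimal Python sketch, assuming an ungapped alignment (Gaps: 0) and the 25-residue query from QUESTION 2.4; the variable names are illustrative.&lt;br /&gt;

```python
# Match line of the example BLASTP alignment (22 columns).
# Letters mark identical residues, "+" marks similar residues.
match_line = "L   ++ HW +P+T+SH++ +P"

align_length = len(match_line)
identities = sum(ch.isalpha() for ch in match_line)
positives = identities + match_line.count("+")

print(f"identities: {identities}/{align_length} = {identities / align_length:.2%}")
print(f"positives:  {positives}/{align_length} = {positives / align_length:.2%}")

# Query coverage: aligned query positions relative to the full query length.
query_length = 25
print(f"query cover: {align_length / query_length:.0%}")
```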
&lt;br /&gt;
===QUESTION 2.6===&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;If we compare the result from BLAST&amp;#039;ing random DNA sequences to random Peptide sequences - which kind of search has the higher risk of returning false positives (results that appear plausible, maybe even significant, but are truly unrelated)?&amp;#039;&amp;#039;:&lt;br /&gt;
:The risk of getting a false hit (an unrelated sequence with a &amp;quot;decent&amp;quot; E-value) is much larger when working with DNA sequences. Remember that we used 50 as E-value cut-off for BLASTN, while we used 1000 with BLASTP in order to see any hits at all.&lt;br /&gt;
&lt;br /&gt;
==Part 3: using BLAST to transfer functional information by finding homologs==&lt;br /&gt;
&lt;br /&gt;
===QUESTION 3.1===&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;Do we get any significant hits?&amp;#039;&amp;#039;&lt;br /&gt;
:Yes, there are 20 hits with an E-value of &amp;quot;0.0&amp;quot; (&amp;#039;&amp;#039;i.e.&amp;#039;&amp;#039; so small that it is rounded to zero), and the next hits are also extremely significant. The first hit (S48754) furthermore has a query coverage of 100% and an identity of 100% (this is actually the source of our query).&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;What kind of genes (function) do we find?&amp;#039;&amp;#039;&lt;br /&gt;
:All the high-quality hits are alkaline serine proteases from the genera &amp;#039;&amp;#039;Bacillus&amp;#039;&amp;#039; or &amp;#039;&amp;#039;Alkalihalobacillus&amp;#039;&amp;#039; — except some hits that are whole genome sequences.&lt;br /&gt;
&lt;br /&gt;
===QUESTION 3.2===&lt;br /&gt;
Note 1: remember to use the ORF Finder in Virtual Ribosome! Since we are told the sequence is a full-length transcript, we can assume that the START and STOP codons are included and set the ORF finder to &amp;quot;&amp;lt;u&amp;gt;Start codon: Any&amp;lt;/u&amp;gt;&amp;quot; (in this case, it would have given the same result to use  &amp;quot;&amp;lt;u&amp;gt;Start codon: Strict&amp;lt;/u&amp;gt;&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
Note 2: you can choose the standard genetic code (Table 1) or alternatively Table 11 (&amp;lt;u&amp;gt;Bacterial and Plant Plastid&amp;lt;/u&amp;gt;). The only difference is that Table 11 allows some extra, rarely occurring, start codons.&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;Report your translated protein sequence in FASTA format.&amp;#039;&amp;#039;:&lt;br /&gt;
&lt;br /&gt;
 &amp;gt;Unknown_transcript01_rframe2_ORF&lt;br /&gt;
 MKKPLGKIVASTALLISVAFSSSIASAAEEAKEKYLIGFNEQEAVSEFVEQVEANDEVAI&lt;br /&gt;
 LSEEEEVEIELLHEFETIPVLSVELSPEDVDALELDPAISYIEEDAEVTTMAQSVPWGIS&lt;br /&gt;
 RVQAPAAHNRGLTGSGVKVAVLDTGISTHPDLNIRGGASFVPGEPSTQDGNGHGTHVAGT&lt;br /&gt;
 IAALNNSIGVLGVAPSAELYAVKVLGASGSGSVSSIAQGLEWAGNNGMHVANLSLGSPSP&lt;br /&gt;
 SATLEQAVNSATSRGVLVVAASGNSGAGSISYPARYANAMAVGATDQNNNRASFSQYGAG&lt;br /&gt;
 LDIVAPGVNVQSTYPGSTYASLNGTSMATPHVAGAAALVKQKNPSWSNVQIRNHLKNTAT&lt;br /&gt;
 SLGSTNLYGSGLVNAEAATR&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;Do we find any conserved protein domains?&amp;#039;&amp;#039;:&lt;br /&gt;
:Yes, there is a &amp;quot;Peptidase S8&amp;quot; domain. You can see it by clicking the &amp;lt;u&amp;gt;Graphic Summary&amp;lt;/u&amp;gt; tab.&lt;br /&gt;
&lt;br /&gt;
[[image:Peptidases_S8.png|center|frame|Conserved protein domains found by the NCBI Blast server]]&lt;br /&gt;
&amp;lt;!-- [[image:NCBI_BLAST_ProtDomains_Updated.JPG|center|frame|Conserved protein domains found by the NCBI Blast server]] --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;Do we find any significant hits? (E-value?)&amp;#039;&amp;#039;:&lt;br /&gt;
:Yes, a lot. Many of the top hits have an E-value of 0.0, and hit #100 is still very significant (3e-98). Note that by default, only the top 100 hits are shown!&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;Are all the best hits the same category of enzymes?&amp;#039;&amp;#039;:&lt;br /&gt;
:Yes, they are alkaline proteases (except a few that are hypothetical proteins).&lt;br /&gt;
:Note that you can click the Accession code for a hit and go directly to the corresponding entry in the database.&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;From what you have seen, what is best for identifying intermediate quality hits - DNA or Protein BLAST?&amp;#039;&amp;#039;:&lt;br /&gt;
:Protein BLAST (BLASTP). If you have very high quality hits, they can be identified by both methods, but if the evolutionary distance is larger, BLASTP is clearly better.&lt;br /&gt;
:Note: Recall from the PyMOL exercises that, between distantly related genes/proteins, conservation decreases in the order: Structure &amp;gt; Peptide sequence &amp;gt; Nucleotide sequence. So when the evolutionary distance is larger, BLASTP will generally give better hits than BLASTN.&lt;br /&gt;
&lt;br /&gt;
===QUESTION 3.3===&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;STEP 1 - cleaning up the sequence: &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;Subquestion: convert the sequence to FASTA format (manually, in JEdit) and quote it in your report.&amp;#039;&amp;#039; &lt;br /&gt;
&lt;br /&gt;
 &amp;gt;CLONE12&lt;br /&gt;
 AACGGGCACGGGACGCATGTAGCTGGAACAGTGGCAGCCGTAAATAATAATGGTATCGGA&lt;br /&gt;
 GTTGCCGGGGTTGCAGGAGGAAACGGCTCTACCAATAGTGGAGCAAGGTTAATGTCCACA&lt;br /&gt;
 CAAATTTTTAATAGTGATGGGGATTATACAAATAGCGAAACTCTTGTGTACAGAGCCATT&lt;br /&gt;
 GTTTATGGTGCAGATAACGGAGCTGTGATCTCGCAAAATAGCTGGGGTAGTCAGTCTCTG&lt;br /&gt;
 ACTATTAAGGAGTTGCAGAAAGCTGCGATCGACTATTTCATTGATTATGCAGGAATGGAC&lt;br /&gt;
 GAAACAGGAGAAATACAGACAGGCCCTATGAGGGGAGGTATATTTATAGCTGCCGCCGGA&lt;br /&gt;
 AACGATAACGTTTCCACTCCAAATATGCCTTCAGCTTATGAACGGGTTTTAGCTGTGGCC&lt;br /&gt;
 TCAATGGGACCAGATTTTACTAAGGCAAGCTATAGCACTTTTGGAACATGGACTGATATT&lt;br /&gt;
 ACTGCTCCTGGCGGAGATATTGACAAATTTGATTTGTCAGAATACGGAGTTCTCAGCACT&lt;br /&gt;
 TATGCCGATAATTATTATGCTTATGGAGAGGGAACATCCATGGCTTGTCCACATGTCGCC&lt;br /&gt;
 GGCGCCGCC&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;STEP 2 - thinking about the task: &amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;Subquestion: Give a summary of your considerations.&amp;#039;&amp;#039;&lt;br /&gt;
**&amp;#039;&amp;#039;Based on the information given: is the sequence protein-coding? &amp;#039;&amp;#039;&lt;br /&gt;
::Yes — we know this because the PCR primers used to clone the sequence target &amp;#039;&amp;#039;&amp;#039;known enzymes&amp;#039;&amp;#039;&amp;#039;. Therefore, it will make sense to try to translate the sequence using Virtual Ribosome.&lt;br /&gt;
:*&amp;#039;&amp;#039;If it is, can you trust it will contain both a START and STOP codon? &amp;#039;&amp;#039;&lt;br /&gt;
::No — the PCR primers used to clone the sequence target &amp;#039;&amp;#039;&amp;#039;the middle of the sequence&amp;#039;&amp;#039;&amp;#039;, in other words we must assume that our sequence is a fragment. Therefore, the ORF finder in Virtual Ribosome should be set to &amp;lt;u&amp;gt;Start codon: None&amp;lt;/u&amp;gt;.&lt;br /&gt;
:*&amp;#039;&amp;#039;Do we know if the sequence is sense or anti-sense? &amp;#039;&amp;#039;&lt;br /&gt;
::No — the PCR process amplifies a stretch of double-stranded DNA. Therefore, we should let Virtual Ribosome search in &amp;#039;&amp;#039;&amp;#039;all 6 reading frames&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
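&lt;br /&gt;
What translating in all 6 reading frames involves can be sketched in Python as below. This is a simplified illustration using the standard genetic code, not the implementation used by Virtual Ribosome; the function names are hypothetical, and the short test sequence is the first 30 bases of CLONE12.&lt;br /&gt;

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons only; unknown codons become X."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))

def six_frames(dna):
    """Translate both strands at offsets 0, 1 and 2."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

clone12_start = "AACGGGCACGGGACGCATGTAGCTGGAACA"
for name, protein in six_frames(clone12_start).items():
    print(name, protein)
```

An ORF finder would then pick the longest stretch without a stop codon (*) among these frames; with no start codon required, that stretch may begin at the very first codon of a fragment.&lt;br /&gt;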
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;STEP 3 - Performing the database search&amp;#039;&amp;#039;&amp;#039;:&lt;br /&gt;
We want to use BLAST to search the large databases. Let&amp;#039;s therefore try the following:&lt;br /&gt;
# BLASTN &lt;br /&gt;
# Translate to protein (using Virtual Ribosome).&lt;br /&gt;
# BLASTP&lt;br /&gt;
Both when doing BLASTN and BLASTP we will use the NR database in order to search as broadly as possible. It would not make sense to use an organism-specific database when we don&amp;#039;t know which organism our sequence stems from.&lt;br /&gt;
&lt;br /&gt;
1) BLASTN. When trying BLASTN against NR we get some borderline significant results, but observe how small the query coverage percentages are (check also the &amp;lt;u&amp;gt;Graphic Summary&amp;lt;/u&amp;gt; tab!). &lt;br /&gt;
&lt;br /&gt;
[[image:NCBI BlastN_CLONE12 new_version.png]]&lt;br /&gt;
[[image:NCBI BlastN_CLONE12 Graphic Summary.png]]&lt;br /&gt;
&lt;br /&gt;
There is simply nothing in the entire NR database that has enough similarity to our whole query sequence. A search on the DNA level is only suited for finding very close hits.&lt;br /&gt;
&lt;br /&gt;
2) Translate using Virtual Ribosome with the settings we chose under Step 2 above.&lt;br /&gt;
&lt;br /&gt;
The result from the ORF finder:&lt;br /&gt;
 &lt;br /&gt;
 VIRTUAL RIBOSOME&lt;br /&gt;
 ----------------&lt;br /&gt;
 Translation table: Standard SGC0 &lt;br /&gt;
 &lt;br /&gt;
 &amp;gt;CLONE12_rframe1_ORF&lt;br /&gt;
 Reading frame: 1&lt;br /&gt;
 &lt;br /&gt;
     N  G  H  G  T  H  V  A  G  T  V  A  A  V  N  N  N  G  I  G  V  A  G  V  A  G  G  N  G  S  &lt;br /&gt;
 5&amp;#039; AACGGGCACGGGACGCATGTAGCTGGAACAGTGGCAGCCGTAAATAATAATGGTATCGGAGTTGCCGGGGTTGCAGGAGGAAACGGCTCT 90&lt;br /&gt;
    .......................................................................................... &lt;br /&gt;
 &lt;br /&gt;
     T  N  S  G  A  R  L  M  S  T  Q  I  F  N  S  D  G  D  Y  T  N  S  E  T  L  V  Y  R  A  I  &lt;br /&gt;
 5&amp;#039; ACCAATAGTGGAGCAAGGTTAATGTCCACACAAATTTTTAATAGTGATGGGGATTATACAAATAGCGAAACTCTTGTGTACAGAGCCATT 180&lt;br /&gt;
    .....................&amp;gt;&amp;gt;&amp;gt;.................................................................. &lt;br /&gt;
 &lt;br /&gt;
     V  Y  G  A  D  N  G  A  V  I  S  Q  N  S  W  G  S  Q  S  L  T  I  K  E  L  Q  K  A  A  I  &lt;br /&gt;
 5&amp;#039; GTTTATGGTGCAGATAACGGAGCTGTGATCTCGCAAAATAGCTGGGGTAGTCAGTCTCTGACTATTAAGGAGTTGCAGAAAGCTGCGATC 270&lt;br /&gt;
    .........................................................)))............)))............... &lt;br /&gt;
 &lt;br /&gt;
     D  Y  F  I  D  Y  A  G  M  D  E  T  G  E  I  Q  T  G  P  M  R  G  G  I  F  I  A  A  A  G  &lt;br /&gt;
 5&amp;#039; GACTATTTCATTGATTATGCAGGAATGGACGAAACAGGAGAAATACAGACAGGCCCTATGAGGGGAGGTATATTTATAGCTGCCGCCGGA 360&lt;br /&gt;
    ........................&amp;gt;&amp;gt;&amp;gt;..............................&amp;gt;&amp;gt;&amp;gt;.............................. &lt;br /&gt;
 &lt;br /&gt;
     N  D  N  V  S  T  P  N  M  P  S  A  Y  E  R  V  L  A  V  A  S  M  G  P  D  F  T  K  A  S  &lt;br /&gt;
 5&amp;#039; AACGATAACGTTTCCACTCCAAATATGCCTTCAGCTTATGAACGGGTTTTAGCTGTGGCCTCAATGGGACCAGATTTTACTAAGGCAAGC 450&lt;br /&gt;
    ........................&amp;gt;&amp;gt;&amp;gt;....................................&amp;gt;&amp;gt;&amp;gt;........................ &lt;br /&gt;
 &lt;br /&gt;
     Y  S  T  F  G  T  W  T  D  I  T  A  P  G  G  D  I  D  K  F  D  L  S  E  Y  G  V  L  S  T  &lt;br /&gt;
 5&amp;#039; TATAGCACTTTTGGAACATGGACTGATATTACTGCTCCTGGCGGAGATATTGACAAATTTGATTTGTCAGAATACGGAGTTCTCAGCACT 540&lt;br /&gt;
    ...............................................................)))........................ &lt;br /&gt;
 &lt;br /&gt;
     Y  A  D  N  Y  Y  A  Y  G  E  G  T  S  M  A  C  P  H  V  A  G  A  A  &lt;br /&gt;
 5&amp;#039; TATGCCGATAATTATTATGCTTATGGAGAGGGAACATCCATGGCTTGTCCACATGTCGCCGGCGCCGCC 609&lt;br /&gt;
    .......................................&amp;gt;&amp;gt;&amp;gt;........................... &lt;br /&gt;
&lt;br /&gt;
(&amp;#039;&amp;#039;&amp;#039;Tip:&amp;#039;&amp;#039;&amp;#039; Remember that you can get the sequence in FASTA format via the &amp;lt;u&amp;gt;FASTA&amp;lt;/u&amp;gt; link on the result page):&lt;br /&gt;
&lt;br /&gt;
 &amp;gt;CLONE12_rframe1_ORF&lt;br /&gt;
 NGHGTHVAGTVAAVNNNGIGVAGVAGGNGSTNSGARLMSTQIFNSDGDYTNSETLVYRAI&lt;br /&gt;
 VYGADNGAVISQNSWGSQSLTIKELQKAAIDYFIDYAGMDETGEIQTGPMRGGIFIAAAG&lt;br /&gt;
 NDNVSTPNMPSAYERVLAVASMGPDFTKASYSTFGTWTDITAPGGDIDKFDLSEYGVLST&lt;br /&gt;
 YADNYYAYGEGTSMACPHVAGAA&lt;br /&gt;
&lt;br /&gt;
3) BLASTP&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- The first thing we notice (already while the search is running) is that there is a &amp;quot;Peptidases_S8_S53&amp;quot; domain. This is a very strong indicator of the function. --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We get several very significant hits. When looking at the top hits and disregarding &amp;quot;hypothetical&amp;quot; and &amp;quot;uncharacterized&amp;quot; proteins, we can see that the rest are almost all serine proteases. Some of them are described as belonging to the S8 family.&lt;br /&gt;
&lt;br /&gt;
[[image:NCBI BlastP_CLONE12_rframe1_ORF new version.png]]&lt;br /&gt;
&lt;br /&gt;
Let&amp;#039;s take a closer look at the first hit that is not &amp;quot;uncharacterized&amp;quot;:&lt;br /&gt;
[[image:NCBI_BlastP_CLONE12_best_hit.png]]&lt;br /&gt;
&lt;br /&gt;
Note that although it is not a perfect hit (our query sequence does not exist in the database), it looks reasonable: the alignment covers a large part of the query, with an identity of 54% and a similarity (Positives) of 69%.&lt;br /&gt;
&lt;br /&gt;
Taken together with the fact that almost all the best non-hypothetical hits are serine proteases, we have a very strong indication that our mystery sequence, CLONE12, is a peptidase or protease of the S8 family.&lt;br /&gt;
&lt;br /&gt;
==Part 4: BLAST&amp;#039;ing Genomes==&lt;br /&gt;
&lt;br /&gt;
===QUESTION 4.1===&lt;br /&gt;
&amp;#039;&amp;#039;What information is given about the relationship between this gene and the gene &amp;quot;HTA1&amp;quot;?&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
They are nearly identical (&amp;quot;one of two nearly identical (see also HTA1) subtypes&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
Protein sequence: &lt;br /&gt;
 &amp;gt;YBL003C  &lt;br /&gt;
 MSGGKGGKAGSAAKASQSRSAKAGLTFPVGRVHRLLRRGNYAQRIGSGAPVYLTAVLEYL&lt;br /&gt;
 AAEILELAGNAARDNKKTRIIPRHLQLAIRNDDELNKLLGNVTIAQGGVLPNIHQNLLPK&lt;br /&gt;
 KSAKTAKASQEL*&lt;br /&gt;
&lt;br /&gt;
===QUESTION 4.2===&lt;br /&gt;
*&amp;#039;&amp;#039;How many high-confidence hits do we get?&amp;#039;&amp;#039;:&lt;br /&gt;
:3 — HTA1, HTA2 and HTZ1.&lt;br /&gt;
:Note: If you click on the &amp;lt;u&amp;gt;Gene&amp;lt;/u&amp;gt; links for the two top hits, you will see that one is HTA1 and the other is HTA2.&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;Do the hits make sense, from what you have read about HTA2 at the SGD webpage?&amp;#039;&amp;#039;:&lt;br /&gt;
:Yes; HTA1 and HTA2 are indeed nearly identical (only 2 amino acids differ).&lt;br /&gt;
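&lt;br /&gt;
A statement such as &amp;quot;only 2 amino acids differ&amp;quot; can be checked with a position-by-position comparison of two equal-length sequences. The sketch below is illustrative only: the function name is made up, the comparison is ungapped, and the two peptides are invented toy strings, not the real HTA1 and HTA2 sequences.&lt;br /&gt;
&lt;br /&gt;
```python
def differing_positions(seq_a, seq_b):
    """Return the 1-based positions where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length (gaps are not handled)")
    return [pos for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


# Invented toy peptides, NOT the real HTA1/HTA2 protein sequences.
toy_a = "ACDEFGHIKL"
toy_b = "ACDQFGHIKV"

diffs = differing_positions(toy_a, toy_b)
print(len(diffs), diffs)  # prints: 2 [4, 10]
```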
&lt;br /&gt;
===QUESTION 4.3===&lt;br /&gt;
&lt;br /&gt;
*&amp;#039;&amp;#039;How many high-confidence hits (with E-value better than 10&amp;lt;sup&amp;gt;-10&amp;lt;/sup&amp;gt;) are found?&amp;#039;&amp;#039;&lt;br /&gt;
:Answer: 29.&lt;br /&gt;
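&lt;br /&gt;
Counting the hits that pass an E-value cutoff can also be done programmatically. This is a minimal sketch, not how the answer above was obtained. It assumes the hit lines have been copied as text from the BLAST summary table and that the E-value is the last whitespace-separated field. The function names are illustrative, and the last entry is a made-up low-scoring hit included only to show the filter rejecting something.&lt;br /&gt;
&lt;br /&gt;
```python
# A handful of example hit lines in the BLAST summary-table layout.
# The final line is a made-up low-confidence hit (hypothetical accession).
HIT_LINES = [
    "ref|NP_613075.1|  core histone macro-H2A.1 isoform 1 [Homo sap...   152    7e-38",
    "ref|NP_036544.1|  histone H2A.V isoform 1 [Homo sapiens]            133    4e-32",
    "ref|NP_958924.1|  histone H2A.V isoform 4 [Homo sapiens]           62.8    7e-11",
    "ref|XX_000000.0|  hypothetical example protein [Homo sapiens]      30.1    0.002",
]


def parse_evalue(line):
    """Return the E-value, assumed to be the last whitespace-separated field."""
    return float(line.split()[-1])


def confident_hits(lines, threshold=1e-10):
    """Keep hits whose E-value is strictly smaller (better) than the threshold."""
    return [line for line in lines if threshold > parse_evalue(line)]


good = confident_hits(HIT_LINES)
print(len(good))  # prints: 3
```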
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
As before, let us require that a hit has an E-value of 1e-10 or better (smaller) before we consider it reliable. This also fits well with the fact that these hits all agree that the protein is a histone.&lt;br /&gt;
&lt;br /&gt;
As a starting point there are therefore 26 good hits. One can argue, however, that hits with only a predicted function (&amp;quot;PREDICTED&amp;quot;) should be treated with caution; we have one of those. As can also be seen, a number of the hits cover variants of the same protein (e.g. all those named &amp;lt;tt&amp;gt;histone H2A type-XYZ&amp;lt;/tt&amp;gt;). The short header lines do not always include all the details, so it is sometimes necessary to inspect a hit manually (click the link and read the information in the underlying database entry).&lt;br /&gt;
&lt;br /&gt;
Let us continue with all 26 good hits in the following analyses.&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;These proteins originate from a number of genes, but how many UNIQUE genes?&amp;#039;&amp;#039;:&lt;br /&gt;
&lt;br /&gt;
Answer: The trick here is that some of the hits are isoforms of the same protein. A protein with two isoforms still originates from a single gene.&lt;br /&gt;
&lt;br /&gt;
Let us, for example, look at the following hits:&lt;br /&gt;
&lt;br /&gt;
 ref|NP_613075.1|  core histone macro-H2A.1 isoform 1 [Homo sap...   152    7e-38&lt;br /&gt;
 ref|NP_004884.1|  core histone macro-H2A.1 isoform 2 [Homo sap...   152    8e-38&lt;br /&gt;
&lt;br /&gt;
If you open and read the sequence entries for these hits, you can see directly that they originate from the same gene (/gene=&amp;quot;H2AFY&amp;quot;, /gene=&amp;quot;H2AFY&amp;quot;). NOTE: Here we are lucky, and all the information we need is actually in the header lines.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A quick count reveals the following isoforms in our set of BLAST hits:&lt;br /&gt;
&lt;br /&gt;
 ref|NP_613075.1|  core histone macro-H2A.1 isoform 1 [Homo sap...   152    7e-38&lt;br /&gt;
 ref|NP_004884.1|  core histone macro-H2A.1 isoform 2 [Homo sap...   152    8e-38&lt;br /&gt;
 ref|NP_613258.2|  core histone macro-H2A.1 isoform 3 [Homo sap...   152    8e-38&lt;br /&gt;
&lt;br /&gt;
 ref|NP_036544.1|  histone H2A.V isoform 1 [Homo sapiens]            133    4e-32&lt;br /&gt;
 ref|NP_619541.1|  histone H2A.V isoform 2 [Homo sapiens]            114    2e-26&lt;br /&gt;
 ref|NP_958844.1|  histone H2A.V isoform 3 [Homo sapiens]            112    8e-26&lt;br /&gt;
 ref|NP_958925.1|  histone H2A.V isoform 5 [Homo sapiens]           75.9    8e-15&lt;br /&gt;
 ref|NP_958924.1|  histone H2A.V isoform 4 [Homo sapiens]           62.8    7e-11&lt;br /&gt;
&lt;br /&gt;
This means that in total we have 27 - 2 - 4 = 21 unique hits to genes.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
===QUESTION 4.4===&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Long answer&amp;#039;&amp;#039;&amp;#039;: A low-complexity region is a stretch of sequence that does not contain much information (e.g. TTTTAAAA in human, which occurs millions of times in the genome).&lt;br /&gt;
&lt;br /&gt;
BLAST has a built-in filter that masks these regions out during the search.&lt;br /&gt;
&lt;br /&gt;
Besides switching the filter on or off, NCBI has chosen a third option (as the default): switch it off but show where the regions are (lower-case letters):&lt;br /&gt;
&lt;br /&gt;
 &amp;gt;ref|NP_003503.1|  histone H2A type 1-C [Homo sapiens]&lt;br /&gt;
 Length=130&lt;br /&gt;
 &lt;br /&gt;
  GENE ID: 8334 HIST1H2AC | histone cluster 1, H2ac [Homo sapiens]&lt;br /&gt;
 (Over 10 PubMed links)&lt;br /&gt;
 &lt;br /&gt;
  Score =  194 bits (494),  Expect = 1e-50, Method: Compositional matrix adjust.&lt;br /&gt;
  Identities = 96/125 (76%), Positives = 108/125 (86%), Gaps = 0/125 (0%)&lt;br /&gt;
 &lt;br /&gt;
 Query  4    gkggkagsaakasqsrsakagLTFPVGRVHRLLRRGNYAQRIGSGAPVYLTavleylaae  63&lt;br /&gt;
             G+G + G A   ++SRS++AGL FPVGRVHRLLR+GNYA+R+G+GAPVYL AVLEYL AE&lt;br /&gt;
 Sbjct  3    GRGKQGGKARAKAKSRSSRAGLQFPVGRVHRLLRKGNYAERVGAGAPVYLAAVLEYLTAE  62&lt;br /&gt;
 &lt;br /&gt;
 Query  64   ilelaGNAARDNKKTRIIPRHLQLAIRNDDELNKLLGNVTIAQGGVLPNIHQNLLPKKSA  123&lt;br /&gt;
             ILELAGNAARDNKKTRIIPRHLQLAIRND+ELNKLLG VTIAQGGVLPNI   LLPKK+ &lt;br /&gt;
 Sbjct  63   ILELAGNAARDNKKTRIIPRHLQLAIRNDEELNKLLGRVTIAQGGVLPNIQAVLLPKKTE  122&lt;br /&gt;
 &lt;br /&gt;
 Query  124  KTAKA  128&lt;br /&gt;
                KA&lt;br /&gt;
 Sbjct  123  SHHKA  127&lt;br /&gt;
&lt;br /&gt;
Repeating the search with the filter switched on gives the following (I have also chosen to show X characters in the filtered region, which is NORMALLY the default in BLAST):&lt;br /&gt;
&lt;br /&gt;
 &amp;gt;ref|NP_003503.1|  histone H2A type 1-C [Homo sapiens]&lt;br /&gt;
 Length=130&lt;br /&gt;
 &lt;br /&gt;
  GENE ID: 8334 HIST1H2AC | histone cluster 1, H2ac [Homo sapiens]&lt;br /&gt;
 (Over 10 PubMed links)&lt;br /&gt;
 &lt;br /&gt;
  Score =  147 bits (372),  Expect = 2e-36, Method: Compositional matrix adjust.&lt;br /&gt;
  Identities = 87/104 (83%), Positives = 93/104 (89%), Gaps = 0/104 (0%)&lt;br /&gt;
 &lt;br /&gt;
 Query  25   LTFPVGRVHRLLRRGNYAQRIGSGAPVYLTXXXXXXXXXXXXXXGNAARDNKKTRIIPRH  84&lt;br /&gt;
             L FPVGRVHRLLR+GNYA+R+G+GAPVYL AVLEYL AEILELAGNAARDNKKTRIIPRH&lt;br /&gt;
 Sbjct  24   LQFPVGRVHRLLRKGNYAERVGAGAPVYLAAVLEYLTAEILELAGNAARDNKKTRIIPRH  83&lt;br /&gt;
 &lt;br /&gt;
 Query  85   LQLAIRNDDELNKLLGNVTIAQGGVLPNIHQNLLPKKSAKTAKA  128&lt;br /&gt;
             LQLAIRND+ELNKLLG VTIAQGGVLPNI   LLPKK+    KA&lt;br /&gt;
 Sbjct  84   LQLAIRNDEELNKLLGRVTIAQGGVLPNIQAVLLPKKTESHHKA  127&lt;br /&gt;
&lt;br /&gt;
Note the shorter alignment and the changed E-value. &lt;br /&gt;
&lt;br /&gt;
If the filter is explicitly switched off, the result looks like this:&lt;br /&gt;
&lt;br /&gt;
 &amp;gt;ref|NP_003503.1|  histone H2A type 1-C [Homo sapiens]&lt;br /&gt;
 Length=130&lt;br /&gt;
 &lt;br /&gt;
  GENE ID: 8334 HIST1H2AC | histone cluster 1, H2ac [Homo sapiens]&lt;br /&gt;
 (Over 10 PubMed links)&lt;br /&gt;
 &lt;br /&gt;
  Score =  194 bits (494),  Expect = 1e-50, Method: Compositional matrix adjust.&lt;br /&gt;
  Identities = 96/125 (76%), Positives = 108/125 (86%), Gaps = 0/125 (0%)&lt;br /&gt;
 &lt;br /&gt;
 Query  4    GKGGKAGSAAKASQSRSAKAGLTFPVGRVHRLLRRGNYAQRIGSGAPVYLTAVLEYLAAE  63&lt;br /&gt;
             G+G + G A   ++SRS++AGL FPVGRVHRLLR+GNYA+R+G+GAPVYL AVLEYL AE&lt;br /&gt;
 Sbjct  3    GRGKQGGKARAKAKSRSSRAGLQFPVGRVHRLLRKGNYAERVGAGAPVYLAAVLEYLTAE  62&lt;br /&gt;
 &lt;br /&gt;
 Query  64   ILELAGNAARDNKKTRIIPRHLQLAIRNDDELNKLLGNVTIAQGGVLPNIHQNLLPKKSA  123&lt;br /&gt;
             ILELAGNAARDNKKTRIIPRHLQLAIRND+ELNKLLG VTIAQGGVLPNI   LLPKK+ &lt;br /&gt;
 Sbjct  63   ILELAGNAARDNKKTRIIPRHLQLAIRNDEELNKLLGRVTIAQGGVLPNIQAVLLPKKTE  122&lt;br /&gt;
 &lt;br /&gt;
 Query  124  KTAKA  128&lt;br /&gt;
                KA&lt;br /&gt;
 Sbjct  123  SHHKA  127&lt;br /&gt;
&lt;br /&gt;
Exactly the same length, E-value, etc. as in the first alignment.&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Short answer&amp;#039;&amp;#039;&amp;#039;: Yes, the alignment becomes shorter when the low-complexity filter is switched on.&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>WikiSysop</name></author>
	</entry>
</feed>