Exercise: Translation - Virtual Ribosome
Exercise written by: Rasmus Wernersson
In this exercise we will use Virtual Ribosome, a software tool that provides a series of functions for translating DNA into protein sequences. Besides using the simple functions to translate DNA in a known reading frame, we shall work on computer-based analysis of possible reading frames, the location of START and STOP codons, etc.
Step 1: Basic translation
- Open Virtual Ribosome (in a new window): https://services.healthtech.dtu.dk/services/VirtualRibosome-2.0/ Spend a few minutes getting familiar with the website: where do you upload the input data, and what types of options are available?
- If you only have one sequence, it can be pasted directly into the input window. Alternatively, Virtual Ribosome can handle a number of input formats that allow for multiple sequences (e.g. FASTA).
- Let's first do a simple example and translate a known gene, Actin (from yeast). Copy the sequence below into the sequence field and press "submit query", using default settings.
>Yeast_ACT1
ATGGATTCTGAGGTTGCTGCTTTGGTTATTGATAACGGTTCTGGTATGTGTAAAGCCGGT
TTTGCCGGTGACGACGCTCCTCGTGCTGTCTTCCCATCTATCGTCGGTAGACCAAGACAC
CAAGGTATCATGGTCGGTATGGGTCAAAAAGACTCCTACGTTGGTGATGAAGCTCAATCC
AAGAGAGGTATCTTGACTTTACGTTACCCAATTGAACACGGTATTGTCACCAACTGGGAC
GATATGGAAAAGATCTGGCATCATACCTTCTACAACGAATTGAGAGTTGCCCCAGAAGAA
CACCCTGTTCTTTTGACTGAAGCTCCAATGAACCCTAAATCAAACAGAGAAAAGATGACT
CAAATTATGTTTGAAACTTTCAACGTTCCAGCCTTCTACGTTTCCATCCAAGCCGTTTTG
TCCTTGTACTCTTCCGGTAGAACTACTGGTATTGTTTTGGATTCCGGTGATGGTGTTACT
CACGTCGTTCCAATTTACGCTGGTTTCTCTCTACCTCACGCCATTTTGAGAATCGATTTG
GCCGGTAGAGATTTGACTGACTACTTGATGAAGATCTTGAGTGAACGTGGTTACTCTTTC
TCCACCACTGCTGAAAGAGAAATTGTCCGTGACATCAAGGAAAAACTATGTTACGTCGCC
TTGGACTTCGAACAAGAAATGCAAACCGCTGCTCAATCTTCTTCAATTGAAAAATCCTAC
GAACTTCCAGATGGTCAAGTCATCACTATTGGTAACGAAAGATTCAGAGCCCCAGAAGCT
TTGTTCCATCCTTCTGTTTTGGGTTTGGAATCTGCCGGTATTGACCAAACTACTTACAAC
TCCATCATGAAGTGTGATGTCGATGTCCGTAAGGAATTATACGGTAACATCGTTATGTCC
GGTGGTACCACCATGTTCCCAGGTATTGCCGAAAGAATGCAAAAGGAAATCACCGCTTTG
GCTCCATCTTCCATGAAGGTCAAGATCATTGCTCCTCCAGAAAGAAAGTACTCCGTCTGG
ATTGGTGGTTCTATCTTGGCTTCTTTGACTACCTTCCAACAAATGTGGATCTCAAAACAA
GAATACGACGAAAGTGGTCCATCTATCGTTCACCACAAGTGTTTCTAA
- Look at the result. Note that the output shows both the DNA and protein sequences, as well as information on START and STOP codons. You can click on "instructions" on both the main page and the results page for details on what is displayed. Note also that the "raw" protein sequence can be downloaded in FASTA format.
- QUESTION 1:
- How is a STOP codon displayed?
- How is a START codon displayed?
- Does a START codon always code for Methionine (M)?
- What is the difference between the two types of start codons?
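The basic translation in Step 1 can be sketched in a few lines of Python. This is a minimal illustration and not Virtual Ribosome's own implementation. The names `CODON_TABLE` and `translate` are made up for this sketch, and using "*" for a STOP codon and "X" for an unreadable codon are conventions assumed here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1).
# Codons are generated in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA in reading frame 1; an incomplete trailing codon is ignored."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    # "X" (an assumption of this sketch) marks codons with letters other than A, C, G, T.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# The first seven codons of the Yeast_ACT1 sequence above.
print(translate("ATGGATTCTGAGGTTGCTGCT"))  # MDSEVAA
```

Pasting the full Yeast_ACT1 sequence (without the header line and line breaks) into `translate` should give the same protein sequence as the default Virtual Ribosome run.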
Step 2: Genetic codes
- We are now going to work with another gene from yeast. This time it is COX1, which codes for Cytochrome C OXidase, subunit 1 (for more information, see COX1 - Saccharomyces Genome Database). Note that this is a mitochondrial gene. Translate this gene using default settings.
>Yeast_COX1
ATGGTACAAAGATGATTATATTCAACAAATGCAAAAGATATTGCAGTATTATATTTTATG
TTAGCTATTTTTAGTGGTATGGCAGGAACAGCAATGTCTTTAATCATTAGATTAGAATTA
GCTGCACCTGGTTCACAATATTTACATGGTAATTCACAATTATTTAATGTTTTAGTAGTT
GGTCATGCTGTATTAATGATTTTCTTCTTAGTAATGCCTGCTTTAATTGGAGGTTTTGGT
AACTATTTATTACCATTAATAATTGGAGCTACAGATACAGCATTTCCAAGAATTAATAAC
ATTGCTTTTTGAGTATTACCTATGGGGTTAGTATGTTTAGTTACATCAACTTTAGTAGAA
TCAGGTGCTGGTACAGGGTGAACTGTCTATCCACCATTATCATCTATTCAGGCACATTCA
GGACCTAGTGTAGATTTAGCAATTTTTGCATTACATTTAACATCAATTTCATCATTATTA
GGTGCTATTAATTTCATTGTAACAACATTAAATATGAGAACAAATGGTATGACAATGCAT
AAATTACCATTATTTGTATGATCAATTTTCATTACAGCGTTCTTATTATTATTATCATTA
CCTGTATTATCTGCTGGTATTACAATGTTATTATTAGATAGAAACTTCAATACTTCATTC
TTTGAAGTATCAGGAGGTGGTGACCCAATCTTATACGAGCATTTATTTTGATTCTTTGGT
CACCCTGAAGTATATATTTTAATTATTCCTGGATTTGGTATTATTTCACATGTAGTATCA
ACATATTCTAAAAAACCTGTATTTGGTGAAATTTCAATGGTATATGCTATGGCTTCAATT
GGATTATTAGGATTCTTAGTATGATCACATCATATGTATATTGTAGGATTAGATGCAGAT
CTTAGAGCATATTTCCTATCTGCACTAATGATTATTGCAATTCCAACAGGAATTAAAATT
TTCTCATGATTAGCTCTAATCCATGGTGGTTCAATTAGATTAGCACTACCTATGTTATAT
GCAATTGCATTCTTATTCTTATTCACAATGGGTGGTTTAACTGGTGTTGCCTTAGCTAAC
GCCTCATTAGATGTAGCATTCCACGATACTTACTACGTGGTGGGACATTTTCACTATGTA
TTATCAATGGGTGCTATTTTCTCTTTATTTGCAGGATACTATTATTGAAGTCCTCAAATT
TTAGGTTTAAACTATAATGAAAAATTAGCTCAAATTCAATTCTGATTAATTTTCATTGGG
GCTAATGTTATTTTCTTCCCAATGCATTTTTTAGGTATTAATGGTATGCCTAGAAGAATT
CCTGATTATCCTGATGCTTTCGCAGGATGAAATTATGTCGCTTCTATTGGTTCATTCATT
GCACTATTATCATTATTCTTATTTATCTATATTTTATATGATCAATTAGTTAATGGATTA
AACAATAAAGTTAATAATAAATCAGTTATTTATAATAAAGCACCTGATTTTGTAGAATCT
AATCTTATCTTTAATTTAAATACAGTTAAATCTTCATCTATCGAATTCTTATTAACTTCT
CCACCAGCTGTACACTCATTTAATACACCAGCTGTACAATCTTAA
- QUESTION 2:
- Did the translation succeed (i.e. did it yield a long amino acid sequence unbroken by stop codons)?
- Nothing is wrong with the DNA sequence. Can you come up with some good reasons for the result?
- Keep the result of the translation in a window (we need it again in a while), and open a new window with Virtual Ribosome. Translate the DNA sequence once more using a different translation table (see options). Work out for yourself which table to select.
- If you have chosen the right translation table, the DNA sequence can be translated without any problems. Compare the two results and answer the following questions:
- QUESTION 3:
- What is the difference in the use of STOP codons?
- What is the difference in the use of START codons?
- Do any codons code for completely different amino acids?
More information on the definition of the different translation tables is found here: [The Genetic Codes - NCBI]. The tables are shown in a "compressed" format, but can be shown in a more comprehensible format by using the "Click here to change format" option. Note: The use of START codons is described in detail for all genetic codes. The differences between the standard code and each of the other codes are summarized in each section.
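An alternative genetic code can be thought of as the standard table plus a short list of reassigned codons, which is how the NCBI page summarizes each code. The sketch below illustrates that idea with the vertebrate mitochondrial code (NCBI table 2), chosen purely as an example; which table suits COX1 is left for the exercise. The helper names `make_table`, `translate` and `differing_codons` are hypothetical, and only sense-codon assignments are modelled, not the alternative START codons.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def make_table(overrides):
    """Return a copy of the standard table with some codons reassigned."""
    table = dict(STANDARD)
    table.update(overrides)
    return table


# Vertebrate mitochondrial code (NCBI table 2), used here only as an illustration.
VERTEBRATE_MITO = make_table({"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"})


def translate(dna, table=STANDARD):
    """Translate frame 1 of dna with the given codon table ('*' = STOP)."""
    dna = dna.upper()
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def differing_codons(table_a, table_b):
    """Map each codon that differs to its (table_a, table_b) assignments."""
    return {c: (table_a[c], table_b[c]) for c in table_a if table_a[c] != table_b[c]}


toy = "ATGTGAATATAA"  # made-up sequence, not taken from the exercise
print(translate(toy))                    # M*I*
print(translate(toy, VERTEBRATE_MITO))   # MWM*
```

Calling `differing_codons(STANDARD, VERTEBRATE_MITO)` lists the reassigned codons, which is the kind of comparison QUESTION 3 asks you to make between the two Virtual Ribosome results.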
Step 3: Reading frames
Remember to reset all options (in particular make sure that you now use the standard genetic code) before continuing the exercise.
Up to now we have assumed that the reading frame for the DNA sequence was known and that it always started at the first nucleotide. In the following, we shall examine how it is often possible to identify the most likely reading frame using computational translation tools. We shall use the sequence below, which is the complete mRNA sequence for a yeast gene (profilin). Use your biological knowledge to answer the following questions:
- QUESTION 4:
- Yeast has introns in some genes. Could this be a major problem in this case?
- Can an mRNA molecule contain more sequence than the gene in question? (Can it be longer than the CDS coding for the protein?)
>gi|4226|emb|Y00469.1| Yeast mRNA for profilin
GGCAAATTATGTCTTGGCAAGCATACACTGATAACTTAATAGGAACCGGTAAAGTCGACAAAGCTGTCAT
CTACTCGAGAGCAGGTGACGCTGTTTGGGCTACTTCTGGTGGCCTATCTTTGCAACCAAACGAAATTGGT
GAAATTGTTCAAGGCTTCGACAATCCAGCTGGTTTGCAAAGCAATGGTTTGCATATTCAAGGCCAAAAGT
TCATGTTGTTGAGAGCTGACGATAGAAGTATCTACGGTAGACATGATGCTGAGGGTGTTGTTTGTGTAAG
AACTAAGCAAACCGTTATTATTGCTCATTATCCACCAACCGTACAAGCCGGTGAGGCCACCAAGATTGTC
GAGCAATTGGCTGACTACTTGATTGGTGTTCAATACTAATTTATGCAGGTAAAGTTTTCTTGCCTTATAC
ACCACCTATTCTGGCATCTGCGGGATTTCGCTTCCTATTTTACAAATATTTTATTGATTGACGCTAATTA
TCACTGTAAAAGGCGCACTTTTTATATGTAGTCACATCCGGTATTTAACATATTTACGAAACAGTCTTAA
GAATATCGACATTTGATATACTTATGTTTAATTTATCTACATATTACAATCA
Six reading frames exist: 1, 2, 3 (on the positive strand, i.e. the sequence as you read it), and -1, -2, -3 (on the negative strand, i.e. the complementary DNA strand). Since we are working with an mRNA sequence, we do not need to consider the reading frames on the complementary strand.
- QUESTION 5:
- Why is this?
Translate the mRNA sequence in the three positive reading frames (1, 2, 3). The easiest way to do this is to use a separate window/tab for each translation, so that you can compare the different results.
- QUESTION 6:
- Which reading frame is most likely the right one?
NB: remember that START and STOP codons are only shown for the selected reading frame. Note also that the DNA sequence is shown unmodified in all three reading frames, whereas the protein sequence is shifted.
It is possible to show multiple reading frames simultaneously. Select "Plus (1,2,3)" as the reading frame, and translate the sequence again.
Note that the amino acid letter is centered above each codon (i.e. "M" is placed over the "T" in "ATG"). The translation from reading frame 1 is shown just above the DNA sequence, followed by reading frames 2 and 3. START and STOP codons for all three reading frames are shown at once.
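Translating in frames 1, 2 and 3 amounts to skipping zero, one or two nucleotides before reading codons. The following is a minimal sketch of that idea, run on a made-up toy sequence so that it does not give away the answer for profilin. `translate` and `translate_frames` are hypothetical helper names, and "*" for STOP is an assumed convention.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first nucleotide; a trailing partial codon is dropped."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def translate_frames(dna):
    """Return {frame: protein} for the three frames on the positive strand."""
    # Frame n starts reading at nucleotide n (1-based), i.e. string index n - 1.
    return {frame: translate(dna[frame - 1:]) for frame in (1, 2, 3)}


toy = "GCATGAAATAGC"  # made-up sequence, not taken from the exercise
for frame, protein in translate_frames(toy).items():
    print(frame, protein)
# 1 A*NS
# 2 HEI
# 3 MK*
```

In the toy example only frame 3 contains an "M" followed by a "*", which is the kind of pattern to look for when comparing the three Virtual Ribosome outputs.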
For the sake of illustration, we shall try to translate the sequence on the negative strand. Select reading frame -1, and redo the translation.
- QUESTION 7:
- How does the DNA sequence in the output look (is it identical to the one you input)?
- In what direction should it be read (left-to-right or right-to-left)?
- In what direction should the protein sequence be read (left-to-right or right-to-left)?
- (Try to compare to the protein sequence in FASTA format).
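Reading the negative strand means translating the reverse complement of the input. The sketch below assumes the common convention that frame -1 starts at the first nucleotide of the reverse complement; how Virtual Ribosome lays out its output is for you to observe. The helper names and the toy sequence are made up for this illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse, giving the other strand read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first nucleotide; a trailing partial codon is dropped."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def translate_minus_frames(dna):
    """Return {frame: protein} for frames -1, -2 and -3 (assumed numbering)."""
    rc = reverse_complement(dna)
    return {-(offset + 1): translate(rc[offset:]) for offset in range(3)}


toy = "GCATGAAATAGC"  # made-up sequence, not taken from the exercise
print(reverse_complement(toy))      # GCTATTTCATGC
print(translate_minus_frames(toy))  # {-1: 'AISC', -2: 'LFH', -3: 'YFM'}
```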
Now, let's try to do it all in one go. Select All (6 reading frames) and translate the sequence again.
- QUESTION 8:
- How many DNA strands are displayed?
- Why is this?
Note the large number of possibilities a single DNA sequence contains with respect to translation to protein sequence.
Step 4: ORF finder
We have now made a manual screening for possible reading frames. Such a procedure might work fine if you have only one DNA sequence, but this is in general not the case, and often you need to use computer-based ORF finders. An ORF (Open Reading Frame) is a DNA sequence that is not interrupted by a STOP codon. Often one will be looking for the longest ORF starting with a START codon and ending at a STOP codon.
The longest ORF is found by translating the sequence in all six reading frames, and then selecting the longest protein sequence.
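The procedure just described can be sketched as follows: translate all six frames, cut each protein at its STOP codons, and keep the longest piece. This follows the loose ORF definition above (no START codon required) and is only an illustration, not Virtual Ribosome's code. All function names and the frame numbering for the negative strand are assumptions of this sketch.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate from the first nucleotide; a trailing partial codon is dropped."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return {frame: protein} for frames 1, 2, 3 and -1, -2, -3."""
    dna = dna.upper()
    rc = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


def longest_orf(dna):
    """Return (frame, peptide) for the longest STOP-free stretch in any frame."""
    best_frame, best_peptide = None, ""
    for frame, protein in six_frame_translations(dna).items():
        for peptide in protein.split("*"):  # '*' marks a STOP codon
            if len(peptide) > len(best_peptide):
                best_frame, best_peptide = frame, peptide
    return best_frame, best_peptide


print(longest_orf("GCATGAAATAGC"))  # (-1, 'AISC') for this made-up toy sequence
```

In the toy example the longest STOP-free stretch lies on the negative strand and contains no START codon, which shows why stricter criteria are often needed.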
We shall now use the built-in ORF finder with the most stringent criteria. In the ORF finder section, use the following settings:
- Start codon: strict (this forces the ORF to start at ATG)
- Select "All (6 reading frames)"
Finally translate the sequence using these settings.
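A strict search of this kind can be approximated with a regular expression that requires ATG, then whole codons, then the first in-frame STOP codon. The sketch below assumes the standard STOP codons (TAA, TAG, TGA) and an input containing only A, C, G and T. It illustrates the idea and makes no claim about how Virtual Ribosome's ORF finder is implemented; all names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# ATG, then as few whole codons as possible, then the first in-frame STOP codon.
# The lookahead makes matches zero-width, so ORFs that overlap each other
# (for example in different frames) are all reported.
STRICT_ORF = re.compile(r"(?=(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))")


def find_strict_orfs(dna):
    """Yield (frame, orf_dna) for every ATG...STOP ORF on both strands."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    for sign, strand in ((1, dna), (-1, reverse_complement)):
        for match in STRICT_ORF.finditer(strand):
            # Assumed numbering: frame 1 / -1 starts at the strand's first base.
            frame = sign * (match.start() % 3 + 1)
            yield frame, match.group(1)


def longest_strict_orf(dna):
    """Return (frame, orf_dna) for the longest strict ORF, or None if there is none."""
    return max(find_strict_orfs(dna), key=lambda hit: len(hit[1]), default=None)


print(longest_strict_orf("GCATGAAATAGC"))  # (3, 'ATGAAATAG') for this toy sequence
print(longest_strict_orf("GCATGAAA"))      # None: this sketch requires a STOP codon
```

The second call shows one design choice of this sketch: an ORF without a STOP codon is rejected outright. A real ORF finder may treat truncated sequences differently, which is worth keeping in mind for the questions below.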
- QUESTION 9:
- Does the result agree with what you found earlier?
- Would it make any difference to the result if we had only a partial sequence where the last part of the sequence with the STOP codon is missing?
- What would happen if the first 50 nucleotides (with the START codon) were missing?