Written by: Rasmus Wernersson, April 2015 (latest update: April 2018)

Q1

The "GT" consensus site is clearly seen (completely conserved: 2 bits of information), and there also appears to be some signal on the exon side (a preference for "G" in the position just before the GT) and on the intron side.
The intron starts at position 11, so positions 1-10 are EXON and positions 11-20 are INTRON.
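The 2 bits quoted for the completely conserved GT follow from how DNA logos measure information: the maximum for a four-letter alphabet, log2(4) = 2 bits, minus the Shannon entropy of the column. A minimal Python sketch of that calculation is below. The function names and toy sequences are hypothetical, and no small-sample correction is applied, so it will not exactly reproduce the numbers a logo tool reports.

```python
import math
from collections import Counter


def column_information(column: str) -> float:
    """Information content (bits) of one DNA alignment column: log2(4) - H."""
    counts = Counter(column)
    n = sum(counts.values())
    # Shannon entropy of the observed base frequencies in this column
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2.0 - entropy


def logo_profile(seqs: list[str]) -> list[float]:
    """Per-position information content for equal-length aligned sequences."""
    length = len(seqs[0])
    return [column_information("".join(s[i] for s in seqs)) for i in range(length)]


# Hypothetical toy alignment with a fully conserved GT at indices 2-3
print(logo_profile(["AGGTAA", "TGGTGA", "CAGTAA"]))
```

In the toy alignment, the two GT columns (0-based indices 2 and 3) each come out at 2.0 bits, while the mixed columns score lower.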

Q2 - pretty LOGO

Q3 - frequency LOGO

Q4 - cross species comparison

IMPORTANT: This question is easiest to answer if you compare the DONOR and ACCEPTOR sites separately across the 5 species (it's simply easier to spot the differences this way).

Donor sites

Observations:

The animal (human + fruit fly) DONOR sites contain ~1 bit of information in the very last position of the exon (with a preference for "G"), and some information (< 1 bit) in the 4 positions following the "GT" in the intron (=> signal in 6 intron positions in all, counting the GT).
The plant (Arabidopsis) shows the same pattern of some signal in the final 2 exon positions, but the signal in the intron after the "GT" is very weak.
The two fungal species show next to no signal in the exon, but a very strong signal on the intron side beyond the GT.

Acceptor sites

Overall observation: the ACCEPTOR site motif is much more alike across all 5 species compared to the DONOR sites.
- In all cases there is next to no signal on the EXON side (after the "AG"), and there is a strong preference for T and C immediately before the AG (~1 bit, which is what a position containing only T and C in equal proportions gives).
- There is a diffuse preference for Ts in the region before AG in the animals + the plant.
- This preference for Ts is clearly centered around the -9 position in the fungi.
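The ~1 bit figure for the T/C preference follows from the information content formula used for DNA logos. As a worked example, assume (for illustration) that T and C each occur in half of the sequences and no other base appears at that position:

```latex
\begin{aligned}
I &= \log_2 4 - H, \qquad H = -\sum_{b \in \{A,C,G,T\}} p_b \log_2 p_b \\
p_T = p_C = \tfrac{1}{2} &\;\Rightarrow\; H = -2 \cdot \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1
\;\Rightarrow\; I = 2 - 1 = 1 \text{ bit}
\end{aligned}
```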

Q5 - E. coli - Shine-Dalgarno

The START codon is mostly ATG, but GTG is common enough to be visible at the first position. If you zoom in on positions 51-53, you can see that a small number of other bases are also used in rare cases.
A region rich in As and Gs can be seen at positions 40-44, which could potentially be part of the SD sequence.
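Zooming in on positions 51-53 amounts to tallying which codon each sequence carries there. A minimal sketch, assuming the sequences are aligned so that the START codon begins at logo coordinate 51 (as described above); the function name and the toy sequences are hypothetical:

```python
from collections import Counter


def start_codon_counts(seqs: list[str], start: int = 51) -> Counter:
    """Tally the codon found at 1-based positions start..start+2 of each aligned sequence."""
    i = start - 1  # convert the 1-based logo coordinate to a 0-based index
    return Counter(s[i:i + 3] for s in seqs)


# Hypothetical sequences: 50 upstream positions, then the START codon
toy = ["N" * 50 + "ATG" + "AAA", "N" * 50 + "ATG" + "CCC", "N" * 50 + "GTG" + "TTT"]
print(start_codon_counts(toy).most_common())
```

Sorting the tally by count makes the rare alternative START codons easy to spot, even when they are too small to see in the logo itself.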

Q6 - SD zoom

The LOGO is consistent with the consensus sequence AGGAGG in the sense that it does not strongly disagree with it. From the LOGO, it would be a bit of a stretch to claim a specific choice between A and G at any of the positions, but an overrepresentation of As OR Gs (purines) is clearly seen.
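The reading "A or G, but not a specific one" can be made explicit with an IUPAC-style degenerate consensus, where R stands for a purine (A or G). The sketch below is hypothetical: the 75% threshold, the function name and the toy sequences are illustrative choices, not part of the exercise.

```python
from collections import Counter


def purine_consensus(seqs: list[str], threshold: float = 0.75) -> str:
    """Per-column consensus: a single base if it dominates, 'R' if A+G together do, else 'N'."""
    n = len(seqs)
    out = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        best, best_count = counts.most_common(1)[0]
        if best_count / n >= threshold:
            out.append(best)  # one base clearly dominates this column
        elif (counts["A"] + counts["G"]) / n >= threshold:
            out.append("R")  # purine-rich, but no clear A-vs-G choice
        else:
            out.append("N")  # no usable signal
    return "".join(out)


print(purine_consensus(["AGGAGG", "GAGGAG", "AGAGGA", "GGGAAG"]))
```

For these toy sequences the result is RGGRRG: every column is purine-rich, but only some of them favour one specific base.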

Q7 - Kozak sequence (Yeast)

There appears to be a weak signal in the positions immediately before the START codon (especially the -3 position = coordinate 48).

Zoom + Y axis rescale of the 40-50 region

It can now clearly be seen that only position 48 (= 3 before the ATG) has information content above 0.2 bits.

By plotting a frequency plot of the same region, it can be seen that >50% of the sequences have A in position 48.
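A frequency plot and an information plot answer different questions: a base can be the majority in a column while the column still carries fairly little information. The numbers below are hypothetical (not taken from the yeast data) and only illustrate how a majority A with the remaining bases spread evenly translates into bits:

```python
import math


def information_from_freqs(freqs: dict[str, float]) -> float:
    """Information content (bits) of a DNA column given its base frequencies."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy


# Hypothetical column: A is the majority base, the rest are spread evenly.
example = {"A": 0.55, "C": 0.15, "G": 0.15, "T": 0.15}
print(round(information_from_freqs(example), 2))  # 0.29
```

So a column where more than half of the sequences share a base can still sit well below 1 bit, which is why the frequency plot is the more direct way to read off the ">50% A" observation.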

Q8 - Signal peptides comparison

Similarities:
- It's clearly seen that positions -1 (just before the cleavage site) and -3 are important, and that A (alanine) is preferred there (especially in the prokaryotes).
- In all three cases there is a stretch of hydrophobic (color = black) amino acids (L, V, A, I) in the middle of the signal peptide.
Differences:
- The preference for A (alanine) at the -1 position is much stronger in the prokaryotic sequences
- The hydrophobic stretch is longer in Gram positive bacteria
- There is a preference for S/A at position -6 in Gram negatives that is not seen elsewhere
- There is no signal after the cleavage site in eukaryotes and some signal in the first few positions in both prokaryotic groups
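The hydrophobic stretch can be located in an individual signal peptide by scanning for the longest run of hydrophobic residues. This sketch uses only L, V, A and I, the residues named above; that set is an assumption, and residues such as F or M could reasonably be added. The function name and the example peptide are made up.

```python
HYDROPHOBIC = frozenset("LVAI")  # assumed set, taken from the residues listed above


def longest_hydrophobic_run(peptide: str, hydrophobic: frozenset = HYDROPHOBIC) -> tuple[int, int]:
    """Return (start, length) of the longest run of hydrophobic residues (0-based start)."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, aa in enumerate(peptide):
        if aa in hydrophobic:
            if run_len == 0:
                run_start = i  # a new run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # a non-hydrophobic residue breaks the run
    return best_start, best_len


print(longest_hydrophobic_run("MKKLLAVIAGSA"))
```

Comparing the run lengths across many sequences from each group would be one way to quantify the "longer in Gram positive bacteria" observation.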

Q9 - seq2logo

Yes, it clearly shows the same overall motif as above. Note that, unlike WebLogo, Seq2Logo indicates positions with gaps by making the stack of letters narrower.
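The narrower stacks can be thought of as scaling each column's width by the fraction of sequences that have a residue (rather than a gap) at that position. The sketch below assumes a simple linear scaling and "-" as the gap character; it illustrates the idea rather than Seq2Logo's exact formula, and the function name and toy alignment are hypothetical.

```python
def stack_widths(seqs: list[str], gap: str = "-") -> list[float]:
    """Relative stack width per column: fraction of sequences with a non-gap character."""
    n = len(seqs)
    return [sum(s[i] != gap for s in seqs) / n for i in range(len(seqs[0]))]


# Hypothetical gapped alignment of four short peptides
print(stack_widths(["AL-", "A--", "AVL", "AI-"]))
```

Here the first column is gap-free (width 1.0), while the last column has a residue in only one of four sequences (width 0.25).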

Q10 - small data sets

IMPORTANT: Compare the LOGOs from the small data set to the LOGO obtained from the large data set (Question 9+10) and check whether you can see the same pattern.

The first plot (without pseudo-counts) is very noisy, and only the broadest trends can be seen: the tendency to have an "A" at the -1 position and a somewhat diffuse hydrophobic region.
In the second plot (with pseudo-counts), the picture looks much more like what we saw with the large data set: a specific pattern at the -1 and -3 positions, and a hydrophobic region much closer in shape to what we saw before.
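Pseudo-counts help because, with only a handful of sequences, raw frequencies jump between 0 and 1 and every unobserved residue gets probability 0. The sketch below adds the same pseudo-count to every amino acid, a simple uniform scheme assumed here for illustration (Seq2Logo's own pseudo-count method may differ). This pulls sparse columns towards a flat distribution. The function name and the default pseudo-count of 1 are hypothetical choices.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column: str, pseudo: float = 1.0, alphabet: str = AMINO_ACIDS) -> dict[str, float]:
    """Residue frequencies for one column, with a uniform pseudo-count added to every letter."""
    counts = Counter(c for c in column if c in alphabet)  # gaps/unknown symbols are ignored
    total = sum(counts.values()) + pseudo * len(alphabet)
    return {aa: (counts[aa] + pseudo) / total for aa in alphabet}


raw = column_frequencies("AAA", pseudo=0.0)
smoothed = column_frequencies("AAA", pseudo=1.0)
print(raw["A"], round(smoothed["A"], 3), round(smoothed["L"], 3))
```

With three sequences that all have A, the raw estimate claims A with certainty, whereas the pseudo-count version gives A 4/23 and every unseen residue 1/23. The more sequences there are, the less the pseudo-counts matter, which is why they mainly change the picture for small data sets.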

ExSeqLogosAnswers