Link collection


Taxonomy

Tree of Life http://www.tolweb.org/
(Good descriptive taxonomy database, but it covers a limited range of organisms).
NCBI Taxonomy http://www.ncbi.nlm.nih.gov/Taxonomy/
(Somewhat "technical" but very exhaustive taxonomic database. TaxIDs are also used in GenBank and UniProt).
The "Common Tree" function can be used to investigate how closely related two or more organisms are: http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi
NCBI search with "Token set" can be used if you do not know the Latin name: http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi
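The idea behind comparing organisms in a common tree can be sketched in a few lines: two organisms are as closely related as the deepest taxon their lineages share. The lineages below are heavily abbreviated and typed in by hand for illustration (the real NCBI lineages contain many more levels), and the function name is hypothetical, not part of any NCBI tool.

```python
def last_common_taxon(*lineages):
    """Return the deepest taxon shared by all lineages (root first), or None."""
    shared = None
    for ranks in zip(*lineages):
        # Stop at the first level where the lineages disagree.
        if len(set(ranks)) != 1:
            break
        shared = ranks[0]
    return shared


# Abbreviated, hand-typed lineages (assumption: real lineages are much longer).
human = ["Eukaryota", "Metazoa", "Chordata", "Mammalia", "Primates", "Homo sapiens"]
mouse = ["Eukaryota", "Metazoa", "Chordata", "Mammalia", "Rodentia", "Mus musculus"]
yeast = ["Eukaryota", "Fungi", "Ascomycota", "Saccharomycetes", "Saccharomyces cerevisiae"]

print(last_common_taxon(human, mouse))         # deepest taxon shared by human and mouse
print(last_common_taxon(human, mouse, yeast))  # deepest taxon shared by all three
```

This prefix comparison only works when the lineages list the same levels in the same order, which is why the abbreviated lists above were written with matching levels.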

DNA Databases

GenBank
Search page: https://www.ncbi.nlm.nih.gov/nucleotide
SGD (Saccharomyces Genome Database) http://www.yeastgenome.org
(The baker's yeast genome)
Gene https://www.ncbi.nlm.nih.gov/gene/
Database of genes in completely sequenced genomes and their phenotypes.

Translation

Virtual Ribosome
https://services.healthtech.dtu.dk/services/VirtualRibosome-2.0/
"The Genetic Codes" (NCBI) https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
Information about translation codes
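As a rough illustration of what a translation table encodes, the sketch below builds the standard genetic code (NCBI translation table 1) from the compact 64-letter notation used on the NCBI page, with the bases ordered T, C, A, G, and translates one reading frame. The function name and the `to_stop` parameter are assumptions made for this example; this is not how Virtual Ribosome is implemented.

```python
from itertools import product

# Standard genetic code (translation table 1) in compact notation:
# codons are ordered TTT, TTC, TTA, TTG, TCT, ... with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, to_stop=True):
    """Translate reading frame 1 of a DNA (or RNA) sequence.

    Unknown codons (for example those containing N) become 'X'.
    With to_stop=True, translation ends at the first stop codon.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

Other translation tables can be tried by swapping in a different 64-letter string from the NCBI page.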

Protein databases

Protein sequence and annotations

UniProt https://www.uniprot.org

Protein 3D structure

PDB (Protein Data Bank) http://www.rcsb.org/

Protein domains

InterPro https://www.ebi.ac.uk/interpro/

Alignment

Pairwise alignment

Pairwise alignment (global and local) http://www.ebi.ac.uk/emboss/
Use "Needle" for global alignment and "Water" for local alignment.
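The difference between global (Needleman-Wunsch, as in Needle) and local (Smith-Waterman, as in Water) alignment can be seen in a small scoring sketch. Note the simplifying assumptions: it uses a plain match/mismatch score and a linear gap penalty, whereas the EMBOSS programs use substitution matrices and affine gap penalties, and it returns only the score, not the alignment itself. The function name and default scores are chosen for illustration.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Optimal alignment score by dynamic programming.

    local=False: global alignment (Needleman-Wunsch), ends are penalized.
    local=True:  local alignment (Smith-Waterman), cells are floored at 0
                 and the best cell anywhere in the matrix is returned.
    """
    # First row: leading gaps cost nothing in local mode.
    prev = [0] * (len(b) + 1) if local else [j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                cell = max(cell, 0)
                best = max(best, cell)
            curr.append(cell)
        prev = curr
    return best if local else prev[-1]


# A shared core (GATTACA) surrounded by unrelated flanks.
x = "TTTGATTACATTT"
y = "CCGATTACACC"
print("global:", alignment_score(x, y))
print("local: ", alignment_score(x, y, local=True))
```

The local score reflects only the shared core, while the global score is pulled down by the unrelated flanks.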
Shuffle a sequence into random order (to build a null model):
Protein: http://www.bioinformatics.org/sms2/shuffle_protein.html
DNA: http://www.bioinformatics.org/sms2/shuffle_dna.html
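Shuffling can also be done locally. The sketch below permutes the letters of a sequence, so that length and composition are preserved while the order is destroyed. The function name and the optional `seed` parameter are assumptions for this example, and the web tools linked above may differ in detail.

```python
import random


def shuffle_sequence(seq, seed=None):
    """Return a random permutation of seq (same letters, random order)."""
    rng = random.Random(seed)  # a fixed seed makes the shuffle reproducible
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


original = "MKTAYIAKQRQISFVKSHFSRQ"
# Several shuffled copies give a small sample from the null model.
null_sample = [shuffle_sequence(original, seed=i) for i in range(5)]
for s in null_sample:
    print(s)
```

Aligning the shuffled copies against the other sequence gives an impression of the scores to expect by chance for sequences of that length and composition.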

Multiple alignment

The multiple alignment programs MUSCLE and Clustal Omega are built into Seaview, which should be installed on your computer.

Other multiple alignment methods on EBI's server
RevTrans
Special method for aligning coding DNA. https://services.healthtech.dtu.dk/services/RevTrans-2.0/
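The general idea behind aligning coding DNA at the codon level can be sketched as follows: align the translated protein sequences first, then thread each DNA sequence back onto its gapped protein sequence, one codon per residue and three gap characters per protein gap. This is a minimal illustration of that idea, not the RevTrans implementation; the function name is hypothetical, and the DNA is assumed to be in frame and without a trailing stop codon.

```python
def thread_codons(aligned_protein, coding_dna):
    """Map a gapped protein sequence back onto its coding DNA.

    Each residue consumes one codon; each '-' becomes '---'.
    """
    codons = [coding_dna[i:i + 3] for i in range(0, len(coding_dna), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues != len(codons) or len(coding_dna) % 3 != 0:
        raise ValueError("DNA length does not match the protein sequence")
    remaining = iter(codons)
    return "".join("---" if aa == "-" else next(remaining) for aa in aligned_protein)


# Toy protein alignment (typed by hand) and the matching coding sequences.
print(thread_codons("MK-L", "ATGAAACTG"))
print(thread_codons("MKAL", "ATGAAGGCTTTA"))
```

Because gaps are always inserted in multiples of three, the reading frame is preserved in the resulting DNA alignment.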

Phylogenetic trees

Seaview can draw simple trees, but if you need more options and annotations, go to:

interactive Tree Of Life (iTOL)
https://itol.embl.de/

BLAST

Note: Most sequence databases, including UniProt and RCSB PDB, offer an option for doing BLAST searches. In the course we have used NCBI's BLAST, since NCBI has the largest selection of databases and is the home of GenBank.

NCBI BLAST
https://blast.ncbi.nlm.nih.gov/Blast.cgi
  • BLASTN: Choose "nucleotide blast" and "blastn" on the next page.
NB: We do not use "megablast" in this course (it is designed for finding sequences that are very similar).
  • BLASTP: Choose "protein blast" and "blastp" on the next page.
Note the information about conserved protein domains near the top of the results page. Click the domain to see further information.

For both BLASTN and BLASTP, remember to choose a relevant database: use NR/NT to get the grand overview, use PDB for structures, or specify an organism or taxonomic group under Organism if that makes sense for your task.

PSI-BLAST
Go to NCBI BLAST (see above) and choose "protein blast". On the next page you can then choose PSI-BLAST.

Weight matrices and sequence logos

WebLogo http://weblogo.berkeley.edu/
A good general-purpose logo generator for BOTH DNA and peptide sequences.
Alternate link to version 3 (lacks some options): http://weblogo.threeplusone.com/
Seq2Logo
A more advanced method for working with peptide sequences. https://services.healthtech.dtu.dk/services/Seq2Logo-2.0/
EasyPred
Make a logo AND train a weight matrix using clustering and pseudocounts. https://services.healthtech.dtu.dk/services/EasyPred-1.0/
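To make the connection between logos and weight matrices concrete, the sketch below computes two things from a set of aligned sites: the information content per column (the total stack height in a logo, in bits) and a log-odds weight matrix. It rests on simplifying assumptions: a uniform background, one flat pseudocount per letter, no small-sample correction and no sequence clustering, so it will not reproduce the output of the tools above. The function names and the toy sites are made up for illustration.

```python
import math
from collections import Counter


def column_information(seqs, alphabet="ACGT"):
    """Information content per alignment column: log2(|alphabet|) - entropy."""
    max_bits = math.log2(len(alphabet))
    info = []
    for column in zip(*seqs):
        counts = Counter(column)
        total = sum(counts[a] for a in alphabet)
        entropy = 0.0
        for a in alphabet:
            if counts[a]:
                p = counts[a] / total
                entropy -= p * math.log2(p)
        info.append(max_bits - entropy)
    return info


def weight_matrix(seqs, alphabet="ACGT", pseudocount=1.0):
    """Log-odds scores (log2 of frequency over uniform background) per column."""
    background = 1.0 / len(alphabet)
    matrix = []
    for column in zip(*seqs):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        matrix.append({
            a: math.log2(((counts[a] + pseudocount) / total) / background)
            for a in alphabet
        })
    return matrix


# Toy aligned sites: three conserved columns, one fully variable column.
sites = ["ACGT", "ACGA", "ACGC", "ACGG"]
print([round(bits, 2) for bits in column_information(sites)])
print({a: round(s, 2) for a, s in weight_matrix(sites)[0].items()})
```

The pseudocount keeps letters that were never observed in a column from receiving a score of minus infinity.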