ExUniProt-answers

The numbers below were obtained from UniProt on Sep 17, 2024; the counts change as the database is updated.

Simple text mining

QUESTION 1.1:

  1. How many hits do you find?
    8,061
  2. How many of these hits are from Swiss-Prot?
    1,704
  3. Can you identify the correct hit (i.e. see which one is actually human insulin and not something else)?
    It's P01308 / INS_HUMAN (not necessarily the top hit, but still on the first page).

QUESTION 1.2: How many hits are now left? How many of these are from Swiss-Prot?
1,769 and 1,126

QUESTION 1.3: How many hits are now left? How many of these are from Swiss-Prot?
201 and 60, search string: (organism_id:9606) AND (protein_name:insulin)

QUESTION 1.4: How many hits are now left? How many of these are from Swiss-Prot?
102 and 25, search string: (organism_id:9606) AND (protein_name:insulin) NOT (protein_name:insulin-like)

QUESTION 1.5:

  1. How did you do this?
    By adding NOT (protein_name:receptor) to the query box.
  2. How many hits are now left? How many of these are from Swiss-Prot?
    52 and 16
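
The refinement steps in Questions 1.3 to 1.5 can also be written as plain string operations. The following is a minimal sketch, assuming the public UniProt REST search endpoint at rest.uniprot.org and the `reviewed:true` field for restricting a query to Swiss-Prot. The helper names (`exclude`, `swissprot_only`, `search_url`) are made up for illustration and are not part of the exercise.

```python
from urllib.parse import urlencode

# Assumption: the public UniProtKB search endpoint of the UniProt REST API.
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def exclude(query: str, term: str) -> str:
    """Append a NOT clause, as done in the query box in Questions 1.4 and 1.5."""
    return f"{query} NOT ({term})"


def swissprot_only(query: str) -> str:
    """Restrict a query to reviewed (Swiss-Prot) entries."""
    return f"({query}) AND (reviewed:true)"


def search_url(query: str) -> str:
    """Build a URL-encoded search URL for the given query string."""
    params = urlencode({"query": query})
    return f"{SEARCH_URL}?{params}"


# Search strings from Questions 1.3, 1.4 and 1.5, built step by step.
q13 = "(organism_id:9606) AND (protein_name:insulin)"
q14 = exclude(q13, "protein_name:insulin-like")
q15 = exclude(q14, "protein_name:receptor")

for label, query in [("1.3", q13), ("1.4", q14), ("1.5", q15)]:
    print(label, query)
    print("    Swiss-Prot only:", search_url(swissprot_only(query)))
```

Wrapping the original query in parentheses before adding `AND (reviewed:true)` keeps the restriction from interacting with the trailing NOT clauses.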

The contents of UniProt

QUESTION 2.1:

  1. How many references are there in the insulin entry?
    36
  2. Why do you think insulin is such a highly investigated protein?
    Because it is linked to a common and serious disease (diabetes) and used as a drug.

QUESTION 2.2:

  1. Where do you find insulin?
    It is secreted from the cell (this is written just below the section heading). Under GO - Cellular component you can find additional locations, such as endoplasmic reticulum lumen, but these are temporary stages on the way to secretion.
  2. Why do you think it is found there?
    Because it is a hormone - it has to travel through the bloodstream to influence other cells.

QUESTION 2.3: How long is the signal peptide and the propeptide, respectively?
24 and 31 amino acids.

QUESTION 2.4: Which positions are in β-sheet conformation in insulin?
Positions 26-29, 48-50, 56-58, 74-76, and 98-101.
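
Feature positions in UniProt are 1-based and include both end points, so a feature running from position a to position b covers b - a + 1 residues. As a small worked example, the sketch below applies this to the β-strand positions listed above; `span_length` is a hypothetical helper name.

```python
# Beta-strand positions from the answer to Question 2.4 (1-based, inclusive).
BETA_STRANDS = [(26, 29), (48, 50), (56, 58), (74, 76), (98, 101)]


def span_length(start: int, end: int) -> int:
    """Length of a feature whose start and end positions are both included."""
    return end - start + 1


strand_lengths = [span_length(start, end) for start, end in BETA_STRANDS]
total_residues = sum(strand_lengths)

print(strand_lengths)   # [4, 3, 3, 3, 4]
print(total_residues)   # 17
```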

Other databases linked from UniProt

No questions asked here.

Text format

No questions asked here.

Advanced search

QUESTION 3.1: How many proteins did you find, and what was the search string (the text in the search field)?
17,883,532, of which 44,569 are from Swiss-Prot
(ft_signal:*)

QUESTION 3.2: How many proteins do you find now, and what has the search string changed into?
3,854, all of them from Swiss-Prot
(ft_signal_exp:*)
Note that the "experimental" evidence is only found in Swiss-Prot entries, not in TrEMBL!

QUESTION 3.3: How many proteins do you find now, and what is the search string?
734
(ft_signal_exp:*) AND (organism_id:9606)

QUESTION 3.4: How many proteins are there in UniProt from Bacillus subtilis with the default TaxID [1423]?
18,684 results, of which only 62 are from Swiss-Prot
(organism_id:1423)

QUESTION 3.5: How many proteins are there in UniProt from Bacillus subtilis in total (all strains and subspecies)?
43,300, of which 4,279 are from Swiss-Prot
(taxonomy_id:1423)
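
Hit counts like the ones in this section could in principle be retrieved without the web interface. The sketch below assumes that the UniProt REST search endpoint reports the total number of matches in an `x-total-results` response header. `count_hits` and its `fetch` parameter are hypothetical names, and the function is only defined here, not called, so no numbers are claimed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public UniProtKB search endpoint of the UniProt REST API.
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def count_hits(query, fetch=urlopen):
    """Return the total number of UniProtKB entries matching `query`.

    `fetch` is any callable that takes a URL and returns a context manager
    whose `headers` attribute supports `.get()`. It defaults to urllib's
    urlopen and can be replaced by a stub for offline testing.
    """
    # size=1 keeps the response body small; only the header is needed.
    url = SEARCH_URL + "?" + urlencode({"query": query, "size": 1})
    with fetch(url) as response:
        total = response.headers.get("x-total-results")
    if total is None:
        raise RuntimeError("no x-total-results header in the response")
    return int(total)


# The two Bacillus subtilis queries from Questions 3.4 and 3.5.
ONLY_TAXID_1423 = "(organism_id:1423)"      # entries assigned to this TaxID itself
WHOLE_SUBTREE_1423 = "(taxonomy_id:1423)"   # also all strains and subspecies
```

Calling `count_hits(ONLY_TAXID_1423)` and `count_hits(WHOLE_SUBTREE_1423)` would then give the two counts to compare. The values will differ from the ones above if the database has been updated since.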

QUESTION 3.6: How many proteins of maximum length 10 do you find?
47,000
(length:[1 TO 10])

QUESTION 3.7: How many proteins are now left?
1,322
(length:[1 TO 10]) AND (existence:1)

QUESTION 3.8: How many proteins are now left?
877
(length:[1 TO 10]) AND (existence:1) AND (fragment:false)

QUESTION 3.9: How many human non-fragment proteins of maximum length 10 do you find in UniProt?
6
(length:[1 TO 10]) AND (existence:1) AND (fragment:false) AND (organism_id:9606)

QUESTION 3.10:

Here they are in FASTA format:

>sp|P0DPR3|TRDD1_HUMAN T cell receptor delta diversity 1 OS=Homo sapiens OX=9606 GN=TRDD1 PE=1 SV=1
EI
>sp|P01858|TUFT_HUMAN Phagocytosis-stimulating peptide OS=Homo sapiens OX=9606 PE=1 SV=1
TKPR
>sp|P02729|GLUR_HUMAN Urine glycopeptide OS=Homo sapiens OX=9606 PE=1 SV=1
CEHSHDGA
>sp|P01358|GAJU_HUMAN Gastric juice peptide 1 OS=Homo sapiens OX=9606 PE=1 SV=1
LAAGKVEDSD
>sp|P02728|GLEM_HUMAN Erythrocyte membrane glycopeptide OS=Homo sapiens OX=9606 PE=1 SV=1
CEGHSHDHGA
>sp|P22103|PNEU_HUMAN Pneumadin OS=Homo sapiens OX=9606 PE=1 SV=1
AGEPKLDAGV
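
As a quick check of the answer to Question 3.9, the six records above can be parsed and their lengths verified. This is a minimal sketch with a hand-written FASTA parser (`parse_fasta` is a hypothetical helper, not a library function); the records are copied from the listing above.

```python
FASTA = """\
>sp|P0DPR3|TRDD1_HUMAN T cell receptor delta diversity 1 OS=Homo sapiens OX=9606 GN=TRDD1 PE=1 SV=1
EI
>sp|P01858|TUFT_HUMAN Phagocytosis-stimulating peptide OS=Homo sapiens OX=9606 PE=1 SV=1
TKPR
>sp|P02729|GLUR_HUMAN Urine glycopeptide OS=Homo sapiens OX=9606 PE=1 SV=1
CEHSHDGA
>sp|P01358|GAJU_HUMAN Gastric juice peptide 1 OS=Homo sapiens OX=9606 PE=1 SV=1
LAAGKVEDSD
>sp|P02728|GLEM_HUMAN Erythrocyte membrane glycopeptide OS=Homo sapiens OX=9606 PE=1 SV=1
CEGHSHDHGA
>sp|P22103|PNEU_HUMAN Pneumadin OS=Homo sapiens OX=9606 PE=1 SV=1
AGEPKLDAGV
"""


def parse_fasta(text):
    """Return {accession: sequence} for UniProt-style FASTA text."""
    records = {}
    accession = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # UniProt headers look like ">db|ACCESSION|ENTRY_NAME description".
            accession = line.split("|")[1]
            records[accession] = ""
        elif accession is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[accession] += line
    return records


records = parse_fasta(FASTA)
lengths = {acc: len(seq) for acc, seq in records.items()}

for acc, n in lengths.items():
    print(acc, n)
```

All six sequences are at most 10 residues long, in line with the `length:[1 TO 10]` filter.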

On your own

QUESTION 4.1: (taxonomy_id:562), 775,137 hits.

QUESTION 4.2: (taxonomy_id:83334), 14,589 hits.

QUESTION 4.3: (protein_name:insulin) AND (gene:ins) NOT (protein_name:insulin-like) NOT (protein_name:"insulin related"), 558 hits
(This question does not have a single correct answer.)

QUESTION 4.4: (protein_name:"alpha globin") AND (taxonomy_id:9845) NOT (protein_name:"transcription factor"), 17 hits.

QUESTION 4.5: (protein_name:"alpha-* globin") AND (taxonomy_id:8932), 2 hits.