ExUniProt-answers
Answers to "Exercise: Protein databases"
The numbers are found using UniProt on Feb 10, 2017 (release 2017_01).
Simple text mining
QUESTION 1:
- How many hits do you find?
3150
- How many of these hits are from Swiss-Prot?
1254
- Can you identify the correct hit (i.e. see which one is actually human insulin and not something else)?
It's P01308 / INS_HUMAN (among the first ten hits).
QUESTION 2: How many hits are now left? How many of these are from Swiss-Prot?
1298 and 895
QUESTION 3: How many hits are now left? How many of these are from Swiss-Prot?
195 and 60
QUESTION 4: How many hits are now left?
100
QUESTION 5:
- How did you do this?
By adding NOT name:receptor to the query box.
- How many hits are now left?
48
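The exclusion step can also be written as a small string operation. This is only a sketch: the helper name exclude_term is made up for illustration, and "insulin" is a placeholder for whatever query was built up in Questions 1-4, which is not reproduced in this answer sheet.

```python
def exclude_term(query: str, field: str, value: str) -> str:
    """Append a NOT clause to an existing query string (hypothetical helper)."""
    return f"{query} NOT {field}:{value}"


# "insulin" is a placeholder, not the actual query from Questions 1-4.
refined = exclude_term("insulin", "name", "receptor")
print(refined)
```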
The contents of UniProt
QUESTION 6:
- How many references are there in the insulin entry?
36
- Why do you think insulin is such a highly investigated protein?
Because it is linked to a common and serious disease (diabetes) and used as a drug.
QUESTION 7:
- Where do you find insulin?
It is secreted from the cell (this is written just below the section heading). Under GO - Cellular component you can find additional locations mentioned, such as endoplasmic reticulum lumen, but these are temporary stages on the way to secretion.
- Why do you think it is found there?
Because it is a hormone - it has to travel through the bloodstream to influence other cells.
QUESTION 8: How long are the signal peptide and the propeptide, respectively?
24 and 31 amino acids.
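Both lengths follow from the feature coordinates by inclusive counting (end minus start plus one). A minimal sketch, assuming the signal peptide spans positions 1-24 and the propeptide positions 57-87 in the P01308 feature table; these coordinates are not stated in this answer sheet, so check them against the entry:

```python
def feature_length(start: int, end: int) -> int:
    """Length of a sequence feature given 1-based, inclusive coordinates."""
    return end - start + 1


# Assumed coordinates (not given in the answer above); verify in the entry.
signal_peptide = (1, 24)
propeptide = (57, 87)

print(feature_length(*signal_peptide), feature_length(*propeptide))
```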
QUESTION 9: Which positions are in β-sheet conformation in insulin?
Positions 26-29, 48-50, 74-76, and 98-101.
Other databases linked from Swiss-Prot
No questions asked here.
Advanced search
QUESTION 10: How many proteins did you find, and what was the search string (the text in the search field)?
5,186,371
annotation:(type:signal)
QUESTION 11: How many proteins do you find now, and what has the search string changed into?
3486
annotation:(type:signal evidence:experimental)
QUESTION 12: How many proteins do you find now, and what is the search string?
707
annotation:(type:signal evidence:experimental) AND organism:"Homo sapiens (Human) [9606]"
QUESTION 13 a: How many proteins are there in UniProt from Neisseria gonorrhoeae with the default TaxID [485]?
9203
QUESTION 13 b: How many proteins are there in UniProt from Neisseria gonorrhoeae in total (all strains and subspecies)?
18,596 (twice as many)
QUESTION 13 c: What does the search string look like now?
taxonomy:"Neisseria gonorrhoeae [485]"
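The "twice as many" remark in Question 13 b can be checked with simple arithmetic on the two counts given above. A short sketch (the variable names are chosen here for illustration):

```python
default_taxid_hits = 9203    # Question 13 a: organism-level search
all_strains_hits = 18596     # Question 13 b: taxonomy-level search

# Entries that are only found when strains and subspecies are included.
extra_hits = all_strains_hits - default_taxid_hits
ratio = all_strains_hits / default_taxid_hits

print(extra_hits, round(ratio, 2))
```

The difference is the number of entries annotated with a strain- or subspecies-level TaxID rather than with 485 itself.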
QUESTION 14: How many proteins of maximum length 10 do you find?
32,090
length:[1 TO 10]
QUESTION 15: How many proteins are now left?
1280
length:[1 TO 10] existence:"evidence at protein level"
QUESTION 16: How many proteins are now left?
830
length:[1 TO 10] existence:"evidence at protein level" fragment:no
QUESTION 17: How many human non-fragment proteins of maximum length 10 do you find in UniProt?
5
length:[1 TO 10] existence:"evidence at protein level" fragment:no AND organism:"Human [9606]"
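The search strings in Questions 14-17 are built by adding one filter at a time. The sketch below reproduces that progression as plain string handling; it does not contact UniProt, and the function name cumulative_queries is invented for this example.

```python
from itertools import accumulate

# Filters in the order they were added in Questions 14-16.
FILTERS = [
    "length:[1 TO 10]",
    'existence:"evidence at protein level"',
    "fragment:no",
]
# Organism restriction added in Question 17.
ORGANISM_CLAUSE = 'organism:"Human [9606]"'


def cumulative_queries(filters):
    """Return the query string after each successive filter is appended."""
    return list(accumulate(filters, lambda query, f: f"{query} {f}"))


queries = cumulative_queries(FILTERS)
queries.append(f"{queries[-1]} AND {ORGANISM_CLAUSE}")

for q in queries:
    print(q)
```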
QUESTION 18: Here they are in FASTA format:
>sp|P01358|GAJU_HUMAN Gastric juice peptide 1 OS=Homo sapiens PE=1 SV=1
LAAGKVEDSD
>sp|P02728|GLEM_HUMAN Erythrocyte membrane glycopeptide OS=Homo sapiens PE=1 SV=1
CEGHSHDHGA
>sp|P02729|GLUR_HUMAN Urine glycopeptide OS=Homo sapiens PE=1 SV=1
CEHSHDGA
>sp|P22103|PNEU_HUMAN Pneumadin OS=Homo sapiens PE=1 SV=1
AGEPKLDAGV
>sp|P01858|TUFT_HUMAN Phagocytosis-stimulating peptide OS=Homo sapiens PE=1 SV=1
TKPR
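To confirm that all five sequences satisfy the length limit, the FASTA records can be parsed and measured. This is a minimal sketch using only the standard library; parse_fasta is a small helper written for this example, not a function from any UniProt tool.

```python
FASTA = """\
>sp|P01358|GAJU_HUMAN Gastric juice peptide 1 OS=Homo sapiens PE=1 SV=1
LAAGKVEDSD
>sp|P02728|GLEM_HUMAN Erythrocyte membrane glycopeptide OS=Homo sapiens PE=1 SV=1
CEGHSHDHGA
>sp|P02729|GLUR_HUMAN Urine glycopeptide OS=Homo sapiens PE=1 SV=1
CEHSHDGA
>sp|P22103|PNEU_HUMAN Pneumadin OS=Homo sapiens PE=1 SV=1
AGEPKLDAGV
>sp|P01858|TUFT_HUMAN Phagocytosis-stimulating peptide OS=Homo sapiens PE=1 SV=1
TKPR
"""


def parse_fasta(text):
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


records = parse_fasta(FASTA)
# The accession is the second '|'-separated field of the header.
lengths = {header.split("|")[1]: len(seq) for header, seq in records.items()}

for accession, length in lengths.items():
    print(accession, length)
```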