ExUniProt-answers: Difference between revisions
Revision as of 15:24, 17 September 2024
The numbers were found using UniProt Beta on Sep 12, 2022.
Simple text mining
QUESTION 1.1:
- How many hits do you find?
8,061
- How many of these hits are from Swiss-Prot?
1,704
- Can you identify the correct hit (i.e. see which one is actually human insulin and not something else)?
It's P01308 / INS_HUMAN (not necessarily the top hit, but still on the first page).
QUESTION 1.2: How many hits are now left? How many of these are from Swiss-Prot?
1,769 and 1,126
QUESTION 1.3: How many hits are now left? How many of these are from Swiss-Prot?
201 and 60, search string: (organism_id:9606) AND (protein_name:insulin)
QUESTION 1.4: How many hits are now left? How many of these are from Swiss-Prot?
102 and 25, search string: (organism_id:9606) AND (protein_name:insulin) NOT (protein_name:insulin-like)
QUESTION 1.5:
- How did you do this?
By adding NOT (protein_name:receptor) to the query box.
- How many hits are now left? How many of these are from Swiss-Prot?
52 and 16
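The search strings in Questions 1.3 to 1.5 all follow one pattern: field:value clauses joined with AND, followed by NOT clauses for the terms to exclude. As a minimal sketch, the helper below (the name build_query is hypothetical, not part of any UniProt tool) assembles the three strings from their parts:

```python
def build_query(include, exclude=()):
    """Join (field, value) pairs with AND, then append one NOT clause per exclusion."""
    query = " AND ".join(f"({field}:{value})" for field, value in include)
    for field, value in exclude:
        query += f" NOT ({field}:{value})"
    return query


# Clauses taken from the answers to Questions 1.3-1.5.
human_insulin = [("organism_id", "9606"), ("protein_name", "insulin")]

q13 = build_query(human_insulin)
q14 = build_query(human_insulin, exclude=[("protein_name", "insulin-like")])
q15 = build_query(
    human_insulin,
    exclude=[("protein_name", "insulin-like"), ("protein_name", "receptor")],
)

for label, query in (("1.3", q13), ("1.4", q14), ("1.5", q15)):
    print(f"QUESTION {label}: {query}")
```

Each printed string can be pasted into the UniProt query box. The hit counts will drift as the database grows.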
The contents of UniProt
QUESTION 2.1:
- How many references are there in the insulin entry?
36
- Why do you think insulin is such a highly investigated protein?
Because it is linked to a common and serious disease (diabetes) and used as a drug.
QUESTION 2.2:
- Where do you find insulin?
It is secreted from the cell (this is written just below the section heading). Under GO - Cellular component you can find additional locations mentioned, such as endoplasmic reticulum lumen, but these are temporary stages on the way to secretion.
- Why do you think it is found there?
Because it is a hormone: it has to travel through the bloodstream to influence other cells.
QUESTION 2.3: How long are the signal peptide and the propeptide, respectively?
24 and 31 amino acids.
QUESTION 2.4: Which positions are in β-sheet conformation in insulin?
Positions 26-29, 48-50, 56-58, 74-76, and 98-101.
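UniProt reports features as inclusive start-end positions, so a feature's length is end - start + 1. The short sketch below applies that to the β-strand ranges listed above. The variable names are illustrative only.

```python
# Beta-strand ranges from the answer to Question 2.4 (inclusive positions).
strands = [(26, 29), (48, 50), (56, 58), (74, 76), (98, 101)]


def span_length(start, end):
    """Length of a feature given inclusive start and end positions."""
    return end - start + 1


lengths = [span_length(start, end) for start, end in strands]
total = sum(lengths)

for (start, end), length in zip(strands, lengths):
    print(f"{start}-{end}: {length} residues")
print(f"Residues in beta-strand conformation: {total}")
```

The same calculation gives the signal peptide and propeptide lengths in Question 2.3 once their start and end positions are read from the entry.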
Other databases linked from UniProt
No questions asked here.
Text format
No questions asked here.
Advanced search
QUESTION 3.1: How many proteins did you find, and what was the search string (the text in the search field)?
18,229,549, of these 44,113 from Swiss-Prot
(ft_signal:*)
QUESTION 3.2: How many proteins do you find now, and what has the search string changed into?
3,826, they are all from Swiss-Prot
(ft_signal_exp:*)
Note that the "experimental" evidence is only found in Swiss-Prot entries, not in TrEMBL!
QUESTION 3.3: How many proteins do you find now, and what is the search string?
732
(ft_signal_exp:*) AND (organism_id:9606)
QUESTION 3.4: How many proteins are there in UniProt from Bacillus subtilis with the default TaxID [1423]?
43,759, of these only 62 from Swiss-Prot
(organism_id:1423)
QUESTION 3.5: How many proteins are there in UniProt from Bacillus subtilis in total (all strains and subspecies)?
77,776, of these 4,279 from Swiss-Prot
(taxonomy_id:1423)
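The two Bacillus subtilis queries differ only in the field name: organism_id matches entries annotated with exactly that TaxID, while taxonomy_id also includes everything below it in the taxonomy. To repeat such counts from a script, the sketch below builds search URLs for the UniProt REST API. The endpoint, the reviewed:true filter for Swiss-Prot, and the X-Total-Results response header are written from memory and should be treated as assumptions to check against the current API documentation. The function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public UniProt REST API.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def search_url(query, reviewed_only=False, size=1):
    """Build a search URL; reviewed_only restricts the query to Swiss-Prot."""
    if reviewed_only:
        # Assumption: (reviewed:true) selects Swiss-Prot entries.
        query = f"{query} AND (reviewed:true)"
    params = {"query": query, "format": "json", "size": size}
    return f"{BASE_URL}?{urlencode(params)}"


def total_hits(url):
    """Return the hit count for a search URL (needs network access).

    Assumption: the server reports the count in an X-Total-Results header.
    """
    with urlopen(url) as response:
        return int(response.headers["X-Total-Results"])


species_only = search_url("(organism_id:1423)")
whole_subtree = search_url("(taxonomy_id:1423)")
swissprot_subtree = search_url("(taxonomy_id:1423)", reviewed_only=True)

for url in (species_only, whole_subtree, swissprot_subtree):
    print(url)
```

Calling total_hits(species_only) would then fetch the count. It is left uncalled here so that the example runs offline.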
QUESTION 3.6: How many proteins of maximum length 10 do you find?
46,956
(length:[1 TO 10])
QUESTION 3.7: How many proteins are now left?
1,317
(length:[1 TO 10]) AND (existence:1)
QUESTION 3.8: How many proteins are now left?
873
(length:[1 TO 10]) AND (existence:1) AND (fragment:false)
QUESTION 3.9: How many human non-fragment proteins of maximum length 10 do you find in UniProt?
6
(length:[1 TO 10]) AND (existence:1) AND (fragment:false) AND (organism_id:9606)
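Questions 3.6 to 3.9 narrow one search step by step, each time appending one more AND clause. As an illustration only, the sketch below rebuilds the four search strings cumulatively and pairs them with the counts reported above:

```python
from itertools import accumulate

# Filters in the order they are added in Questions 3.6-3.9.
filters = [
    "(length:[1 TO 10])",
    "(existence:1)",
    "(fragment:false)",
    "(organism_id:9606)",
]

# Hit counts as reported in the answers above; they change between releases.
reported_hits = ["46,956", "1,317", "873", "6"]

# Each step is the previous query with one more AND clause appended.
steps = list(accumulate(filters, lambda query, clause: f"{query} AND {clause}"))

for query, hits in zip(steps, reported_hits):
    print(f"{hits:>7}  {query}")
```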
QUESTION 3.10:
Here they are in FASTA format:
>sp|P0DPR3|TRDD1_HUMAN T cell receptor delta diversity 1 OS=Homo sapiens OX=9606 GN=TRDD1 PE=1 SV=1
EI
>sp|P01858|TUFT_HUMAN Phagocytosis-stimulating peptide OS=Homo sapiens OX=9606 PE=1 SV=1
TKPR
>sp|P02729|GLUR_HUMAN Urine glycopeptide OS=Homo sapiens OX=9606 PE=1 SV=1
CEHSHDGA
>sp|P01358|GAJU_HUMAN Gastric juice peptide 1 OS=Homo sapiens OX=9606 PE=1 SV=1
LAAGKVEDSD
>sp|P02728|GLEM_HUMAN Erythrocyte membrane glycopeptide OS=Homo sapiens OX=9606 PE=1 SV=1
CEGHSHDHGA
>sp|P22103|PNEU_HUMAN Pneumadin OS=Homo sapiens OX=9606 PE=1 SV=1
AGEPKLDAGV
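As a quick check that all six sequences meet the length limit, the records can be parsed with a few lines of Python. This is a minimal sketch with a hand-written parser (parse_fasta is a hypothetical helper, not a library function). The FASTA text is copied from the answer above.

```python
FASTA = """\
>sp|P0DPR3|TRDD1_HUMAN T cell receptor delta diversity 1 OS=Homo sapiens OX=9606 GN=TRDD1 PE=1 SV=1
EI
>sp|P01858|TUFT_HUMAN Phagocytosis-stimulating peptide OS=Homo sapiens OX=9606 PE=1 SV=1
TKPR
>sp|P02729|GLUR_HUMAN Urine glycopeptide OS=Homo sapiens OX=9606 PE=1 SV=1
CEHSHDGA
>sp|P01358|GAJU_HUMAN Gastric juice peptide 1 OS=Homo sapiens OX=9606 PE=1 SV=1
LAAGKVEDSD
>sp|P02728|GLEM_HUMAN Erythrocyte membrane glycopeptide OS=Homo sapiens OX=9606 PE=1 SV=1
CEGHSHDHGA
>sp|P22103|PNEU_HUMAN Pneumadin OS=Homo sapiens OX=9606 PE=1 SV=1
AGEPKLDAGV
"""


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


records = parse_fasta(FASTA)
lengths = [len(sequence) for _, sequence in records]

for header, sequence in records:
    # UniProt headers start with db|accession|entry_name.
    db, accession, entry_name = header.split(" ", 1)[0].split("|")
    print(f"{accession}  {entry_name:<12} {len(sequence):>2}  {sequence}")
```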
On your own
QUESTION 4.1: (taxonomy_id:562), 1,075,614 hits.
QUESTION 4.2: (taxonomy_id:83334), 14,594 hits.
QUESTION 4.3: (protein_name:insulin) AND (gene:ins) NOT (protein_name:insulin-like) NOT (protein_name:"insulin related"), 545 hits
QUESTION 4.4: (protein_name:"alpha globin") AND (taxonomy_id:9845) NOT (protein_name:"transcription factor"), 17 hits.
QUESTION 4.5: (protein_name:"alpha-* globin") AND (taxonomy_id:8932), 2 hits.