ExUniProt-answers
Latest revision as of 15:54, 17 September 2024
The numbers were obtained from UniProt on Sep 17, 2024.
Simple text mining
QUESTION 1.1:
- How many hits do you find?
8,061
- How many of these hits are from Swiss-Prot?
1,704
- Can you identify the correct hit (i.e. see which one is actually human insulin and not something else)?
It's P01308 / INS_HUMAN (not necessarily the top hit, but still on the first page).
QUESTION 1.2: How many hits are now left? How many of these are from Swiss-Prot?
1,769 and 1,126
QUESTION 1.3: How many hits are now left? How many of these are from Swiss-Prot?
201 and 60, search string: (organism_id:9606) AND (protein_name:insulin)
QUESTION 1.4: How many hits are now left? How many of these are from Swiss-Prot?
102 and 25, search string: (organism_id:9606) AND (protein_name:insulin) NOT (protein_name:insulin-like)
QUESTION 1.5:
- How did you do this?
By adding NOT (protein_name:receptor) to the query box.
- How many hits are now left? How many of these are from Swiss-Prot?
52 and 16
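The step-by-step narrowing in Questions 1.3 to 1.5 amounts to appending one clause at a time to the search string. The sketch below only shows how the strings are composed; the helper name `add_clause` is hypothetical and is not part of UniProt.

```python
def add_clause(query: str, clause: str, operator: str = "AND") -> str:
    """Append a parenthesised field clause to a UniProt-style query string.

    For the very first clause (empty query) the operator is ignored.
    """
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    term = f"({clause})"
    return term if not query else f"{query} {operator} {term}"


# Question 1.3: human entries whose protein name contains "insulin"
q13 = add_clause(add_clause("", "organism_id:9606"), "protein_name:insulin")
# Question 1.4: drop the insulin-like proteins
q14 = add_clause(q13, "protein_name:insulin-like", "NOT")
# Question 1.5: drop the receptors as well
q15 = add_clause(q14, "protein_name:receptor", "NOT")

for label, query in [("1.3", q13), ("1.4", q14), ("1.5", q15)]:
    print(label, query)
```

Each printed string can be pasted into the UniProt search field as is.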
The contents of UniProt
QUESTION 2.1:
- How many references are there in the insulin entry?
36
- Why do you think insulin is such a highly investigated protein?
Because it is linked to a common and serious disease (diabetes) and used as a drug.
QUESTION 2.2:
- Where do you find insulin?
It is secreted from the cell (this is written just below the section heading). Under GO - Cellular component you can find additional locations, such as the endoplasmic reticulum lumen, but these are temporary stages on the way to secretion.
- Why do you think it is found there?
Because it is a hormone: it has to travel through the bloodstream to influence other cells.
QUESTION 2.3: How long are the signal peptide and the propeptide, respectively?
24 and 31 amino acids.
QUESTION 2.4: Which positions are in β-sheet conformation in insulin?
Positions 26-29, 48-50, 56-58, 74-76, and 98-101.
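As a small worked example, the length of a feature given as start and end positions is end - start + 1, since both ends are included. The sketch below applies this to the β-strand ranges listed above; the variable names are chosen for illustration only.

```python
# Beta-strand ranges from the answer to Question 2.4 (1-based, inclusive)
strands = [(26, 29), (48, 50), (56, 58), (74, 76), (98, 101)]


def span_length(start: int, end: int) -> int:
    """Number of residues in an inclusive range start..end."""
    return end - start + 1


lengths = [span_length(start, end) for start, end in strands]
total = sum(lengths)

print(lengths)  # [4, 3, 3, 3, 4]
print(total)    # 17
```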
Other databases linked from UniProt
No questions asked here.
Text format
No questions asked here.
Advanced search
QUESTION 3.1: How many proteins did you find, and what was the search string (the text in the search field)?
17,883,532, of these 44,569 from Swiss-Prot
(ft_signal:*)
QUESTION 3.2: How many proteins do you find now, and what has the search string changed into?
3,854, they are all from Swiss-Prot
(ft_signal_exp:*)
Note that the "experimental" evidence is only found in Swiss-Prot entries, not in TrEMBL!
QUESTION 3.3: How many proteins do you find now, and what is the search string?
734
(ft_signal_exp:*) AND (organism_id:9606)
QUESTION 3.4: How many proteins are there in UniProt from Bacillus subtilis with the default TaxID [1423]?
18,684 results, of these only 62 from Swiss-Prot
(organism_id:1423)
QUESTION 3.5: How many proteins are there in UniProt from Bacillus subtilis in total (all strains and subspecies)?
43,300, of these 4,279 from Swiss-Prot
(taxonomy_id:1423)
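The difference between Questions 3.4 and 3.5 (43,300 - 18,684 = 24,616 extra hits) arises because organism_id matches only entries annotated with exactly that TaxID, while taxonomy_id also matches everything below it in the taxonomy. The toy model below illustrates the idea; apart from 1423, all TaxIDs and all entry names are made up for illustration and do not come from UniProt.

```python
# Hypothetical taxonomy fragment: child TaxID -> parent TaxID.
# Only 1423 (Bacillus subtilis) is a real TaxID here.
PARENT = {
    1423: 9000000,     # species -> (made-up) genus node
    9000001: 1423,     # made-up subspecies under 1423
    9000002: 9000001,  # made-up strain under that subspecies
}

# Hypothetical entries: (accession, TaxID the entry is annotated with)
ENTRIES = [
    ("ENTRY_A", 1423),
    ("ENTRY_B", 9000001),
    ("ENTRY_C", 9000002),
    ("ENTRY_D", 9000000),
]


def lineage(taxid):
    """Yield the TaxID itself followed by all its known ancestors."""
    while taxid is not None:
        yield taxid
        taxid = PARENT.get(taxid)


def by_organism_id(taxid):
    """Exact match only, like (organism_id:...)."""
    return [acc for acc, entry_taxid in ENTRIES if entry_taxid == taxid]


def by_taxonomy_id(taxid):
    """Match the node and everything below it, like (taxonomy_id:...)."""
    return [acc for acc, entry_taxid in ENTRIES if taxid in lineage(entry_taxid)]


print(by_organism_id(1423))  # ['ENTRY_A']
print(by_taxonomy_id(1423))  # ['ENTRY_A', 'ENTRY_B', 'ENTRY_C']
```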
QUESTION 3.6: How many proteins of maximum length 10 do you find?
47,000
(length:[1 TO 10])
QUESTION 3.7: How many proteins are now left?
1,322
(length:[1 TO 10]) AND (existence:1)
QUESTION 3.8: How many proteins are now left?
877
(length:[1 TO 10]) AND (existence:1) AND (fragment:false)
QUESTION 3.9: How many human non-fragment proteins of maximum length 10 do you find in UniProt?
6
(length:[1 TO 10]) AND (existence:1) AND (fragment:false) AND (organism_id:9606)
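The filters in Questions 3.6 to 3.9 are added cumulatively, so the four search strings can be generated from one list. The sketch below also turns each string into a request URL for the UniProt REST search endpoint; the endpoint and its `query`, `format` and `size` parameters are stated here as an assumption about the current UniProt website API, and no request is actually sent. To my understanding the total hit count is reported in the `X-Total-Results` response header, but that should be checked against the UniProt documentation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint of the UniProt REST API (not taken from the exercise text)
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"

# Filters in the order they are added in Questions 3.6 to 3.9
FILTERS = [
    "(length:[1 TO 10])",
    "(existence:1)",
    "(fragment:false)",
    "(organism_id:9606)",
]


def search_url(query: str, size: int = 1, fmt: str = "json") -> str:
    """Build (but do not send) a search URL for the given query string."""
    return f"{BASE_URL}?{urlencode({'query': query, 'format': fmt, 'size': size})}"


# One query per question: the first n filters joined with AND
queries = [" AND ".join(FILTERS[:n]) for n in range(1, len(FILTERS) + 1)]

for question, query in zip(["3.6", "3.7", "3.8", "3.9"], queries):
    print(question, query)
    print("   ", search_url(query))
```

Hit counts change with every UniProt release, so the numbers returned today will differ from those recorded above.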
QUESTION 3.10:
Here they are in FASTA format:
>sp|P0DPR3|TRDD1_HUMAN T cell receptor delta diversity 1 OS=Homo sapiens OX=9606 GN=TRDD1 PE=1 SV=1
EI
>sp|P01858|TUFT_HUMAN Phagocytosis-stimulating peptide OS=Homo sapiens OX=9606 PE=1 SV=1
TKPR
>sp|P02729|GLUR_HUMAN Urine glycopeptide OS=Homo sapiens OX=9606 PE=1 SV=1
CEHSHDGA
>sp|P01358|GAJU_HUMAN Gastric juice peptide 1 OS=Homo sapiens OX=9606 PE=1 SV=1
LAAGKVEDSD
>sp|P02728|GLEM_HUMAN Erythrocyte membrane glycopeptide OS=Homo sapiens OX=9606 PE=1 SV=1
CEGHSHDHGA
>sp|P22103|PNEU_HUMAN Pneumadin OS=Homo sapiens OX=9606 PE=1 SV=1
AGEPKLDAGV
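To double-check that all six entries really satisfy the length filter, the FASTA records can be parsed and measured. The following is a minimal sketch with a hand-written parser (the function name `parse_fasta` is chosen for illustration); the records are the ones listed above, copied into the script.

```python
FASTA = """\
>sp|P0DPR3|TRDD1_HUMAN T cell receptor delta diversity 1 OS=Homo sapiens OX=9606 GN=TRDD1 PE=1 SV=1
EI
>sp|P01858|TUFT_HUMAN Phagocytosis-stimulating peptide OS=Homo sapiens OX=9606 PE=1 SV=1
TKPR
>sp|P02729|GLUR_HUMAN Urine glycopeptide OS=Homo sapiens OX=9606 PE=1 SV=1
CEHSHDGA
>sp|P01358|GAJU_HUMAN Gastric juice peptide 1 OS=Homo sapiens OX=9606 PE=1 SV=1
LAAGKVEDSD
>sp|P02728|GLEM_HUMAN Erythrocyte membrane glycopeptide OS=Homo sapiens OX=9606 PE=1 SV=1
CEGHSHDHGA
>sp|P22103|PNEU_HUMAN Pneumadin OS=Homo sapiens OX=9606 PE=1 SV=1
AGEPKLDAGV
"""


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


records = parse_fasta(FASTA)
for header, sequence in records:
    accession = header.split("|")[1]  # second field of "sp|ACCESSION|NAME"
    print(accession, len(sequence), sequence)
```

The lengths come out between 2 and 10 residues, consistent with the (length:[1 TO 10]) filter.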
On your own
QUESTION 4.1: (taxonomy_id:562), 775,137 hits.
QUESTION 4.2: (taxonomy_id:83334), 14,589 hits.
QUESTION 4.3: (protein_name:insulin) AND (gene:ins) NOT (protein_name:insulin-like) NOT (protein_name:"insulin related"), 558 hits
(This is a question that does not have one single correct answer)
QUESTION 4.4: (protein_name:"alpha globin") AND (taxonomy_id:9845) NOT (protein_name:"transcription factor"), 17 hits.
QUESTION 4.5: (protein_name:"alpha-* globin") AND (taxonomy_id:8932), 2 hits.