Answers:Malaria Vaccine: Difference between revisions

From 22111
 
(9 intermediate revisions by 2 users not shown)
Line 43: Line 43:
or  
or  
  (organism_name:"Plasmodium falciparum")
  (organism_name:"Plasmodium falciparum")
both give '''129,616''' hits in total, '''420''' from Swiss-Prot and '''129,196''' from TrEMBL.
both give '''129,611''' hits in total, '''463''' from Swiss-Prot and '''129,148''' from TrEMBL.


If you only found 34,201 hits, it was because you used  
If you only found 34,196 hits, it was because you used  
  (organism_id:5833)
  (organism_id:5833)
which only gives those ''Pf'' proteins that do ''not'' have a specified strain or isolate — cf. question 3.4+3.5 in [[Exercise: The protein database UniProt|the UniProt exercise]].   
which only gives those ''Pf'' proteins that do ''not'' have a specified strain or isolate — cf. question 3.4+3.5 in [[Exercise: The protein database UniProt|the UniProt exercise]].   


If, on the other hand, you found 131,773 hits, it was because you searched in All instead of specifying the search field:
If, on the other hand, you found 131,779 hits, it was because you searched in All instead of specifying the search field:
  Plasmodium falciparum
  Plasmodium falciparum
In that case, you will include some proteins that originate from e.g. humans but play a role in ''Plasmodium falciparum'' infection, which may be mentioned in some comment field or reference title.
In that case, you will include some proteins that originate from e.g. humans but play a role in ''Plasmodium falciparum'' infection, which may be mentioned in some comment field or reference title.
Line 77: Line 77:
<tt>(taxonomy_id:5833) AND (cc_scl_term:SL-0243)</tt>
<tt>(taxonomy_id:5833) AND (cc_scl_term:SL-0243)</tt>


'''36''' (27 from Swiss-Prot).
'''76''' (39 from Swiss-Prot).


===2f)===
===2f)===
Line 86: Line 86:
or<br>
or<br>
<tt>(taxonomy_id:5833) AND (cc_scl_term:SL-0310)</tt><br>
<tt>(taxonomy_id:5833) AND (cc_scl_term:SL-0310)</tt><br>
'''416''' hits
'''413''' hits


membrane:  
membrane:  
Line 93: Line 93:
or<br>
or<br>
<tt>(taxonomy_id:5833) AND (cc_scl_term:SL-0162)</tt><br>
<tt>(taxonomy_id:5833) AND (cc_scl_term:SL-0162)</tt><br>
'''10,111''' hits<br>
'''11,299''' hits<br>


===2g)===
===2g)===
Line 149: Line 149:


===2h)===
===2h)===
'''15''' hits, all from Swiss-Prot.
'''27''' hits, all from Swiss-Prot.


<tt>(taxonomy_id:5833) AND (cc_scl_term:"host cell membrane")</tt><br>
<tt>(taxonomy_id:5833) AND (cc_scl_term:"host cell membrane")</tt><br>
Line 159: Line 159:
<tt>(taxonomy_id:5833) AND (protein_name:erythrocyte)</tt>
<tt>(taxonomy_id:5833) AND (protein_name:erythrocyte)</tt>


'''10,125''', among these only '''4''' from Swiss-Prot.
'''10,119''', among these only '''4''' from Swiss-Prot.


===2j)===
===2j)===
Line 165: Line 165:
<tt>(taxonomy_id:5833) AND (protein_name:erythrocyte) AND (protein_name:membrane)</tt>
<tt>(taxonomy_id:5833) AND (protein_name:erythrocyte) AND (protein_name:membrane)</tt>


'''9,533''' hits, all from TrEMBL.
'''9,527''' hits, all from TrEMBL.


''or''
''or''
Line 171: Line 171:
<tt>(taxonomy_id:5833) AND (protein_name:"erythrocyte membrane")</tt>
<tt>(taxonomy_id:5833) AND (protein_name:"erythrocyte membrane")</tt>


'''9,492''' hits, all from TrEMBL.
'''9,486''' hits, all from TrEMBL.


===2k)===
===2k)===
Line 177: Line 177:
<tt>(taxonomy_id:5833) AND (protein_name:erythrocyte) AND (protein_name:membrane) AND (fragment:false)</tt>
<tt>(taxonomy_id:5833) AND (protein_name:erythrocyte) AND (protein_name:membrane) AND (fragment:false)</tt>


'''2,386''' (or '''2,384''' if the words "erythrocyte membrane" are combined)
'''2,382''' (or '''2,380''' if the words "erythrocyte membrane" are combined)


===2l)===
===2l)===
Line 194: Line 194:
InterPro identifier and name: '''IPR008602, Duffy-antigen binding'''<br>
InterPro identifier and name: '''IPR008602, Duffy-antigen binding'''<br>
Pfam identifier and name: '''PF05424, Duffy binding domain'''<br>
Pfam identifier and name: '''PF05424, Duffy binding domain'''<br>
If we limit the analysis to the three hits whose accession codes begin with "Q" (as mentioned in the announcement from October 10), we find that it is found '''4''' times in Q8IHM0 and '''6''' times in each of Q8I639 and Q6UDW7.
It is found '''4''' times in Q8IHM0 and '''6''' times in each of Q8I639 and Q6UDW7.





Latest revision as of 15:24, 14 October 2024

Answers to case study exercise about malaria vaccines (NB: numbers etc. found in the databases 11-10-2023):

1 - What exactly is malaria?

1a) If you search for "malaria" on NCBI's Taxonomy page, you find some mosquitoes and some protozoans with the genus name Plasmodium. Clicking the name of one of these (twice) takes you to a page where you can see the lineage:

  • Genus: Plasmodium
  • Phylum: Apicomplexa
  • (Super)Kingdom: Eukaryota


1b) NCBI's Taxonomy page has a function named "Taxonomy common tree", which gives a nice overview. Alternatively, you can open the taxonomy pages for the two organisms you want to compare and see from their lineages how much they have in common. The lowest taxon shared by each pair is:

  • Homo sapiens and Plasmodium: Eukaryota
  • Babesia microti and Plasmodium: Aconoidasida

Here is the picture you can get from the "Taxonomy common tree" function:


1c) On CDC's page about malaria or on Tree of Life's page about Plasmodium you find:

  • P. malariae, P. ovale, P. falciparum and P. vivax.

By looking up these four species in NCBI Taxonomy and looking at the table to the right, you see that all four species have a full genome in the databases (see the link named Genome under Entrez records).

 

2 - Identification of membrane proteins (potential vaccine targets)

2a)

14 chromosomes.

5566 non-hypothetical genes (search details below)

txid36329[Organism:noexp] NOT hypothetical[All Fields] AND alive[prop]

If you instead found 5570 non-hypothetical genes, it is because you selected the species Plasmodium falciparum (taxID:5833) in NCBI Taxonomy rather than the specific isolate 3D7 (taxID:36329) specified in the exercise.

2b)

The correct search strings

(taxonomy_id:5833)

or

(organism_name:"Plasmodium falciparum")

both give 129,611 hits in total, 463 from Swiss-Prot and 129,148 from TrEMBL.

If you only found 34,196 hits, it was because you used

(organism_id:5833)

which only gives those Pf proteins that do not have a specified strain or isolate (cf. questions 3.4 and 3.5 in the UniProt exercise).

If, on the other hand, you found 131,779 hits, it was because you searched in All instead of specifying the search field:

Plasmodium falciparum

In that case, you will include some proteins that originate from e.g. humans but play a role in Plasmodium falciparum infection, which may be mentioned in some comment field or reference title.

2c)

This can be solved in several ways:

  • (taxonomy_id:36329) (either selecting the right isolate from the drop-down menu or using the TaxID you found in the Taxonomy database)
  • (taxonomy_id:5833) AND (organism_name:3d7)
  • (organism_name:"Plasmodium falciparum") AND (organism_name:3d7)

They all give: 5,495 in total, 275 from Swiss-Prot and 5,220 from TrEMBL.

That corresponds approximately to the number of genes found in 2a).

2d)

(taxonomy_id:5833) AND (cc_scl_term:*)

18,931 (370 from Swiss-Prot and 18,561 from TrEMBL).
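Since every UniProtKB entry is either reviewed (Swiss-Prot) or unreviewed (TrEMBL), the two subtotals quoted in 2b), 2c) and 2d) should add up to the totals. Here is a minimal Python sketch of that sanity check, using only the numbers quoted above; the dictionary layout and the function name are illustrative and not part of the exercise.

```python
# Counts quoted in answers 2b), 2c) and 2d): (total, Swiss-Prot, TrEMBL).
REPORTED = {
    "2b": (129_611, 463, 129_148),
    "2c": (5_495, 275, 5_220),
    "2d": (18_931, 370, 18_561),
}


def inconsistent_answers(reported):
    """Return the answer labels whose subtotals do not add up to the total."""
    return [
        label
        for label, (total, swissprot, trembl) in reported.items()
        if swissprot + trembl != total
    ]


# An empty list means all quoted subtotals are consistent with their totals.
print(inconsistent_answers(REPORTED))
```

If a label shows up in the list, the most likely cause is that the total and the subtotals were read off at different times, since the database counts change between releases.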

2e)

(taxonomy_id:5833) AND (cc_scl_term:secreted)

or

(taxonomy_id:5833) AND (cc_scl_term:SL-0243)

76 (39 from Swiss-Prot).

2f)

surface:

(taxonomy_id:5833) AND (cc_scl_term:surface)
or
(taxonomy_id:5833) AND (cc_scl_term:SL-0310)
413 hits

membrane:

(taxonomy_id:5833) AND (cc_scl_term:membrane)
or
(taxonomy_id:5833) AND (cc_scl_term:SL-0162)
11,299 hits

2g)

Potentially useful (found in the cell membrane):

  • Q7KQL9 / ALF_PLAF7 / Fructose-bisphosphate aldolase: "Host cell membrane"
  • A0A2I0BVG8 / CDPK1_PLAFO / Calcium-dependent protein kinase 1: "Cell membrane"
  • W7KN63 / W7KN63_PLAFO / Merozoite surface antigen 2: "Cell membrane"

Definitely not useful (found in an inner membrane):

  • Q8I6V3 / PLM2_PLAF7 / Plasmepsin II: "Vacuole membrane"
  • U3M186 / U3M186_PLAFA / Cytochrome c oxidase subunit 1: "Mitochondrion inner membrane"
  • O97321 / O97321_PLAF7 / GlcNAc-1-P transferase: "Endoplasmic reticulum membrane"

Of course, the actual examples you selected may differ from these!

2h)

27 hits, all from Swiss-Prot.

(taxonomy_id:5833) AND (cc_scl_term:"host cell membrane")
or
(taxonomy_id:5833) AND (cc_scl_term:SL-0375)

2i)

(taxonomy_id:5833) AND (protein_name:erythrocyte)

10,119, among these only 4 from Swiss-Prot.

2j)

(taxonomy_id:5833) AND (protein_name:erythrocyte) AND (protein_name:membrane)

9,527 hits, all from TrEMBL.

or

(taxonomy_id:5833) AND (protein_name:"erythrocyte membrane")

9,486 hits, all from TrEMBL.

2k)

(taxonomy_id:5833) AND (protein_name:erythrocyte) AND (protein_name:membrane) AND (fragment:false)

2,382 (or 2,380 if the words "erythrocyte membrane" are combined)

2l)

(taxonomy_id:5833) AND (protein_name:erythrocyte) AND (protein_name:membrane) AND (fragment:false) AND (database:pdb)

6 hits: Q6UDW7, Q8I639, W7K270, Q8IHM0, A0A024V5I6, and I1X0L2.
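The queries in 2i) to 2l) are built by adding one filter at a time to the previous query. The sketch below reproduces that narrowing in Python. The helper name and_query is hypothetical, and the URL at the end assumes that the UniProt REST search endpoint accepts the same query syntax as the website; nothing is sent over the network here.

```python
from urllib.parse import urlencode


def and_query(*clauses):
    """Join field:value clauses in the style used in the answers above."""
    return " AND ".join(f"({clause})" for clause in clauses)


# Filters added one at a time in 2i) to 2l).
steps = [
    "taxonomy_id:5833",
    "protein_name:erythrocyte",
    "protein_name:membrane",
    "fragment:false",
    "database:pdb",
]

# Print the query for 2i), 2j), 2k) and 2l) in turn.
for n in range(2, len(steps) + 1):
    print(and_query(*steps[:n]))

# Assumption: this endpoint takes the query as a URL-encoded "query" parameter.
url = "https://rest.uniprot.org/uniprotkb/search?" + urlencode(
    {"query": and_query(*steps)}
)
print(url)
```

Keeping the filters in a list makes it easy to see which added clause is responsible for each drop in the hit count.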

 

3 - Analysis of membrane protein domain structure

3a)

InterPro identifier and name: IPR008602, Duffy-antigen binding
Pfam identifier and name: PF05424, Duffy binding domain
It is found 4 times in Q8IHM0 and 6 times in each of Q8I639 and Q6UDW7.


3b)

The transmembrane segments are in the following positions:

  • Q6UDW7: 2653-2674
  • Q8I639: 2650-2667
  • Q8IHM0: 2695-2717

The extracellular parts are the N-terminal parts (all the positions before the transmembrane segments), and the intracellular (cytoplasmic) parts are C-terminal (positions after the transmembrane segments).


3c)

The following positions have had their structure determined by X-ray crystallography in the three proteins:

  • Q6UDW7:
    • 1215-1950, covering Duffy_binding domain 3 and 4
    • 1218-1577 or 1220-1580, covering Duffy_binding domain 3
    • 2326-2631 covering Duffy_binding domain 6
  • Q8I639: 2333-2634, covering Duffy_binding domain 6
  • Q8IHM0: 728-1214, covering Duffy_binding domain 2
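Combining 3b) and 3c), one can check that every structurally determined region lies before the transmembrane segment, i.e. in the extracellular part that is relevant for a vaccine. This is a small Python sketch of that check using the positions listed above; the variable and function names are made up for illustration.

```python
# Transmembrane segments from 3b): accession -> (start, end).
TM = {
    "Q6UDW7": (2653, 2674),
    "Q8I639": (2650, 2667),
    "Q8IHM0": (2695, 2717),
}

# Ranges covered by X-ray structures, from 3c).
XRAY = {
    "Q6UDW7": [(1215, 1950), (1218, 1577), (1220, 1580), (2326, 2631)],
    "Q8I639": [(2333, 2634)],
    "Q8IHM0": [(728, 1214)],
}


def is_extracellular(accession, start, end):
    """True if the range lies entirely N-terminal of the transmembrane segment."""
    tm_start, _tm_end = TM[accession]
    return 1 <= start <= end < tm_start


for accession, ranges in XRAY.items():
    for start, end in ranges:
        print(accession, start, end, is_extracellular(accession, start, end))
```

The check relies on the statement in 3b) that the N-terminal part is extracellular; for a protein with the opposite orientation the comparison would have to be reversed.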


3d)

Yes!

  • Biological process: cytoadherence to microvasculature, mediated by symbiont protein
  • Biological process: pathogenesis
  • Cellular component: host cell plasma membrane
  • Cellular component: infected host cell surface knob
  • Molecular function: cell adhesion molecule binding
  • Molecular function: host cell surface receptor binding

All these examples support that these proteins are involved in binding the infected erythrocytes to the endothelial cells (as described in the exercise).

Tip: In UniProt, you can click the link "View the complete GO annotation on QuickGO".

 

4 - Prediction of B-cell epitopes in a membrane protein

4a) The PDB entry is 2WAU and it's a crystal structure (X-ray).

4b) FASTA sequence for the Duffy Binding domain covered by the 3D structure:

>2WAU_1|Chains A, B|ERYTHROCYTE MEMBRANE PROTEIN 1 (PFEMP1)|PLASMODIUM FALCIPARUM (36329)
ICNKYKNINVNMKKNNDDTWTDLVKNSSDINKGVLLPPRRKNLFLKIDESDICKYKRDPKLFKDFIYSSAISEVERLKKV
YGEAKTKVVHAMKYSFADIGSIIKGDDMMENNSSDKIGKILGDGVGQNEKRKKWWDMNKYHIWESMLSGYKHAYGNISEN
DRKMLDIPNNDDEHQFLRWFQEWTENFCTKRNELYENMVTACNSAKCNTSNGSVDKKECTEACKNYSNFILIKKKEYQSL
NSQYDMNYKETKAEKKESPEYFKDKCNGECSCLSEYFKDETRWKNPYETLDDTEVKNNCMCK


4c) The sequence interval was 2333-2634. This means that the first position in the new FASTA file corresponds to position 2333 in the original sequence.

4d) It's possible to convert from the coordinates in the FASTA files to the full length sequence by adding 2332. In the table below the epitopes have been named by their starting position as well as numbered.

EPITOPE     POSITIONS    LENGTH     ORIG_POSITIONS
#1 ep_5       5 to  29       25     2337 to 2361
#2 ep_49     49 to  57        9     2381 to 2389
#3 ep_107   107 to 114        8     2439 to 2446
#4 ep_153   153 to 172       20     2485 to 2504
#5 ep_209   209 to 218       10     2541 to 2550
#6 ep_249   249 to 258       10     2581 to 2590
#7 ep_273   273 to 294       22     2605 to 2626
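The conversion in the table can be reproduced with a few lines of Python. This is only a sketch: the epitope positions are copied from the table above, and the names OFFSET, EPITOPES and to_original are illustrative.

```python
# Position 1 in the FASTA file corresponds to position 2333 in the full sequence.
OFFSET = 2333 - 1

# (start, end) of each predicted epitope in FASTA coordinates, from the table.
EPITOPES = [
    (5, 29), (49, 57), (107, 114), (153, 172),
    (209, 218), (249, 258), (273, 294),
]


def to_original(start, end, offset=OFFSET):
    """Shift a FASTA-coordinate range to full-length sequence coordinates."""
    return start + offset, end + offset


for number, (start, end) in enumerate(EPITOPES, 1):
    orig_start, orig_end = to_original(start, end)
    length = end - start + 1  # both end positions are included
    print(f"#{number} ep_{start}: {start}-{end} (length {length}) -> {orig_start}-{orig_end}")
```

Note that the offset is 2332 rather than 2333 because sequence positions are counted from 1, not 0.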

 

5 - Visualization of epitopes

5a) Invisible positions:

Chain A: 2333-2349 and 2540-2546
Chain B: 2333-2348 and 2535-2549

This means that the first epitope (pos 5-29, orig pos 2337 to 2361) and the 5th epitope (pos 209 to 218, orig pos 2541 to 2550) are partially invisible.
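The same conclusion can be reached programmatically by intersecting each epitope with the invisible ranges of each chain. Below is a minimal Python sketch using the original-sequence coordinates from 4d) and the ranges listed above; all names are illustrative.

```python
# Invisible positions per chain, from 5a).
INVISIBLE = {
    "A": [(2333, 2349), (2540, 2546)],
    "B": [(2333, 2348), (2535, 2549)],
}

# Epitope number -> (start, end) in original sequence coordinates, from 4d).
EPITOPES = {
    1: (2337, 2361), 2: (2381, 2389), 3: (2439, 2446), 4: (2485, 2504),
    5: (2541, 2550), 6: (2581, 2590), 7: (2605, 2626),
}


def overlap(a, b):
    """Return the shared inclusive range of a and b, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def hidden_parts(epitopes, invisible):
    """List (chain, epitope number, hidden range) for every overlap found."""
    hits = []
    for chain, gaps in invisible.items():
        for number, epitope in epitopes.items():
            for gap in gaps:
                shared = overlap(epitope, gap)
                if shared:
                    hits.append((chain, number, shared))
    return hits


for chain, number, (start, end) in hidden_parts(EPITOPES, INVISIBLE):
    print(f"Chain {chain}: epitope #{number} is invisible at {start}-{end}")
```

In both chains, only epitopes 1 and 5 overlap an invisible range, and in each case only part of the epitope is affected.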

5b)
