Answers:Malaria Vaccine

From 22111
Answers to the case study exercise about malaria vaccines (NB: the numbers etc. below were found in the databases on 11-10-2023):

1 - What exactly is malaria?

1a) If you search for "malaria" on NCBI's Taxonomy page, you find some mosquitoes and some protozoans with the genus name Plasmodium. Clicking the name of one of these (twice) takes you to a page where you can see the lineage:

  • Genus: Plasmodium
  • Phylum: Apicomplexa
  • (Super)Kingdom: Eukaryota


1b) NCBI's Taxonomy page has a function named "Taxonomy common tree", which gives a nice overview. Alternatively, you can open the taxonomy pages for the two organisms you want to compare and see from their lineages how much they have in common.

  • Homo sapiens and Plasmodium: Eukaryota
  • Babesia microti and Plasmodium: Aconoidasida

[Figure: the tree produced by the "Taxonomy common tree" function]


1c) On CDC's page about malaria or on Tree of Life's page about Plasmodium you find:

  • P. malariae, P. ovale, P. falciparum and P. vivax.

By looking up these four species in NCBI Taxonomy and looking at the table to the right, you see that all four species have a full genome in the databases (see the link named Genome under Entrez records).

 

2 - Identification of membrane proteins (potential vaccine targets)

2a)

14 chromosomes.

5,566 non-hypothetical genes (search details below):

txid36329[Organism:noexp] NOT hypothetical[All Fields] AND alive[prop]

If you instead found 5,570 non-hypothetical genes, it is because you selected the species Plasmodium falciparum (taxID:5833) in NCBI Taxonomy instead of the specific isolate 3D7 (taxID:36329) as specified in the exercise.

2b)

The correct search strings

(taxonomy_id:5833)

or

(organism_name:"Plasmodium falciparum")

both give 129,611 hits in total, 463 from Swiss-Prot and 129,148 from TrEMBL.

If you only found 34,196 hits, it was because you used

(organism_id:5833)

which only returns those P. falciparum proteins that have no specified strain or isolate (cf. questions 3.4 and 3.5 in the UniProt exercise).

If, on the other hand, you found 131,779 hits, it was because you searched in All instead of specifying the search field:

Plasmodium falciparum

In that case, you will include some proteins that originate from e.g. humans but play a role in Plasmodium falciparum infection, which may be mentioned in some comment field or reference title.

2c)

This can be solved in several ways:

  • (taxonomy_id:36329) (either selecting the right isolate from the drop-down menu or using the TaxID you found in the Taxonomy database)
  • (taxonomy_id:5833) AND (organism_name:3d7)
  • (organism_name:"Plasmodium falciparum") AND (organism_name:3d7)

They all give: 5,495 in total, 275 from Swiss-Prot and 5,220 from TrEMBL.

That corresponds approximately to the number of genes found in 2a).
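
The search strings in 2b and 2c can also be composed programmatically, which makes it easier to add clauses one at a time. The following is only a minimal Python sketch: it assumes UniProt's public REST search endpoint at https://rest.uniprot.org/uniprotkb/search, the helper names build_query and search_url are hypothetical, and it only builds the URL without sending any request. Hit counts obtained today will differ from the numbers of 11-10-2023.

```python
from urllib.parse import urlencode

# Assumption: UniProt's public REST search endpoint (not named in the exercise).
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_query(*clauses):
    """Join field:value clauses with AND, in the parenthesised style used above."""
    return " AND ".join(f"({clause})" for clause in clauses)


def search_url(query, size=1):
    """Return a search URL for the query; fetching it is left to the reader."""
    params = {"query": query, "format": "json", "size": size}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


# 2b: all Plasmodium falciparum entries, including strains and isolates.
species = build_query("taxonomy_id:5833")
# 2c: restricted to isolate 3D7 (second variant from the list above).
isolate = build_query("taxonomy_id:5833", "organism_name:3d7")

print(species)
print(isolate)
print(search_url(isolate))
```

Pasting the printed query strings into the UniProt search box should give the same result as typing them by hand.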

2d)

(taxonomy_id:5833) AND (cc_scl_term:*)

18,931 (370 from Swiss-Prot and 18,561 from TrEMBL).

2e)

(taxonomy_id:5833) AND (cc_scl_term:secreted)

or

(taxonomy_id:5833) AND (cc_scl_term:SL-0243)

76 (39 from Swiss-Prot).

2f)

surface:

(taxonomy_id:5833) AND (cc_scl_term:surface)
or
(taxonomy_id:5833) AND (cc_scl_term:SL-0310)
413 hits

membrane:

(taxonomy_id:5833) AND (cc_scl_term:membrane)
or
(taxonomy_id:5833) AND (cc_scl_term:SL-0162)
11,299 hits

2g)

Potentially useful (found in the cell membrane):

  • Q7KQL9 / ALF_PLAF7 / Fructose-bisphosphate aldolase: "Host cell membrane"
  • A0A2I0BVG8 / CDPK1_PLAFO / Calcium-dependent protein kinase 1: "Cell membrane"
  • W7KN63 / W7KN63_PLAFO / Merozoite surface antigen 2: "Cell membrane"

Definitely not useful (found in an inner membrane):

  • Q8I6V3 / PLM2_PLAF7 / Plasmepsin II: "Vacuole membrane"
  • U3M186 / U3M186_PLAFA / Cytochrome c oxidase subunit 1: "Mitochondrion inner membrane"
  • O97321 / O97321_PLAF7 / GlcNAc-1-P transferase: "Endoplasmic reticulum membrane"

Of course, the actual examples you selected may differ from these!

2h)

27 hits, all from Swiss-Prot.

(taxonomy_id:5833) AND (cc_scl_term:"host cell membrane")
or
(taxonomy_id:5833) AND (cc_scl_term:SL-0375)

2i)

(taxonomy_id:5833) AND (protein_name:erythrocyte)

10,119 hits, only 4 of them from Swiss-Prot.

2j)

(taxonomy_id:5833) AND (protein_name:erythrocyte) AND (protein_name:membrane)

9,527 hits, all from TrEMBL.

or

(taxonomy_id:5833) AND (protein_name:"erythrocyte membrane")

9,486 hits, all from TrEMBL.

2k)

(taxonomy_id:5833) AND (protein_name:erythrocyte) AND (protein_name:membrane) AND (fragment:false)

2,382 hits (or 2,380 if "erythrocyte membrane" is searched as a single phrase)

2l)

(taxonomy_id:5833) AND (protein_name:erythrocyte) AND (protein_name:membrane) AND (fragment:false) AND (database:pdb)

6 hits: Q6UDW7, Q8I639, W7K270, Q8IHM0, A0A024V5I6, and I1X0L2.

 

3 - Analysis of membrane protein domain structure

3a)

InterPro identifier and name: IPR008602, Duffy-antigen binding
Pfam identifier and name: PF05424, Duffy binding domain
It is found 4 times in Q8IHM0 and 6 times in each of Q8I639 and Q6UDW7.


3b)

The transmembrane segments are in the following positions:

  • Q6UDW7: 2653-2674
  • Q8I639: 2650-2667
  • Q8IHM0: 2695-2717

The extracellular parts are the N-terminal parts (all the positions before the transmembrane segments), and the intracellular (cytoplasmic) parts are C-terminal (positions after the transmembrane segments).


3c)

The following positions have been structurally determined by X-ray crystallography in the three proteins:

  • Q6UDW7:
    • 1215-1950, covering Duffy_binding domain 3 and 4
    • 1218-1577 or 1220-1580, covering Duffy_binding domain 3
    • 2326-2631 covering Duffy_binding domain 6
  • Q8I639: 2333-2634, covering Duffy_binding domain 6
  • Q8IHM0: 728-1214, covering Duffy_binding domain 2
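
As a quick consistency check between 3b and 3c, one can verify that every structurally determined interval lies on the N-terminal (extracellular) side of the transmembrane segment. The sketch below is illustrative only: the positions are copied from the answers above, and the function name is_extracellular is hypothetical. It relies on the statement in 3b that everything before the transmembrane segment is extracellular.

```python
# Transmembrane segments (3b) and X-ray intervals (3c), copied from the answers above.
tm_segments = {
    "Q6UDW7": (2653, 2674),
    "Q8I639": (2650, 2667),
    "Q8IHM0": (2695, 2717),
}
xray_intervals = {
    "Q6UDW7": [(1215, 1950), (1218, 1577), (1220, 1580), (2326, 2631)],
    "Q8I639": [(2333, 2634)],
    "Q8IHM0": [(728, 1214)],
}


def is_extracellular(interval, tm):
    """True if the interval ends before the transmembrane segment starts."""
    return interval[1] < tm[0]


# For each protein: do all X-ray intervals fall in the extracellular part?
all_extracellular = {
    accession: all(is_extracellular(iv, tm_segments[accession]) for iv in intervals)
    for accession, intervals in xray_intervals.items()
}
print(all_extracellular)
```

For the positions listed here, every interval ends before its transmembrane segment begins, so all of the solved structures cover extracellular parts of the proteins.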


3d)

Yes!

  • Biological process: cytoadherence to microvasculature, mediated by symbiont protein
  • Biological process: pathogenesis
  • Cellular component: host cell plasma membrane
  • Cellular component: infected host cell surface knob
  • Molecular function: cell adhesion molecule binding
  • Molecular function: host cell surface receptor binding

All of these annotations support the idea that these proteins are involved in binding infected erythrocytes to endothelial cells (as described in the exercise).

Tip: In UniProt, you can click "View the complete GO annotation on QuickGO".

 

4 - Prediction of B-cell epitopes in a membrane protein

4a) The PDB entry is 2WAU and it's a crystal structure (X-ray).

4b) FASTA sequence for the Duffy Binding domain covered by the 3D structure:

>2WAU_1|Chains A, B|ERYTHROCYTE MEMBRANE PROTEIN 1 (PFEMP1)|PLASMODIUM FALCIPARUM (36329)
ICNKYKNINVNMKKNNDDTWTDLVKNSSDINKGVLLPPRRKNLFLKIDESDICKYKRDPKLFKDFIYSSAISEVERLKKV
YGEAKTKVVHAMKYSFADIGSIIKGDDMMENNSSDKIGKILGDGVGQNEKRKKWWDMNKYHIWESMLSGYKHAYGNISEN
DRKMLDIPNNDDEHQFLRWFQEWTENFCTKRNELYENMVTACNSAKCNTSNGSVDKKECTEACKNYSNFILIKKKEYQSL
NSQYDMNYKETKAEKKESPEYFKDKCNGECSCLSEYFKDETRWKNPYETLDDTEVKNNCMCK


4c) The sequence interval was 2333-2634. This means that the first position in the new FASTA file corresponds to position 2333 in the original sequence.

4d) Coordinates in the FASTA file can be converted to positions in the full-length sequence by adding 2332. In the table below, the epitopes are numbered and also named by their starting position.

EPITOPE     POSITIONS    LENGTH     ORIG_POSITIONS
#1 ep_5       5 to  29       25     2337 to 2361
#2 ep_49     49 to  57        9     2381 to 2389
#3 ep_107   107 to 114        8     2439 to 2446
#4 ep_153   153 to 172       20     2485 to 2504
#5 ep_209   209 to 218       10     2541 to 2550
#6 ep_249   249 to 258       10     2581 to 2590
#7 ep_273   273 to 294       22     2605 to 2626
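
The conversion in the table can be reproduced with a few lines of Python. This is a minimal sketch using the epitope positions listed above; the names OFFSET and to_original are hypothetical. The offset is 2333 - 1 because position 1 of the FASTA file corresponds to position 2333 of the full-length protein.

```python
# First FASTA position maps to 2333 in the full-length sequence, so the offset is 2332.
OFFSET = 2333 - 1

# (start, end) of each predicted epitope in FASTA coordinates, from the table above.
epitopes = [(5, 29), (49, 57), (107, 114), (153, 172),
            (209, 218), (249, 258), (273, 294)]


def to_original(start, end, offset=OFFSET):
    """Convert an inclusive FASTA interval to full-length sequence coordinates."""
    return start + offset, end + offset


for number, (start, end) in enumerate(epitopes, start=1):
    orig_start, orig_end = to_original(start, end)
    length = end - start + 1  # both ends are inclusive
    print(f"#{number} ep_{start}: {start}-{end} (length {length}) -> {orig_start}-{orig_end}")
```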

 

5 - Visualization of epitopes

5a) Invisible positions:

Chain A: 2333-2349 and 2540-2546
Chain B: 2333-2348 and 2535-2549

This means that the first epitope (positions 5 to 29, original positions 2337 to 2361) and the fifth epitope (positions 209 to 218, original positions 2541 to 2550) are partially invisible in the structure.
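
The overlap between the epitopes and the invisible positions can be checked by interval arithmetic. The sketch below is illustrative: the intervals are copied from 4d and 5a, and the function names hidden_count and affected are hypothetical. It counts, per chain, how many residues of each epitope fall inside an invisible stretch.

```python
# Invisible (unresolved) positions per chain, from 5a.
invisible = {
    "A": [(2333, 2349), (2540, 2546)],
    "B": [(2333, 2348), (2535, 2549)],
}
# Epitopes in full-length sequence coordinates, from the table in 4d.
epitopes = {
    1: (2337, 2361), 2: (2381, 2389), 3: (2439, 2446), 4: (2485, 2504),
    5: (2541, 2550), 6: (2581, 2590), 7: (2605, 2626),
}


def hidden_count(epitope, gaps):
    """Number of epitope residues inside any of the (non-overlapping) gaps."""
    start, end = epitope
    return sum(
        max(0, min(end, gap_end) - max(start, gap_start) + 1)
        for gap_start, gap_end in gaps
    )


def affected(chain):
    """Map epitope number -> hidden residue count, for epitopes with any hidden residue."""
    counts = {n: hidden_count(ep, invisible[chain]) for n, ep in epitopes.items()}
    return {n: c for n, c in counts.items() if c > 0}


for chain in invisible:
    print(chain, affected(chain))
```

An epitope is only partially invisible when its hidden count is greater than zero but smaller than its length, which is the case for epitopes 1 and 5 in both chains.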

5b)
