IEDB


Immune Epitope DataBase (IEDB) Exercises


The Immune Epitope DataBase (IEDB) is a very useful resource for researchers in immunology and immunoinformatics. It is a highly organized database of epitopes (or, more precisely, epitope assay results) with a user-friendly search interface. It is maintained by the La Jolla Institute for Allergy and Immunology and has a dedicated team of curators and bioinformaticians who mine the immunological literature. The IEDB is up to date with the immunological literature (within its scope) and is therefore a good place to get an idea of what has been studied and to what extent. In addition, the IEDB provides prediction tools that can be very useful when no experimental evidence exists. Best of all, it's all free (for academics)!

Today we will focus on building experience in querying the IEDB, so that you may search for data in your own research. Start by going to the IEDB home page.

Sequence Query

The goal here is to introduce you to the basic functionalities of the IEDB.

In the home page search (the center panel of the home page), in the ’Epitope’ field, select ’Linear Epitope’ and ’Exact Match’, and enter ’ASNENMETM’ in the text box. Leave the remaining parameters at their defaults and click ’Search’ at the bottom of the page.

  • Q1: How many epitopes do you find?
  • Q2: Which antigen molecule do the epitopes come from? What about the antigen organism?

Under the ’Epitopes’ search tab, inspect the column ’# References’ and note that the output table is sorted by this column. Click the column header names to change the sorting order. Sort again by number of references and click the ’Details’ number for the epitope with the most references. This leads you to a summary page for that epitope. Note the tabulated T Cell Assay Results.

  • Q3: What is the number of positive results for the ’qualitative binding’ assay? How about for ’pathogen burden after challenge’?

Back at the top of the results page, click through the different tabs (’Epitopes’, ’Antigen’, ’Assays’...) for different vantage points on the results.

  • Q4: How many entries are under the Assay tab?

The ’Assay’ tab results can be further divided using the tabs right above the results table (’T Cell Assays’, ’B Cell Assays’, ’MHC Ligand Assays’).

  • Q5: How many ’MHC Ligand Assays’ have been performed on the epitopes in question?

Applied filters are shown under ’Current Filters’ at the top of the results page. Try removing the ’Positive Assays Only’ filter by clicking the red ’X’ and pressing ’Search’ (Top Left). You have now included negative results in your search.

  • Q6: How many negative ’MHC Ligand Assays’ are added by doing this?

You can add the ’Positive Assays Only’ filter back by refining your search. On the left side of the results page, note the search fields (’Epitope’, ’Antigen’, ’Receptor’, ’Assay’...). Scroll down to the ’Assay’ field and select ’Positive Assays Only’. Scroll back to the top and click ’Search’.

  • Q7: What is the ’MHC Ligand Assay’ count now? Does it match your initial count?

Finally, inspect the results under the ’Reference’ tab. Click the PMID column entry for any row. This is a PubMed link to the reference. Back in the results table, sort the references by date by clicking the ’Date’ column header.

  • Q8: What is the time span of publications for the epitopes?

Broaden your search. In the ’Epitope’ search field on the left of the results page, change the ’Exact Match’ search to a BLAST search with a 70% sequence identity threshold.

  • Q9: How many epitopes does your search yield?

Pro tip: Start with a broad initial search (e.g. all T Cell epitopes) from the home page and then add filters in steps (in the search fields on the left of the results page). This way you can get a feel for which filter is most restrictive (where you lose the most epitopes). This is also helpful for debugging a search that returns unexpected results.
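The same stepwise idea can be tried offline on an exported table. The sketch below is only an illustration and not part of the IEDB: the records, field names and filters are all hypothetical. It applies filters one at a time and reports how many records survive each step, so that the most restrictive filter stands out.

```python
# Hypothetical records standing in for rows of an exported results table.
records = [
    {"assay": "T cell", "host": "human", "outcome": "positive"},
    {"assay": "T cell", "host": "mouse", "outcome": "positive"},
    {"assay": "B cell", "host": "human", "outcome": "negative"},
    {"assay": "T cell", "host": "human", "outcome": "negative"},
    {"assay": "T cell", "host": "mouse", "outcome": "negative"},
]

# Filters are applied in order; each one is a (label, predicate) pair.
filters = [
    ("T cell assays", lambda r: r["assay"] == "T cell"),
    ("human host", lambda r: r["host"] == "human"),
    ("positive only", lambda r: r["outcome"] == "positive"),
]


def stepwise_counts(rows, steps):
    """Return (label, remaining_count) after each successive filter."""
    counts = [("start", len(rows))]
    for label, keep in steps:
        rows = [r for r in rows if keep(r)]
        counts.append((label, len(rows)))
    return counts


counts = stepwise_counts(records, filters)
for label, n in counts:
    print(f"{label}: {n}")
```

Reordering the entries in `filters` shows how much each filter removes on its own, which is the offline equivalent of adding and removing filters on the results page.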

Neutralizing Ebola Antibodies

The goal here is to get a clinically interesting dataset by a few clicks on the IEDB.

Starting from the home page search, in the ’Assay’ field, select only ’Positive assays only’ and ’B cell assays’. Leave the remaining parameters at their defaults and click 'Search'.

  • Q10: This should return a large number of epitopes. How many?

Start refining your search using the search fields on the left of the results page. In the ’Organism’ box of the ’Antigen’ field search for ’Ebolavirus (ID:186538, ID:186539, ID:186540, ID:186541)’. You can also try searching for ’Ebolavirus’ in the organism finder in the ’Antigen’ field. Click ’Search’ (Top Left) again to add this filter.

  • Q11: How many Ebola virus epitopes do you see?

Further refine your search by looking for only Ebola epitopes reactive in human cells. In the ’Host’ search field, select ’Human’ and click 'Search' again.

  • Q12: How many human-reactive Ebola virus epitopes do you see?

Now, let’s filter the Ebola epitope set down to only those epitopes that have been positively shown to result in neutralization in a human host. In the ’Assay’ search field, select the ’B Cell Assay’ finder. In the assay finder, expand the ’Biological Activity’ folder, click ’Neutralization’ and ’Apply’, and then click ’Search’ to add this last filter.

  • Q13: How many epitopes remain?

You now have a highly filtered set of epitopes with clinical interest. In the results page, click the ’Assay’ tab, find the entry ID ’3218080’ and click it. This is an example of a monoclonal antibody that has shown promise in the treatment of Ebola in humans. Remember that you can export your search results to a .csv format for later use.

Pro Tip 2: When you have your dataset of interest ready for export, note the filters you have applied at the top of the results page so that you may recreate the dataset later. Also, the results tab you export from influences what you get.

Pro Tip 3: You may encounter a ’Bad Request’ error in your browser when working on the IEDB website. Try clearing your recent browser history (cookies and cached data).

Population Coverage of a SARS-CoV-2 Peptide Vaccine

We have all been influenced by the coronavirus disease 2019 (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Fortunately, vaccines have been developed and approved in record time. The currently approved mRNA vaccines of Pfizer and Moderna code for the SARS-CoV-2 Spike Protein, a viral fusion protein that is important for cell adhesion and the induction of protective immunity.

This and the next exercise will be reminiscent of the HCV vaccine development exercise, but we will work with tools from the IEDB and target the SARS-CoV-2 spike protein. Here we want to investigate the importance of population coverage in rational peptide vaccine design. To this end, we will work with data from the IEDB and use the Population Coverage Analysis Tool.

From the IEDB home page, search for MHC I restricted T-cell epitopes in humans from the SARS-CoV-2 Spike glycoprotein antigen.

  • Q14: How many assay results do you find?

In the Antigens tab, select the immunome browser for a mapping of all assay results onto the spike glycoprotein. Observing the "Epitope Assay Counts" plots, we note that most of the protein sequence has been tested. Let us select an epitope subset that has been repeatedly tested with positive assay results (high response frequency). Export the immunome browser results and, using Excel, R, Python or similar, filter them to include only 9-mer peptides, then sort on the lower bound of the confidence interval of the Response Frequency.

  • Q15: Which 9-mer has the highest and most reliable Response Frequency?
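For those doing the filtering step in Python, a minimal sketch using only the standard library could look like the following. The column names (`sequence`, `response_frequency`, `lower_ci`) and the sample rows are hypothetical stand-ins; check the header of your own export and adjust the names accordingly.

```python
import csv
import io

# Hypothetical stand-in for an exported immunome browser CSV.
# Column names and values are assumptions, not the real export format.
SAMPLE_EXPORT = """sequence,response_frequency,lower_ci
AAAAAAAAA,0.80,0.55
CCCCCCCCCC,0.95,0.90
DDDDDDDDD,0.90,0.40
EEEEEEEEE,0.70,0.60
"""


def rank_nine_mers(handle, top_n=10):
    """Keep 9-mers only and sort by the lower confidence bound, highest first."""
    rows = [r for r in csv.DictReader(handle) if len(r["sequence"].strip()) == 9]
    rows.sort(key=lambda r: float(r["lower_ci"]), reverse=True)
    return rows[:top_n]


ranked = rank_nine_mers(io.StringIO(SAMPLE_EXPORT))
for row in ranked:
    print(row["sequence"], row["response_frequency"], row["lower_ci"])
```

To run it on a real export, replace `io.StringIO(SAMPLE_EXPORT)` with an opened file handle. Sorting on the lower bound rather than on the response frequency itself favours peptides that have been tested often, since a high frequency based on very few subjects comes with a wide interval.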

We will select the top 10 peptides with the highest response frequency as our peptide vaccine candidates. Next we want to test the population coverage of these 10 peptides using the Population Coverage tool. Population Coverage requires not only the epitope sequences but also the MHCs that present them. I have prepared this dataset by predicting binding of the peptides (with NetMHCpan-4.1) to a set of prevalent MHCs and, for each peptide, listing only the MHCs that bind it strongly (rank<0.5). This data is available here: PopulationCoverageInput
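The preparation step described above can be sketched as follows. This is not the script used to build PopulationCoverageInput: the predictions are made-up values, and the tab-separated output layout (peptide, then a comma-separated allele list) is an assumption about what the tool accepts. Only the rank<0.5 cut-off is taken from the text.

```python
from collections import defaultdict

# Hypothetical predictions: (peptide, MHC allele, percentile rank).
predictions = [
    ("AAAAAAAAA", "HLA-A*02:01", 0.10),
    ("AAAAAAAAA", "HLA-B*07:02", 2.50),
    ("EEEEEEEEE", "HLA-A*01:01", 0.40),
    ("EEEEEEEEE", "HLA-A*02:01", 0.30),
    ("DDDDDDDDD", "HLA-B*07:02", 0.50),
]


def strong_binders(preds, max_rank=0.5):
    """Group alleles by peptide, keeping only pairs with rank below max_rank."""
    by_peptide = defaultdict(list)
    for peptide, allele, rank in preds:
        if rank < max_rank:
            by_peptide[peptide].append(allele)
    return dict(by_peptide)


binders = strong_binders(predictions)
for peptide, alleles in binders.items():
    # Assumed layout: one peptide per line, alleles comma-separated.
    print(peptide + "\t" + ",".join(alleles))
```

Note that a peptide with no strong binder (here the one with a rank of exactly 0.50) drops out entirely, so it contributes nothing to the coverage estimate.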

To open the Population Coverage tool: on the IEDB home page, under "Analysis Resource" select "Epitope Analysis Tools" and then "Population Coverage".

Try computing the coverage of the 10 peptides for the European population, selecting "Class I separate" and entering the peptides using Choose File.

  • Q16: What percentage of the European population does the set of peptides cover?

Select "View chart data in table format".

  • Q17: What percentage of the European population has 0 epitope/HLA hits?

Select "View coverage of individual epitope in Europe".

  • Q18: How many epitopes each cover more than 60% of the population?

Rerun the Population Coverage computation with only the epitopes that each have more than 60% coverage of the European population.

  • Q19: What coverage do you reach for the European population?

Rerun the Population Coverage computation with the same subset, but for the Central African population.

  • Q20: What coverage do you reach for the Central African population?

Hopefully this exercise has solidified the idea that one aspect of rational vaccine design is the balancing act between population coverage and cost/complexity/number of peptides.
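To build some intuition for where such coverage numbers come from, here is a deliberately simplified sketch. It is not the IEDB tool's implementation: it assumes Hardy-Weinberg proportions within each locus and independence between loci, and the allele frequencies are invented. An individual is counted as covered if at least one of their alleles presents at least one peptide in the set.

```python
# Hypothetical allele frequencies for one population, grouped by locus.
allele_freqs = {
    "HLA-A": {"HLA-A*02:01": 0.25, "HLA-A*01:01": 0.15},
    "HLA-B": {"HLA-B*07:02": 0.10},
}


def coverage(restricting_alleles, freqs_by_locus):
    """Fraction of individuals carrying at least one restricting allele.

    Simplifying assumptions: random pairing of alleles within a locus
    and independence between loci.
    """
    uncovered = 1.0
    for freqs in freqs_by_locus.values():
        covered_freq = sum(f for a, f in freqs.items() if a in restricting_alleles)
        # Both chromosomes must carry an allele outside the restricting set.
        uncovered *= (1.0 - covered_freq) ** 2
    return 1.0 - uncovered


small_set = coverage({"HLA-A*02:01"}, allele_freqs)
large_set = coverage({"HLA-A*02:01", "HLA-A*01:01", "HLA-B*07:02"}, allele_freqs)
print(f"one allele: {small_set:.4f}, three alleles: {large_set:.4f}")
```

Because allele frequencies differ between populations, the same set of restricting alleles can give very different values when the frequency table is swapped, which is the effect seen when comparing the European and Central African results.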

Epitope Conservancy Analysis of a SARS-CoV-2 Peptide Vaccine

Another concern when designing a peptide vaccine is the divergence of pathogen strains. A peptide vaccine can achieve excellent population coverage for a single pathogen strain, but a few mutations can render it useless. Therefore we must be aware of the genotypic landscape of current pathogen strains to test the robustness of a vaccine.

Using our epitope set from the previous questions, we want to test the Epitope Conservancy against a recently compiled set of known SARS-CoV-2 mutations [Xu et al., Sept. 2020: Variations in SARS-CoV-2 Spike Protein Cell Epitopes and Glycosylation Profiles During Global Transmission Course of COVID-19]. I have compiled SARS-CoV-2 Spike Protein sequence variants (from reference sequence: QHD43416.1), containing 16 frequent mutations in each country until April 26, 2020. This FASTA file can be found here: SpikeProteinVariants

To open the Epitope Conservancy tool: on the IEDB home page, under "Analysis Resource" select "Epitope Analysis Tools" and then "Epitope Conservancy Analysis".

We will use a larger set of peptides as input (EpitopeConservancyPeptides). Input the peptide sequences and the spike protein variant sequences, select "Epitope linear sequence conservancy" and then submit.

  • Q21: Which epitope has the lowest number of protein sequence matches at 100%?
  • Q22: Do you think pathogen coverage will be a problem for a vaccine containing these epitopes? Why?
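Conservancy at 100% identity amounts to asking in how many variant sequences a peptide occurs as an exact substring. The sketch below illustrates this under that assumption only; it is not the IEDB tool, it ignores thresholds below 100%, and the FASTA text and peptides are short made-up examples rather than spike protein data.

```python
# Hypothetical FASTA text; real input would be the spike variant file.
FASTA = """>variant_1
MKTAYIAKQRQISFVK
>variant_2
MKTAYIAKQRQLSFVK
>variant_3
MKTAYIAKQR
QISFVK
"""


def read_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped lines."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def conservancy(peptide, seqs):
    """Return (number of sequences containing the peptide exactly, total)."""
    matches = sum(1 for s in seqs.values() if peptide in s)
    return matches, len(seqs)


variants = read_fasta(FASTA)
results = {p: conservancy(p, variants) for p in ("KQRQISFVK", "MKTAYIAKQ")}
for peptide, (hits, total) in results.items():
    print(f"{peptide}: {hits}/{total} sequences at 100% identity")
```

A single substitution inside the peptide, as in `variant_2` here, is enough to remove the match, which is why a handful of mutations can matter for a peptide vaccine.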

The variants were gathered from www.GISAID.org, an organization devoted to sharing data on coronaviruses causing COVID-19. Their front page has a plot showing the number of data submissions: "hCoV-19 Data Sharing via GISAID".

  • Q23: How many times more submissions have been made since April 26th, 2020 (variants in this exercise)?

This indicates that it might be worth repeating this analysis with a more up-to-date dataset. It could also be interesting to test whether variants have emerged whose mutations eliminate epitopes.


You should now have a fairly good grasp of the IEDB. It is now your job to come up with some research questions that you can phrase in the form of an IEDB query.

Done!