Microbial genomics exercise: Difference between revisions

From 22126
Line 1: Line 1:
== EX03: What sequence type is the genome? ==


In addition to species identification, sequence typing is commonly used in clinical microbiology to compare isolates and support outbreak investigations.


Dear course participants,
'''Multilocus Sequence Typing (MLST)''' assigns each isolate to a sequence type (ST) based on its allelic profile, that is, the combination of alleles found at a defined set of housekeeping genes.


In this exercise you will analyse microbial genome sequences using bioinformatics tools that are commonly used for microbial diagnostics and research.
The assembled genomes are available in:
 
<code>/home/projects/microbial_genomics/ex02_assemblies</code>
 
Run the MLST tool to determine the sequence type of each genome.
 
Start by inspecting the available options:
 
<pre>
mlst.sh -h
</pre>
 
Then run MLST on the genome assemblies:


The tools are available on the server as Apptainer container images.
Although the image files are located in <code>/home/projects/microbial_genomics/singularity_image_files</code>, you do not need to call them directly. Instead, you can run the tools using the provided Bash executables in <code>/home/ctools/bin</code>, which is already on your <code>$PATH</code>.
<pre>
<pre>
gtdbtk.sh
mlst.sh /home/projects/microbial_genomics/ex02_assemblies/*.fna
mlst.sh
parsnp.sh
abricate.sh
</pre>
</pre>


You can use
'''Questions:'''
* What MLST scheme is used for each genome?
* What sequence type (ST) is assigned to each isolate?
* Are all genomes assigned to the same ST?
 
---
 
== EX04: Which antimicrobial resistance genes are present? ==


Detection of antimicrobial resistance (AMR) genes is an important part of microbial diagnostics.


== Background ==
The tool '''ABRicate''' can be used to screen genome assemblies against curated resistance gene databases.


We imagine that we are employed at a hospital to provide diagnostics for patient care.
Run ABRicate on the assembled genomes using a resistance gene database.


The laboratory has sequenced genomic DNA from a clinical specimen. The sequence reads are stored in FASTQ files that are compressed with gzip:
First, inspect the available options and databases:


* X
<pre>
* Y
abricate.sh -h
abricate.sh --list
</pre>


Use the following command to read the first lines of one of the files and inspect its content:
Then screen the genomes using the ResFinder database:


<pre>
<pre>
zcat filename.fastq.gz | head
abricate.sh --db resfinder /home/projects/microbial_genomics/ex02_assemblies/*.fna
</pre>
</pre>


== EX02: What species is the genome? ==
'''Questions:'''
* Which antimicrobial resistance genes are detected in each genome?
* Are the resistance profiles identical across the isolates?
* Based on the detected genes, which antibiotic classes might be ineffective?
 
---


The laboratory has sequenced genomic DNA from single-colony isolates of bacteria cultivated from a clinical specimen. The sequence reads have been ''de novo'' assembled, and the genome assemblies are stored in FASTA-formatted files available in <code>/home/projects/microbial_genomics/genome_assemblies</code>.
== EX05: How are the isolates related? ==


You want to determine the bacterial species of the assembled genomes.
Whole-genome comparisons are frequently used to assess the relatedness of bacterial isolates, for example during suspected outbreaks.


We can use the [https://github.com/Ecogenomics/GTDBTk GTDB-Tk tool] to assign taxonomic classifications to bacterial genomes based on the [https://gtdb.ecogenomic.org Genome Taxonomy Database (GTDB)].
The tool '''Parsnp''' performs core-genome alignment and identifies single nucleotide polymorphisms (SNPs) between closely related genomes.


Run <code>gtdbtk.sh -h</code> to get help information on how to use GTDB-Tk.
Use Parsnp to compare the assembled genomes.


Use GTDB-Tk to determine the species of the genomes:
First, view the help information:


<pre>
<pre>
gtdbtk.sh classify_wf --extension .fna --cpus 10 --genome_dir /home/projects/microbial_genomics/ex02_assemblies --out_dir $HOME/output
parsnp.sh -h
</pre>
</pre>


'''Question: What species are the genomes?'''
Then run Parsnp using one genome as the reference:
 
 
== EX03: What sequence type is the genome? ==


<pre>
parsnp.sh -r /home/projects/microbial_genomics/ex02_assemblies/genome1.fna \
          -d /home/projects/microbial_genomics/ex02_assemblies \
          -o $HOME/parsnp_out
</pre>


Parsnp produces a core-genome alignment and a phylogenetic tree.


'''Questions:'''
* How many SNPs separate the isolates?
* Do the genomes cluster closely together?
* Based on the results, do the isolates appear to be clonally related?


---


== Summary ==


In this exercise you have:


== Supplementary files ==
* Identified the species of bacterial genomes using '''GTDB-Tk'''
* Determined sequence types using '''MLST'''
* Screened for antimicrobial resistance genes using '''ABRicate'''
* Assessed genomic relatedness using '''Parsnp'''


[https://teaching.healthtech.dtu.dk/22126/images/c/cd/Gtdbtk.pdf Article describing GTDB-Tk can be found here]
Together, these analyses reflect a typical bioinformatics workflow used in microbial diagnostics and epidemiological investigations.

Revision as of 14:34, 6 January 2026

EX03: What sequence type is the genome?

In addition to species identification, sequence typing is commonly used in clinical microbiology to compare isolates and support outbreak investigations.

Multilocus Sequence Typing (MLST) assigns each isolate to a sequence type (ST) based on its allelic profile, that is, the combination of alleles found at a defined set of housekeeping genes.

The assembled genomes are available in:

/home/projects/microbial_genomics/ex02_assemblies
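
Before typing, it can help to check what each assembly contains. The following is a minimal sketch, not part of the course tooling: the function name fasta_stats and the file example.fna are hypothetical, and the sequence is made up so that the example runs anywhere.

```shell
# Hypothetical helper: print file name, number of contigs and total bases
# for each FASTA file given as an argument (tab-separated).
fasta_stats() {
    for f in "$@"; do
        awk -v name="$f" '
            /^>/ { contigs++; next }          # header line: one more contig
                 { bases += length($0) }      # sequence line: add its length
            END  { printf "%s\t%d\t%d\n", name, contigs, bases }
        ' "$f"
    done
}

# Made-up two-contig FASTA file used only for illustration.
printf '>c1\nACGT\nAC\n>c2\nGGG\n' > example.fna

fasta_stats example.fna
```

On the server, the same function could be called with the course assemblies, for example fasta_stats /home/projects/microbial_genomics/ex02_assemblies/*.fna.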

Run the MLST tool to determine the sequence type of each genome.

Start by inspecting the available options:

mlst.sh -h

Then run MLST on the genome assemblies:

mlst.sh /home/projects/microbial_genomics/ex02_assemblies/*.fna

Questions:

  • What MLST scheme is used for each genome?
  • What sequence type (ST) is assigned to each isolate?
  • Are all genomes assigned to the same ST?
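
To answer these questions for many genomes at once, the results can be tallied per scheme and ST. This is a minimal sketch that assumes mlst.sh prints the same tab-separated columns as the upstream mlst tool (file, scheme, ST, then one column per allele, without a header line); please verify this against your own output. The function name summarise_st and the rows in example_mlst.tsv are hypothetical and are not results from the course data.

```shell
# Hypothetical helper: count how many genomes share each scheme/ST pair.
# Assumes tab-separated input with columns: file, scheme, ST, alleles...
# Reads the files given as arguments, or standard input if none are given.
summarise_st() {
    awk -F'\t' '
        { count[$2 "\t" $3]++ }
        END { for (k in count) print k "\t" count[k] }
    ' "$@" | sort
}

# Made-up rows standing in for real MLST output.
printf 'a.fna\tecoli\t131\tadk(53)\n'  > example_mlst.tsv
printf 'b.fna\tecoli\t131\tadk(53)\n' >> example_mlst.tsv
printf 'c.fna\tecoli\t73\tadk(36)\n'  >> example_mlst.tsv

summarise_st example_mlst.tsv
```

Because the function also reads standard input, the output of the mlst.sh command above could be piped into it directly. More than one line in the summary means the genomes do not all share the same ST.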

---

EX04: Which antimicrobial resistance genes are present?

Detection of antimicrobial resistance (AMR) genes is an important part of microbial diagnostics.

The tool ABRicate can be used to screen genome assemblies against curated resistance gene databases.

Run ABRicate on the assembled genomes using a resistance gene database.

First, inspect the available options and databases:

abricate.sh -h
abricate.sh --list

Then screen the genomes using the ResFinder database:

abricate.sh --db resfinder /home/projects/microbial_genomics/ex02_assemblies/*.fna

Questions:

  • Which antimicrobial resistance genes are detected in each genome?
  • Are the resistance profiles identical across the isolates?
  • Based on the detected genes, which antibiotic classes might be ineffective?
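
Comparing resistance profiles is easier with one line per genome. The sketch below assumes the output is a tab-separated table whose header line starts with #FILE and contains a GENE column, as in upstream ABRicate; the column is located by name rather than by position. The function name genes_per_file and the simplified table in example_abricate.tsv are hypothetical, and the rows are made up.

```shell
# Hypothetical helper: print "file<TAB>gene1,gene2,..." for each genome.
# Assumes a tab-separated table with a "#FILE" header line that includes
# a column named GENE. Reads files given as arguments, or standard input.
genes_per_file() {
    awk -F'\t' '
        /^#FILE/ {
            # Locate the GENE column from the header, then skip the header.
            for (i = 1; i <= NF; i++) if ($i == "GENE") g = i
            next
        }
        g {
            if ($1 in genes) genes[$1] = genes[$1] "," $g
            else             genes[$1] = $g
        }
        END { for (f in genes) print f "\t" genes[f] }
    ' "$@" | sort
}

# Made-up, simplified table standing in for real ABRicate output.
{
    printf '#FILE\tSEQUENCE\tSTART\tEND\tSTRAND\tGENE\n'
    printf 'a.fna\tcontig1\t100\t975\t+\tblaCTX-M-15\n'
    printf 'a.fna\tcontig2\t50\t1249\t-\ttet(A)\n'
    printf 'b.fna\tcontig1\t100\t975\t+\tblaCTX-M-15\n'
} > example_abricate.tsv

genes_per_file example_abricate.tsv
```

Genomes with identical gene lists have identical lines after the file name, which makes differences between isolates easy to spot.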

---

EX05: How are the isolates related?

Whole-genome comparisons are frequently used to assess the relatedness of bacterial isolates, for example during suspected outbreaks.

The tool Parsnp performs core-genome alignment and identifies single nucleotide polymorphisms (SNPs) between closely related genomes.

Use Parsnp to compare the assembled genomes.

First, view the help information:

parsnp.sh -h

Then run Parsnp using one genome as the reference:

parsnp.sh -r /home/projects/microbial_genomics/ex02_assemblies/genome1.fna \
          -d /home/projects/microbial_genomics/ex02_assemblies \
          -o $HOME/parsnp_out

Parsnp produces a core-genome alignment and a phylogenetic tree.

Questions:

  • How many SNPs separate the isolates?
  • Do the genomes cluster closely together?
  • Based on the results, do the isolates appear to be clonally related?
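
To make the idea of a SNP distance concrete, the sketch below counts the positions at which two aligned sequences differ. It does not read Parsnp's output files. It only illustrates the calculation on two made-up, equal-length, upper-case strings, and it assumes that positions containing a gap (-) or an N in either sequence should be ignored. The function name snp_distance is hypothetical.

```shell
# Hypothetical helper: count differing positions between two aligned,
# equal-length sequences, skipping positions with a gap or an N.
snp_distance() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (length(a) != length(b)) {
            print "sequences differ in length" > "/dev/stderr"
            exit 1
        }
        n = 0
        for (i = 1; i <= length(a); i++) {
            x = substr(a, i, 1)
            y = substr(b, i, 1)
            if (x == "-" || y == "-" || x == "N" || y == "N") continue
            if (x != y) n++
        }
        print n
    }'
}

# Made-up aligned sequences: one mismatch, plus one gap that is ignored.
snp_distance ACGT-CGT ACGAACGT
```

A small pairwise distance across the whole core genome is consistent with a close relationship between isolates, but the threshold for calling isolates clonal depends on the species and the setting.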

---

Summary

In this exercise you have:

  • Identified the species of bacterial genomes using GTDB-Tk
  • Determined sequence types using MLST
  • Screened for antimicrobial resistance genes using ABRicate
  • Assessed genomic relatedness using Parsnp

Together, these analyses reflect a typical bioinformatics workflow used in microbial diagnostics and epidemiological investigations.