Microbial genomics exercise

From 22126
Revision as of 14:35, 6 January 2026 by Rasmus


Dear course participants,

In this exercise you will analyse microbial genome sequences using bioinformatics tools that are commonly used for microbial diagnostics and research.

The tools are available on the server as Apptainer container images. The image files are located in /home/projects/microbial_genomics/singularity_image_files, but you do not need to call them directly. Instead, run the tools through the provided Bash wrapper scripts in /home/ctools/bin, a directory that is already included in your $PATH:

  • gtdbtk.sh (GTDB-Tk)
  • mlst.sh (MLST)
  • parsnp.sh (Parsnp)
  • abricate.sh (ABRicate)
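Because the wrappers are found through $PATH, you can check that your shell sees them with the standard `command -v` builtin. The sketch below wraps that check in a small helper; the function name `check_tools` is hypothetical and not part of the course setup.

```shell
#!/usr/bin/env bash
# Report whether each named command can be resolved through $PATH.
# Returns 0 if every command is found and 1 if at least one is missing.
check_tools() {
  local tool path missing=0
  for tool in "$@"; do
    if path=$(command -v "$tool"); then
      echo "found:   $tool -> $path"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example (on the course server):
# check_tools gtdbtk.sh mlst.sh parsnp.sh abricate.sh
```

On the course server each wrapper should resolve to a path under /home/ctools/bin; a "missing" line usually means your $PATH has been changed in your shell start-up files.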

You can use


Background

Imagine that we are employed at a hospital laboratory, providing diagnostics for patient care.

The laboratory has sequenced genomic DNA from a clinical specimen. The sequence reads are stored in FASTQ files that are compressed with gzip:

  • X
  • Y

Use the following command to read the first lines of one of the files and inspect its content:

zcat filename.fastq.gz | head
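Each FASTQ record spans exactly four lines: a header starting with `@`, the sequence, a separator line starting with `+`, and a quality string with one character per base. The sketch below uses that structure to count reads and bases; the function name `fastq_stats` is hypothetical, and it assumes each sequence is written on a single line, as in typical Illumina output.

```shell
#!/usr/bin/env bash
# Summarise a gzip-compressed FASTQ file: number of reads, total bases
# and mean read length.
# Each FASTQ record spans four lines:
#   1: header line starting with '@'
#   2: nucleotide sequence
#   3: separator line starting with '+'
#   4: one quality character per base
# Assumes each sequence fits on a single line.
fastq_stats() {
  zcat "$1" | awk '
    NR % 4 == 2 { reads++; bases += length($0) }
    END {
      if (reads > 0)
        printf "reads=%d bases=%d mean_length=%.1f\n", reads, bases, bases / reads
      else
        print "reads=0"
    }
  '
}

# Example: fastq_stats filename.fastq.gz
```

Counting by line position is safer than counting lines that begin with `@`, because a quality string can also start with the `@` character.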

EX02: What species is the genome?

The laboratory has sequenced genomic DNA from single-colony isolates of bacteria cultivated from a clinical specimen. The sequence reads have been de novo assembled, and the genome assemblies are stored in FASTA-formatted files available in /home/projects/microbial_genomics/ex02_assemblies.

You want to determine the bacterial species of the assembled genomes.

We can use the GTDB-Tk tool to assign taxonomic classifications to bacterial genomes based on the Genome Taxonomy Database (GTDB).

Run gtdbtk.sh -h to get help information on how to use GTDB-Tk.

Use GTDB-Tk to determine the species of the genomes:

gtdbtk.sh classify_wf --extension .fna --cpus 10 --genome_dir /home/projects/microbial_genomics/ex02_assemblies --out_dir $HOME/output
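GTDB-Tk writes its results to the output directory. For bacterial genomes the main table is usually named gtdbtk.bac120.summary.tsv (archaeal genomes are reported in a separate table, and exact names can differ between GTDB-Tk versions, so check the directory). Its classification column holds the full lineage as semicolon-separated ranks ending in `s__` for species. The sketch below extracts the species per genome; the function name `summary_species` is hypothetical, and it assumes the header row contains `user_genome` and `classification` columns.

```shell
#!/usr/bin/env bash
# Print each genome name with its GTDB species from a GTDB-Tk summary table.
# Assumption: tab-separated table whose header row contains the columns
# "user_genome" and "classification", where classification is a
# semicolon-separated lineage such as "d__Bacteria;...;s__Genus species".
summary_species() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "user_genome") g = i
        if ($i == "classification") c = i
      }
      next
    }
    {
      species = "unclassified"   # used when the s__ rank is empty or absent
      n = split($c, ranks, ";")
      for (j = 1; j <= n; j++)
        if (ranks[j] ~ /^s__/ && length(ranks[j]) > 3)
          species = substr(ranks[j], 4)
      print $g "\t" species
    }
  ' "$1"
}

# Example: summary_species "$HOME/output/gtdbtk.bac120.summary.tsv"
```

Looking columns up by header name rather than by position keeps the helper working if later GTDB-Tk releases add or reorder columns.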

Question: What species are the genomes?

---

EX03: What sequence type is the genome?

In addition to species identification, sequence typing is commonly used in clinical microbiology to compare isolates and support outbreak investigations.

Multilocus Sequence Typing (MLST) assigns isolates to a sequence type (ST) based on the allelic profiles of a defined set of housekeeping genes.
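To make the profile-to-ST step concrete, the sketch below looks up a list of allele numbers in a profile table and returns the matching ST. The function name `lookup_st` and the table layout (tab-separated, header row, ST in the first column, one column per locus after it) are assumptions for illustration; the mlst tool performs this matching for you against its own bundled schemes.

```shell
#!/usr/bin/env bash
# Look up the sequence type (ST) whose allelic profile matches the
# allele numbers given as arguments.
# Assumption (hypothetical layout): tab-separated profile table with a
# header row, the ST in column 1 and one column per locus after it, in
# the same order as the allele numbers passed to the function.
# Extra trailing columns are ignored.
lookup_st() {
  local profiles=$1
  shift
  awk -F'\t' -v query="$*" '
    BEGIN { n = split(query, allele, " ") }
    NR == 1 { next }                      # skip the header row
    {
      # A row matches only if every locus carries the same allele number.
      ok = (NF > n)
      for (i = 1; i <= n && ok; i++)
        if ($(i + 1) != allele[i]) ok = 0
      if (ok) { print $1; found = 1; exit }
    }
    END { if (!found) print "no matching ST (novel or incomplete profile)" }
  ' "$profiles"
}

# Example with a toy table: lookup_st profiles.tsv 5 2 9
```

The key point is that an ST is assigned only when all loci match; a single differing allele gives a different (possibly novel) profile.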

The assembled genomes are available in:

/home/projects/microbial_genomics/ex02_assemblies

Run the MLST tool to determine the sequence type of each genome.

Start by inspecting the available options:

mlst.sh -h

Then run MLST on the genome assemblies:

mlst.sh /home/projects/microbial_genomics/ex02_assemblies/*.fna

Questions:

  • What MLST scheme is used for each genome?
  • What sequence type (ST) is assigned to each isolate?
  • Are all genomes assigned to the same ST?
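To answer the last question at a glance, you can group the MLST results by scheme and ST. The sketch below assumes mlst's default tab-separated output with no header row (file, scheme, ST, then one allele call per locus), saved to a file; the function name `group_by_st` is hypothetical. If your output has a header line, remove it first.

```shell
#!/usr/bin/env bash
# Group isolates by MLST scheme and sequence type (ST).
# Assumption: input is mlst's default tab-separated output without a header:
#   column 1 = input file, column 2 = scheme, column 3 = ST
# (mlst reports "-" in the ST column when no ST could be assigned).
group_by_st() {
  awk -F'\t' '
    {
      key = $2 "\tST" $3
      count[key]++
      if (key in members) members[key] = members[key] ", " $1
      else members[key] = $1
    }
    END { for (k in count) print k "\t" count[k] "\t" members[k] }
  ' "$@" | LC_ALL=C sort
}

# Example:
#   mlst.sh /home/projects/microbial_genomics/ex02_assemblies/*.fna > mlst.tsv
#   group_by_st mlst.tsv
```

Each output line lists the scheme, the ST, the number of isolates and their file names, so a single line means all genomes share one ST.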

---

EX04: Which antimicrobial resistance genes are present?

Detection of antimicrobial resistance (AMR) genes is an important part of microbial diagnostics.

The tool ABRicate can be used to screen genome assemblies against curated resistance gene databases.

Run ABRicate on the assembled genomes using a resistance gene database.

First, inspect the available options and databases:

abricate.sh -h
abricate.sh --list

Then screen the genomes using the ResFinder database:

abricate.sh --db resfinder /home/projects/microbial_genomics/ex02_assemblies/*.fna

Questions:

  • Which antimicrobial resistance genes are detected in each genome?
  • Are the resistance profiles identical across the isolates?
  • Based on the detected genes, which antibiotic classes might be ineffective?
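ABRicate reports one hit per line in a tab-separated table whose header row starts with `#FILE` and includes GENE, %COVERAGE and %IDENTITY columns. The sketch below collapses that table into one line per genome, keeping only hits above identity and coverage thresholds. The function name `genes_per_genome` and the default thresholds (90 % identity, 80 % coverage) are illustrative assumptions, not recommended cut-offs.

```shell
#!/usr/bin/env bash
# List the AMR genes detected in each genome from an ABRicate report,
# keeping only hits at or above identity and coverage thresholds.
# Assumptions: tab-separated report whose header row starts with "#FILE"
# and contains GENE, %COVERAGE and %IDENTITY columns. The default
# thresholds (90 % identity, 80 % coverage) are illustrative only.
genes_per_genome() {
  local report=$1 min_id=${2:-90} min_cov=${3:-80}
  awk -F'\t' -v min_id="$min_id" -v min_cov="$min_cov" '
    /^#FILE/ {                      # header row: map column names to indices
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    ($(col["%IDENTITY"]) + 0) >= min_id + 0 && ($(col["%COVERAGE"]) + 0) >= min_cov + 0 {
      file = $(col["#FILE"])
      if (file in genes) genes[file] = genes[file] "," $(col["GENE"])
      else genes[file] = $(col["GENE"])
    }
    END { for (f in genes) print f "\t" genes[f] }
  ' "$report" | LC_ALL=C sort
}

# Example:
#   abricate.sh --db resfinder /home/projects/microbial_genomics/ex02_assemblies/*.fna > amr.tsv
#   genes_per_genome amr.tsv 95 90
```

Genomes with no hits above the thresholds do not appear in the output. ABRicate itself also provides --minid/--mincov filters and a --summary mode that builds a presence/absence table from several reports; run abricate.sh -h to confirm that the wrapper passes these options through.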

---

EX05: How are the isolates related?

Whole-genome comparisons are frequently used to assess the relatedness of bacterial isolates, for example during suspected outbreaks.

The tool Parsnp performs core-genome alignment and identifies single nucleotide polymorphisms (SNPs) between closely related genomes.

Use Parsnp to compare the assembled genomes.

First, view the help information:

parsnp.sh -h

Then run Parsnp using one genome as the reference:

parsnp.sh -r /home/projects/microbial_genomics/ex02_assemblies/genome1.fna \
          -d /home/projects/microbial_genomics/ex02_assemblies \
          -o $HOME/parsnp_out

Parsnp produces a core-genome alignment and a phylogenetic tree.

Questions:

  • How many SNPs separate the isolates?
  • Do the genomes cluster closely together?
  • Based on the results, do the isolates appear to be clonally related?
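Parsnp stores its core-genome alignment in parsnp.xmfa and parsnp.ggr. To count SNPs between pairs of isolates you first need a plain multi-FASTA alignment, for example exported from parsnp.ggr with HarvestTools (`harvesttools -i parsnp.ggr -M parsnp.aln`), which is not among the listed wrappers and may not be installed on the server. Given such an alignment, the sketch below computes pairwise SNP distances. The function name `snp_distances` is hypothetical, and skipping gap and N columns is a simplifying assumption.

```shell
#!/usr/bin/env bash
# Pairwise SNP distances between sequences in a multi-FASTA alignment.
# Assumptions: all sequences are aligned to the same length, and columns
# with a gap ('-') or an ambiguous base ('N') in either sequence of a
# pair are skipped rather than counted as differences.
snp_distances() {
  LC_ALL=C awk '
    /^>/ { name[++n] = substr($1, 2); next }    # header: keep the first word
    { seq[n] = seq[n] toupper($0) }              # join wrapped sequence lines
    END {
      for (a = 1; a < n; a++)
        for (b = a + 1; b <= n; b++) {
          d = 0
          len = length(seq[a])
          for (i = 1; i <= len; i++) {
            x = substr(seq[a], i, 1)
            y = substr(seq[b], i, 1)
            if (x != y && x != "-" && y != "-" && x != "N" && y != "N") d++
          }
          print name[a] "\t" name[b] "\t" d
        }
    }
  ' "$1"
}

# Example: snp_distances parsnp.aln
```

This character-by-character comparison is fine for a handful of genomes but slow for many large alignments; dedicated tools such as snp-dists do the same job much faster, if they are available.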

---

Summary

In this exercise you have:

  • Identified the species of bacterial genomes using GTDB-Tk
  • Determined sequence types using MLST
  • Screened for antimicrobial resistance genes using ABRicate
  • Assessed genomic relatedness using Parsnp

Together, these analyses reflect a typical bioinformatics workflow used in microbial diagnostics and epidemiological investigations.




Supplementary files

Article describing GTDB-Tk can be found here