Microbial genomics exercise: Difference between revisions

From 22126
(Undo revision 355 by Rasmus)
Dear course participants,

In this exercise you will analyse microbial genome sequences using bioinformatics tools that are commonly used for microbial diagnostics and research.

The tools are available on the server as Apptainer container images.
Although the image files are located in <code>/home/projects/microbial_genomics/singularity_image_files</code>, you do not need to call them directly. Instead, you can run the tools using the provided Bash executables in <code>/home/ctools/bin</code>, which is already on your <code>$PATH</code>:
<pre>
gtdbtk.sh
mlst.sh
parsnp.sh
abricate.sh
</pre>

You can use

== Background ==

Imagine that we are employed at a hospital to provide diagnostics for patient care.

The laboratory has sequenced genomic DNA from a clinical specimen. The sequence reads are stored in FASTQ files that are compressed with gzip:

* X
* Y

Use the following command to print the first lines of one of the files and inspect its content (replace <code>filename.fastq.gz</code> with the name of the file):

<pre>
zcat filename.fastq.gz | head
</pre>

== EX02: What species is the genome? ==

The laboratory has sequenced genomic DNA from single-colony isolates of bacteria cultivated from a clinical specimen. The sequence reads have been ''de novo'' assembled, and the genome assemblies are stored in FASTA-formatted files available in <code>/home/projects/microbial_genomics/genome_assemblies</code>.

You want to determine the bacterial species of the assembled genomes.

We can use the [https://github.com/Ecogenomics/GTDBTk GTDB-Tk tool] to assign taxonomic classifications to bacterial genomes based on the [https://gtdb.ecogenomic.org Genome Taxonomy Database (GTDB)].

Run <code>gtdbtk.sh -h</code> to get help information on how to use GTDB-Tk.

Use GTDB-Tk to determine the species of the genomes:

<pre>
gtdbtk.sh classify_wf --extension .fna --cpus 10 --genome_dir /home/projects/microbial_genomics/ex02_assemblies --out_dir $HOME/output
</pre>

'''Question: What species are the genomes?'''

== EX03: What sequence type is the genome? ==

In addition to species identification, sequence typing is commonly used in clinical microbiology to compare isolates and support outbreak investigations.

'''Multilocus Sequence Typing (MLST)''' assigns isolates to a sequence type (ST) based on the allelic profiles of a defined set of housekeeping genes.

The assembled genomes are available in <code>/home/projects/microbial_genomics/ex02_assemblies</code>.

Run the MLST tool to determine the sequence type of each genome.

Start by inspecting the available options:

<pre>
mlst.sh -h
</pre>

Then run MLST on the genome assemblies:

<pre>
mlst.sh /home/projects/microbial_genomics/ex02_assemblies/*.fna
</pre>

'''Questions:'''
* What MLST scheme is used for each genome?
* What sequence type (ST) is assigned to each isolate?
* Are all genomes assigned to the same ST?

----

== EX04: Which antimicrobial resistance genes are present? ==

Detection of antimicrobial resistance (AMR) genes is an important part of microbial diagnostics.

The tool '''ABRicate''' screens genome assemblies against curated resistance gene databases.

Run ABRicate on the assembled genomes using a resistance gene database.

First, inspect the available options and databases:

<pre>
abricate.sh -h
abricate.sh --list
</pre>

Then screen the genomes using the ResFinder database:

<pre>
abricate.sh --db resfinder /home/projects/microbial_genomics/ex02_assemblies/*.fna
</pre>

'''Questions:'''
* Which antimicrobial resistance genes are detected in each genome?
* Are the resistance profiles identical across the isolates?
* Based on the detected genes, which antibiotic classes might be ineffective?

----

== EX05: How are the isolates related? ==

Whole-genome comparisons are frequently used to assess the relatedness of bacterial isolates, for example during suspected outbreaks.

The tool '''Parsnp''' performs core-genome alignment and identifies single nucleotide polymorphisms (SNPs) between closely related genomes.

Use Parsnp to compare the assembled genomes.

First, view the help information:

<pre>
parsnp.sh -h
</pre>

Then run Parsnp using one genome as the reference:


<pre>
parsnp.sh -r /home/projects/microbial_genomics/ex02_assemblies/genome1.fna \
          -d /home/projects/microbial_genomics/ex02_assemblies \
          -o $HOME/parsnp_out
</pre>


Parsnp produces a core-genome alignment and a phylogenetic tree.
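To answer the first question below, you need pairwise SNP counts between the isolates. The Bash function below is a minimal sketch and is not part of Parsnp. It assumes that the core-genome alignment has been exported as an aligned multi-FASTA file, with all sequences the same length. How to export the alignment depends on your Parsnp version, so check <code>parsnp.sh -h</code>. For every pair of sequences, the function counts the columns where both have an unambiguous base (A, C, G or T) and the bases differ. Gaps and ambiguous bases are ignored. The function name <code>snp_distances</code> and the file name <code>core_alignment.fasta</code> are hypothetical.

<pre>
# Minimal sketch (hypothetical helper, not part of Parsnp):
# print pairwise SNP distances from an aligned multi-FASTA file.
# Assumes all sequences have the same length (i.e. they are aligned).
# Only columns where both sequences have A, C, G or T are compared;
# gaps ("-") and ambiguous bases such as N are skipped.
snp_distances() {
  awk '
    /^>/ {                            # header line: start a new sequence
      n++
      name[n] = $1                    # first word of the header ...
      sub(/^>/, "", name[n])          # ... without the leading ">"
      next
    }
    { seq[n] = seq[n] toupper($0) }   # append sequence line in upper case
    END {
      for (i = 1; i <= n; i++) {
        for (j = i + 1; j <= n; j++) {
          d = 0
          len = length(seq[i])
          for (k = 1; k <= len; k++) {
            a = substr(seq[i], k, 1)
            b = substr(seq[j], k, 1)
            if (a ~ /[ACGT]/ && b ~ /[ACGT]/ && a != b) d++
          }
          printf "%s\t%s\t%d\n", name[i], name[j], d
        }
      }
    }' "${1:--}"    # read the file given as $1, or standard input
}
</pre>

For example, <code>snp_distances core_alignment.fasta</code> prints one tab-separated line per pair: genome, genome and SNP count. The character-by-character awk loop is slow for whole-genome alignments with many isolates. Dedicated tools such as snp-dists are designed for that task.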


'''Questions:'''
* How many SNPs separate the isolates?
* Do the genomes cluster closely together?
* Based on the results, do the isolates appear to be clonally related?
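The last question asks whether the isolates look clonally related. One simple way to make this concrete is single-linkage clustering: two isolates are grouped together if a chain of pairwise SNP distances, each at or below a chosen cut-off, connects them. The Bash function below is a minimal sketch of that idea and is not part of Parsnp. The function name <code>snp_clusters</code> and the input format (one tab-separated line per pair: genome, genome, SNP distance) are assumptions. There is no universal SNP cut-off for clonality, so any threshold you pass is an assumption that must be justified for the species and the setting.

<pre>
# Minimal sketch (hypothetical helper): group isolates by single-linkage
# clustering. Two isolates end up in the same cluster if they are connected
# by a chain of pairs whose SNP distance is at most THRESHOLD.
# Input: tab-separated lines "genome1  genome2  snp_distance".
# Usage: snp_clusters THRESHOLD [pairs.tsv]
snp_clusters() {
  awk -F '\t' -v t="$1" '
    # follow parent links up to the representative of the cluster
    function root(x) {
      while (parent[x] != x) x = parent[x]
      return x
    }
    {
      # register each genome the first time it is seen
      for (k = 1; k <= 2; k++)
        if (!($k in parent)) { parent[$k] = $k; order[++n] = $k }
      # merge the two clusters if the pair is within the threshold
      if ($3 + 0 <= t + 0) {
        ra = root($1); rb = root($2)
        if (ra != rb) parent[ra] = rb
      }
    }
    END {
      # number clusters in order of first appearance
      for (i = 1; i <= n; i++) {
        r = root(order[i])
        if (!(r in id)) id[r] = ++c
        printf "%s\tcluster_%d\n", order[i], id[r]
      }
    }' "${2:--}"
}
</pre>

For example, <code>snp_clusters 10 pairs.tsv</code> prints each genome with a cluster label (the threshold and file name are illustrative). Isolates that share a label are linked by distances at or below the threshold.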


----


== Summary ==


In this exercise you have:


* Identified the species of bacterial genomes using '''GTDB-Tk'''
* Determined sequence types using '''MLST'''
* Screened for antimicrobial resistance genes using '''ABRicate'''
* Assessed genomic relatedness using '''Parsnp'''

Together, these analyses reflect a typical bioinformatics workflow used in microbial diagnostics and epidemiological investigations.

== Supplementary files ==

[https://teaching.healthtech.dtu.dk/22126/images/c/cd/Gtdbtk.pdf Article describing GTDB-Tk can be found here]

Revision as of 14:34, 6 January 2026

