Bacterial Classifications in the Genomic Era

  • Author / Creator
    Liang, Kevin
  • Bacterial taxonomy is an integral part of all disciplines within the field of microbiology, as it allows researchers to communicate results efficiently, streamlining global collaboration. The ultimate goal of bacterial taxonomy is to create groups of organisms based not only on shared phenotypic and genomic traits, but also on a common evolutionary history. To achieve this goal, the polyphasic approach, which examines phenotypic, genomic and phylogenetic data, is favored. Although the three major components of polyphasic taxonomy have remained unchanged since it was first proposed in 1968, the methods by which we assess these aspects have improved significantly due to the abundance of whole genome sequences (WGS) available. In addition, WGS has served as the basis for developing high-resolution subspecies-level classification techniques. The research presented in this thesis therefore focuses both on applying modern techniques to the polyphasic approach to taxonomy and on developing a standardized, easy-to-use, high-resolution subspecies typing technique. Traditionally, the 16S rRNA gene has been used to assess genomic and phylogenetic relationships for taxonomic purposes. Although it is now widely known that 16S rDNA is not suitable for species-, genus- or even family-level taxonomic classifications, it is still commonly used to fulfill the phylogenetic aspect of polyphasic taxonomy within the family Rhodobacteraceae. Consequently, taxonomic inconsistencies have been a recurring problem since the conception of this group in 2005. To resolve taxonomic inconsistencies within this family, over 300 type strains with high-quality genomes were analyzed. As type strains are important reference material for classification, resolving taxonomic inconsistencies among these strains will ultimately help guide future taxonomic efforts and prevent the propagation of errors.
Based on genomic and core-genome phylogenetic data, three species-level and 25 genus-level misclassifications were identified. By combining a meta-analysis of phenotypes with genomic techniques, distinguishing phenotypic traits useful for family-level classification were predicted. Furthermore, a general approach to taxonomy based on genomic and phylogenetic analyses is proposed, not only to validate taxonomic classifications but also to highlight potential misclassifications. Subspecies-level classification is an integral part of epidemiological and clinical research, as it is important to differentiate between closely related pathogenic and non-pathogenic strains within the same species. A high-resolution subspecies-level typing method, known as core-genome multilocus sequence typing (cgMLST), was developed for Vibrio cholerae, a bacterium best known as the causative agent of cholera. Traditionally, subspecies typing for V. cholerae was based on multilocus sequence typing (MLST), multilocus variable tandem repeats analysis (MLVA) or serotyping. These methods provided limited resolution, which restricted their use in an epidemiological setting. cgMLST, on the other hand, provides much greater resolution than any of these methods, as it utilizes a significantly larger portion of the genome by analyzing all genes common to V. cholerae. An outbreak threshold capable of identifying outbreak-related strains and potential sources of introduction is proposed. To help consolidate existing MLST information and to investigate large-scale ecological and epidemiological patterns, a sublineage threshold is defined that creates clusters similar to those of traditional MLST schemes. Using this threshold, a strong geographic signal that is not seen in clinical strains is detected among environmental isolates. This scheme, along with over 1,200 V. cholerae genomes and relevant provenance data, is currently available on PubMLST for public access.
The research presented in this thesis demonstrates the importance of WGS-based analyses, not only for taxonomic classifications at the species level and above, but also at the subspecies level. As next-generation sequencing and bioinformatics techniques develop, WGS-based methods will inevitably become standard practice for bacterial classification.
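
As an illustration of how a cgMLST threshold can group isolates, the sketch below counts allele differences between profiles and clusters isolates whose differences fall at or below a cutoff. It is not the thesis's implementation: the isolate names, allele numbers, the threshold value and the choice of single-linkage clustering are all assumptions made for the example, and the abstract does not state the actual outbreak or sublineage thresholds.

```python
from itertools import combinations

# Hypothetical cgMLST profiles: one allele number per core-genome locus.
# None marks a locus with no allele call; such loci are skipped in comparisons.
profiles = {
    "A": [1, 1, 1, 1, 1, 1],
    "B": [1, 1, 1, 1, 1, 2],
    "C": [1, 1, 1, 1, 2, 2],
    "D": [5, 6, 7, 8, 9, None],
}


def allelic_distance(a, b):
    """Count loci where both profiles have a called allele and the alleles differ."""
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


def cluster_by_threshold(profiles, threshold):
    """Single-linkage clustering (an assumption for this sketch): an isolate joins
    a cluster if it is within `threshold` allele differences of any member."""
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for p, q in combinations(profiles, 2):
        if allelic_distance(profiles[p], profiles[q]) <= threshold:
            parent[find(p)] = find(q)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# A strict (hypothetical) threshold of 1 allele difference.
print(cluster_by_threshold(profiles, 1))
```

In this toy data, a lower threshold plays the role of an outbreak-style cutoff and a higher one the role of a sublineage-style cutoff. Real cgMLST schemes compare far more loci than shown here.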

  • Subjects / Keywords
  • Graduation date
    Spring 2020
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.