Comprehensive bioinformatics analysis of Mycoplasma pneumoniae genomes to investigate underlying population structure and type-specific determinants

PLoS One. 2017 Apr 14;12(4):e0174701. doi: 10.1371/journal.pone.0174701. eCollection 2017.

Abstract

Mycoplasma pneumoniae is a significant cause of respiratory illness worldwide. Despite a minimal and highly conserved genome, genetic diversity within the species may impact disease. We performed whole genome sequencing (WGS) analysis of 107 M. pneumoniae isolates, including 67 newly sequenced using the Pacific BioSciences RS II and/or Illumina MiSeq sequencing platforms. Comparative genomic analysis of 107 genomes revealed >3,000 single nucleotide polymorphisms (SNPs) in total, including 520 type-specific SNPs. Population structure analysis supported the existence of six distinct subgroups, three within each type. We developed a predictive model to classify an isolate based on whole genome SNPs called against the reference genome into the identified subtypes, obviating the need for genome assembly. This study is the most comprehensive WGS analysis for M. pneumoniae to date, underscoring the power of combining complementary sequencing technologies to overcome difficult-to-sequence regions and highlighting potential differential genomic signatures in M. pneumoniae.
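
As an illustration of how an isolate might be assigned to a subtype directly from SNP calls against a reference, the following is a minimal sketch of a Bernoulli naive Bayes classifier over binary SNP profiles (1 = variant allele present at a site, 0 = reference allele). It is not the authors' model, which the abstract does not describe in detail. The function names, the subtype labels, and the toy four-site profiles are all hypothetical and chosen only for demonstration.

```python
import math
from collections import defaultdict


def train(profiles, labels, alpha=1.0):
    """Estimate per-subtype variant frequencies at each SNP site.

    profiles: list of equal-length 0/1 lists (hypothetical SNP calls).
    labels:   subtype label for each profile (hypothetical names).
    alpha:    Laplace smoothing constant, so no frequency is exactly 0 or 1.
    """
    n_sites = len(profiles[0])
    counts = defaultdict(lambda: [0] * n_sites)
    totals = defaultdict(int)
    for profile, label in zip(profiles, labels):
        totals[label] += 1
        for i, allele in enumerate(profile):
            counts[label][i] += allele

    n_isolates = len(labels)
    model = {}
    for label, total in totals.items():
        log_prior = math.log(total / n_isolates)
        freqs = [(c + alpha) / (total + 2 * alpha) for c in counts[label]]
        model[label] = (log_prior, freqs)
    return model


def classify(model, profile):
    """Return the subtype with the highest log posterior for one profile."""
    scores = {}
    for label, (log_prior, freqs) in model.items():
        score = log_prior
        for allele, p in zip(profile, freqs):
            score += math.log(p if allele else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data only: four SNP sites, two made-up subtypes.
training_profiles = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
training_labels = ["subtype_A", "subtype_A", "subtype_B", "subtype_B"]

model = train(training_profiles, training_labels)
predicted = classify(model, [1, 1, 0, 0])
print(predicted)
```

In a real setting the profiles would be derived from variant calls at informative sites, and a classifier of this kind would need validation against isolates of known subtype before its assignments were trusted.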

MeSH terms

  • Bacterial Typing Techniques
  • Bayes Theorem
  • Cluster Analysis
  • Computational Biology*
  • Genetic Variation
  • Genome, Bacterial*
  • High-Throughput Nucleotide Sequencing
  • Mycoplasma pneumoniae / classification
  • Mycoplasma pneumoniae / genetics*
  • Phylogeny
  • Polymorphism, Single Nucleotide
  • Sequence Analysis, DNA