We performed whole-genome sequencing of the following five Mycobacterium microti strains: (1) M. microti OV254, originally isolated from voles in the UK in the 1930s and kindly supplied by M. J. Colston; (2) M. microti ATCC 35782 (also designated TMC 1608 [M.P. Prague]), purchased from the American Type Culture Collection; (3) M. microti 94-2272, isolated in 1988 from the perfusion fluid of a 41-year-old dialysis patient and kindly provided by L. M. Parsons; (4) M. microti Maus III (also known as strain 6740/00); and (5) M. microti Maus IV (also known as strain 1479/99). Strains Maus III and Maus IV were both obtained from the collection of the National Reference Center for Mycobacteria, Forschungszentrum Borstel, Germany.

For long-read sequencing, libraries were prepared from the genomic DNA of each M. microti strain using the SMRTbell Template library preparation kit and sequenced at the Biomics platform of the Institut Pasteur (Paris, France) using PacBio SMRT long-read sequencing technology (Chemistry v2.1; Sequel ICS v5.0.1; SMRT Link v5.0.1). For short-read sequencing, the genomic DNA of each strain was sequenced using Illumina paired-end technology: strains OV254, ATCC 35782, 94-2272 and Maus IV were sequenced on a Genome Analyzer IIx at the Wellcome Trust Sanger Institute (Hinxton, UK), and strain Maus III on a HiSeq 2500 at the Biomics platform of the Institut Pasteur (Paris, France).

First, raw PacBio subreads were assembled using Canu, and the resulting supercontigs were then ordered and oriented against the genome of M. bovis BCG Pasteur 1173P2 using MeDuSa. Second, Illumina read pairs were trimmed using Trimmomatic.
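The idea behind sliding-window quality trimming, and what it means for a read pair to remain "complete" afterwards, can be illustrated with the minimal Python sketch below. It assumes that "complete" refers to pairs in which both mates survive trimming, and that qualities use Phred+33 encoding. It is not Trimmomatic's implementation: the window size, mean-quality threshold and minimum length (`window=4`, `min_mean_q=20`, `min_len=36`) are hypothetical illustration values, not the settings used in this study, and the helper names are likewise hypothetical.

```python
from typing import List, Optional, Tuple

# A read is represented here as a (sequence, quality_string) tuple.
Read = Tuple[str, str]


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(c) - offset for c in qual]


def sliding_window_trim(qual: str, window: int = 4, min_mean_q: float = 20.0) -> int:
    """Return how many 5' bases to keep.

    Windows are scanned from the 5' end; the read is cut at the start of the
    first window whose mean quality falls below min_mean_q. Simplified: the
    real Trimmomatic SLIDINGWINDOW step may place the cut differently.
    """
    scores = phred_scores(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean_q:
            return start
    return len(scores)


def trim_pair(r1: Read, r2: Read, window: int = 4, min_mean_q: float = 20.0,
              min_len: int = 36) -> Optional[Tuple[Read, Read]]:
    """Trim both mates; return None unless both remain at least min_len long."""
    kept: List[Read] = []
    for seq, qual in (r1, r2):
        n = sliding_window_trim(qual, window, min_mean_q)
        if n < min_len:
            return None  # one mate is lost, so the pair is no longer complete
        kept.append((seq[:n], qual[:n]))
    return kept[0], kept[1]


# Usage: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
good = ("A" * 50, "I" * 50)
tail = ("C" * 50, "I" * 40 + "#" * 10)
short = ("G" * 50, "I" * 20 + "#" * 30)
pair = trim_pair(good, tail)
print(len(pair[0][0]), len(pair[1][0]))  # 50 39
print(trim_pair(good, short))            # None
```

Dropping the whole pair when either mate becomes too short keeps the two output files in step, which is what paired-end assemblers expect; reads orphaned this way would have to be handled separately as single-end data.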
Complete read pairs (both mates retained after trimming) were then assembled using SPAdes, supplying either the raw PacBio subreads or the scaffolded PacBio supercontigs as additional input. The resulting Illumina contigs were compared with the PacBio scaffolds using BLASTN to correct misassembled regions and to fill undefined gaps. Third, Illumina read pairs were mapped against the resulting genome assembly using BWA-MEM, and variants were called using VarScan to detect and correct residual sequencing errors in the whole-genome assemblies.
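The principle of the final polishing step, using mapped short reads to find and fix positions where the assembly disagrees with the read evidence, is sketched below in Python. This is a simplified illustration, not VarScan's algorithm: it assumes a toy pileup of per-position base counts (such as one could derive from the BWA-MEM alignments), considers substitutions only (indels, which a real correction step would also need to handle, are omitted), and replaces VarScan's own filters with a single coverage and majority-frequency rule. The thresholds `min_cov=8` and `min_freq=0.8` and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Toy pileup: 0-based assembly position -> counts of bases seen in mapped reads.
Pileup = Dict[int, Counter]
Call = Tuple[int, str, str]  # (position, assembly base, corrected base)


def call_substitutions(assembly: str, pileup: Pileup,
                       min_cov: int = 8, min_freq: float = 0.8) -> List[Call]:
    """Flag positions where well-covered reads disagree with the assembly."""
    calls: List[Call] = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth == 0 or depth < min_cov:
            continue  # too little read evidence to overrule the assembly
        base, n = counts.most_common(1)[0]
        if base != assembly[pos] and n / depth >= min_freq:
            calls.append((pos, assembly[pos], base))
    return calls


def apply_substitutions(assembly: str, calls: List[Call]) -> str:
    """Return a corrected copy of the assembly with each call applied."""
    seq = list(assembly)
    for pos, ref, alt in calls:
        if seq[pos] != ref:
            raise ValueError(f"assembly base at {pos} is not {ref}")
        seq[pos] = alt
    return "".join(seq)


assembly = "ACGTACGTAC"
pileup: Pileup = {
    2: Counter({"A": 19, "G": 1}),  # reads consistently disagree: likely assembly error
    5: Counter({"C": 10}),          # reads agree with the assembly
    7: Counter({"A": 3}),           # disagreement, but coverage too low to act on
}
calls = call_substitutions(assembly, pileup)
print(calls)                                 # [(2, 'G', 'A')]
print(apply_substitutions(assembly, calls))  # ACATACGTAC
```

Requiring both a minimum depth and a high allele frequency means the assembly is only overruled where many independent reads agree, so isolated read errors at low coverage are left untouched.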