|
Status |
Public on Aug 31, 2018 |
Title |
Comparison of variant calling pipelines using Illumina CanineHD BeadChip array as the truth dataset |
Organism |
Canis lupus familiaris |
Experiment type |
Genome variation profiling by SNP array; SNP genotyping by SNP array
|
Summary |
Next-generation sequencing platforms have become essential tools for understanding DNA in a wide range of contexts. Their success relies heavily on the accuracy, sensitivity and specificity of the methods used to discern differences between the reference genome and the genomes under investigation. Here we compare the relative performance of five popular single nucleotide variant callers, with and without their associated recommended hard filtering criteria. We compare: FreeBayes; the Genome Analysis Toolkit’s Haplotype Caller and Unified Genotyper; SAMtools; and VarScan. We tailor this comparison to suit smaller projects with modest sample numbers (n = 10) and coverage (~10X) to fill a current gap in the literature. Other comparison studies are generally applicable only to larger projects in model species, where there is access to large amounts of sequencing data and curated callsets for base and variant quality score recalibration. We estimated the accuracy, sensitivity and specificity of each pipeline according to the rate and number of genotypes concordant with the “truth” dataset for 10 canine samples. The truth dataset was defined as the genotypes obtained from the CanineHD BeadChip array. Whole genome sequencing was performed on the Illumina HiSeq2000 or HiSeq2500 platform, producing 100-101 base pair, paired-end reads at an average sample coverage of 10.3X. Apart from GATK Haplotype Caller, applying recommended hard filters did not improve genotype concordance at the tested levels of minimum coverage. The default VarScan pipeline with no additional filters applied (VarScan uses SAMtools mpileup, without base alignment quality computation) generally outperformed the other callers in terms of accuracy, sensitivity and specificity.
The results of this study demonstrate that hard filtering of variant calls from low-powered genome studies can impair the accuracy, sensitivity and specificity of callsets, and they provide some benchmark performance metrics across a range of low coverage levels.
|
|
|
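As an illustration of the metrics named in the summary, the following is a minimal sketch of how accuracy, sensitivity and specificity could be derived from array genotypes and pipeline calls. It assumes the textbook confusion-matrix definitions and a simple encoding (alternate-allele counts 0, 1 or 2, with `None` for a missing genotype); the record does not state the exact definitions used in the study, and the function names `classify_sites` and `summarise` are hypothetical.

```python
from typing import Dict, Optional, Sequence

Genotype = Optional[int]  # assumed encoding: alt-allele count 0/1/2, None = no call


def classify_sites(truth: Sequence[Genotype], calls: Sequence[Genotype]) -> Dict[str, int]:
    """Count TP/FP/TN/FN over sites genotyped in both the array and the callset."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for t, c in zip(truth, calls):
        if t is None or c is None:
            continue  # skip sites missing in either dataset
        truth_variant = t > 0  # array reports a non-reference genotype
        call_variant = c > 0   # pipeline reports a non-reference genotype
        if truth_variant and call_variant:
            counts["TP"] += 1
        elif truth_variant:
            counts["FN"] += 1
        elif call_variant:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts


def summarise(counts: Dict[str, int]) -> Dict[str, float]:
    """Standard definitions (an assumption, not taken from the study itself)."""
    tp, fp, tn, fn = counts["TP"], counts["FP"], counts["TN"], counts["FN"]
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "accuracy": (tp + tn) / total if total else nan,
    }
```

This sketch only asks whether a site is variant or reference in each dataset, so a heterozygous call at a site the array reports as homozygous alternate still counts as a true positive.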
Overall design |
We performed whole genome sequencing on 12 individual samples and genotyped the same samples using the Illumina CanineHD BeadChip array. After quality control, we used the Illumina CanineHD BeadChip array genotypes as the truth dataset and measured the concordance rate between each variant calling pipeline and the truth dataset. This series contains only the 'truth' BeadChip dataset. The whole genome sequence project is available in BioProject: PRJNA477886.
|
|
|
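The concordance measurement described in the overall design can be sketched as follows. This is an illustrative example only: the genotype encoding (alternate-allele counts, `None` for a missing genotype), the per-site depth list and the function name `concordance_by_min_depth` are assumptions, and the thresholds shown are not the coverage levels tested in the study.

```python
from typing import Dict, Optional, Sequence, Tuple

Genotype = Optional[int]  # assumed encoding: alt-allele count 0/1/2, None = no call


def concordance_by_min_depth(
    truth: Sequence[Genotype],
    calls: Sequence[Genotype],
    depths: Sequence[int],
    thresholds: Sequence[int],
) -> Dict[int, Tuple[int, int, Optional[float]]]:
    """For each minimum depth, return (matching, compared, concordance rate)."""
    results = {}
    for min_dp in thresholds:
        compared = 0
        matched = 0
        for t, c, dp in zip(truth, calls, depths):
            # Only compare sites genotyped in both datasets with enough reads.
            if t is None or c is None or dp < min_dp:
                continue
            compared += 1
            if t == c:  # exact genotype match counts as concordant
                matched += 1
        rate = matched / compared if compared else None
        results[min_dp] = (matched, compared, rate)
    return results
```

Reporting the number of compared sites alongside the rate matters here, because raising the minimum depth can increase concordance while reducing how many array sites are still genotyped.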
Contributor(s) |
Wade CM, Chew T, Haase B, Willet CE |
Citation missing |
|
Submission date |
Jul 12, 2018 |
Last update date |
Sep 01, 2018 |
Contact name |
Tracy Chew |
Organization name |
University of Sydney
|
Department |
Sydney Informatics Hub
|
Street address |
Merewether Building (H04) City Rd & Butlin Ave
|
City |
Darlington |
State/province |
NSW |
ZIP/Postal code |
2008 |
Country |
Australia |
|
|
Platforms (1) |
|
Samples (7)
|
|
Relations |
BioProject |
PRJNA480852 |