Series GSE117010
Status Public on Aug 31, 2018
Title Comparison of variant calling pipelines using Illumina CanineHD BeadChip array as the truth dataset
Organism Canis lupus familiaris
Experiment type Genome variation profiling by SNP array
SNP genotyping by SNP array
Summary Next-generation sequencing platforms have become essential tools for understanding DNA in a wide range of contexts. Their success relies heavily on the accuracy, sensitivity and specificity of the methods used to discern differences between the reference genome and the genomes under investigation. Here we compare the relative performance of five popular single nucleotide variant callers, with and without their associated recommended hard-filtering criteria: FreeBayes; the Genome Analysis Toolkit's (GATK) Haplotype Caller and Unified Genotyper; SAMtools; and VarScan. We tailor this comparison to smaller projects with modest sample numbers (n = 10) and coverage (~10X), to fill a current gap in the literature. Other comparison studies are generally applicable only to larger projects in model species, where large amounts of sequencing data and curated callsets for base and variant quality score recalibration are available. We estimated the accuracy, sensitivity and specificity of each pipeline from the rate and number of genotypes concordant with the "truth" dataset for 10 canine samples. The truth dataset was defined as the genotypes obtained from the CanineHD BeadChip array. Whole-genome sequencing was performed on the Illumina HiSeq2000 or HiSeq2500 platform, producing 100-101 base pair paired-end reads at an average sample coverage of 10.3X. Apart from GATK Haplotype Caller, applying the recommended hard filters did not improve genotype concordance at any of the tested minimum coverage levels. The default VarScan pipeline with no additional filters applied (VarScan uses SAMtools mpileup, without base alignment quality computation) generally outperformed the other callers in accuracy, sensitivity and specificity.
The results of this study demonstrate that hard filtering of variant calls from low-powered genome studies can impair the accuracy, sensitivity and specificity of callsets, and they provide benchmark performance metrics across a range of low coverage levels.
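The record does not spell out how concordance, sensitivity and specificity were computed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes genotypes are encoded as alternate-allele counts (0 = hom-ref, 1 = het, 2 = hom-alt), that only sites genotyped by both the array and the pipeline are compared, and that a "positive" is a non-reference genotype in the array data. The names `callset_metrics` and `CallsetMetrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed encoding: alternate-allele count, 0 = hom-ref, 1 = het, 2 = hom-alt.
# The metric definitions below are illustrative assumptions, not the study's.


@dataclass
class CallsetMetrics:
    compared: int       # sites genotyped by both the array and the pipeline
    concordance: float  # exact genotype matches / compared sites
    sensitivity: float  # truth-variant sites called as variant / truth-variant sites
    specificity: float  # truth hom-ref sites called hom-ref / truth hom-ref sites


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def callset_metrics(truth: Dict[str, int],
                    calls: Dict[str, Optional[int]]) -> CallsetMetrics:
    """Compare pipeline genotypes with array ('truth') genotypes, site by site."""
    tp = fn = tn = fp = matches = compared = 0
    for site, true_gt in truth.items():
        called_gt = calls.get(site)
        if called_gt is None:  # site not genotyped by the pipeline: skipped
            continue
        compared += 1
        if called_gt == true_gt:
            matches += 1
        truth_is_variant = true_gt > 0
        call_is_variant = called_gt > 0
        if truth_is_variant and call_is_variant:
            tp += 1
        elif truth_is_variant:
            fn += 1
        elif call_is_variant:
            fp += 1
        else:
            tn += 1
    return CallsetMetrics(
        compared=compared,
        concordance=_ratio(matches, compared),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
    )
```

In this sketch sensitivity and specificity are judged at the variant/non-variant level, so a het called as hom-alt still counts as a true positive but lowers the exact-match concordance.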
 
Overall design We performed whole-genome sequencing on 12 individual samples and genotyped the same samples using the Illumina CanineHD BeadChip array. After quality control, we used the CanineHD BeadChip array genotypes as the truth dataset and measured the concordance rate of each variant calling pipeline against it. This series contains only the 'truth' BeadChip dataset; the whole-genome sequencing project is available in BioProject PRJNA477886.
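To illustrate comparing pipelines against the truth dataset across minimum coverage levels, here is a hedged sketch. It assumes each pipeline's calls have already been reduced to a mapping from site to (alternate-allele count, read depth); the function name `concordance_by_min_depth`, the pipeline labels and this input layout are hypothetical and are not taken from the study.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical input: for each pipeline, site -> (alt-allele count 0/1/2, read depth).
Call = Tuple[int, int]


def concordance_by_min_depth(
    truth: Dict[str, int],
    pipelines: Dict[str, Dict[str, Call]],
    thresholds: Iterable[int],
) -> Dict[str, Dict[int, float]]:
    """Genotype concordance per pipeline after discarding calls below each minimum depth."""
    levels = list(thresholds)  # materialise so every pipeline sees the same levels
    table: Dict[str, Dict[int, float]] = {}
    for name, calls in pipelines.items():
        row: Dict[int, float] = {}
        for min_depth in levels:
            # Keep only calls at array (truth) sites whose depth meets the threshold.
            kept = [(site, gt) for site, (gt, depth) in calls.items()
                    if depth >= min_depth and site in truth]
            matches = sum(1 for site, gt in kept if gt == truth[site])
            row[min_depth] = matches / len(kept) if kept else float("nan")
        table[name] = row
    return table
```

Raising the minimum depth discards low-coverage calls, which typically trades fewer compared sites for higher concordance; reporting both numbers per level avoids hiding that trade-off.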
 
Contributor(s) Wade CM, Chew T, Haase B, Willet CE
Citation missing
Submission date Jul 12, 2018
Last update date Sep 01, 2018
Contact name Tracy Chew
Organization name University of Sydney
Department Sydney Informatics Hub
Street address Merewether Building (H04) City Rd & Butlin Ave
City Darlington
State/province NSW
ZIP/Postal code 2008
Country Australia
 
Platforms (1)
GPL20953 CanineHD BeadChip
Samples (7)
GSM3267316 USCF1292
GSM3267317 USCF1293
GSM3267318 USCF1294
Relations
BioProject PRJNA480852

Download family Format
SOFT formatted family file(s) SOFT
MINiML formatted family file(s) MINiML
Series Matrix File(s) TXT

Supplementary file Size Download File type/resource
GSE117010_RAW.tar 17.4 Mb HTTP, custom TAR
GSE117010_USCF_30JUN18_FinalReport.txt.gz 13.4 Mb FTP, HTTP TXT
GSE117010_processed_data.txt.gz 741.3 Kb FTP, HTTP TXT
Processed data are available on Series record
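The file name GSE117010_USCF_30JUN18_FinalReport.txt.gz suggests an Illumina GenomeStudio "Final Report" export. As a hedged sketch only, the reader below assumes the common layout of a `[Header]` block followed by a `[Data]` line and a tab-delimited table whose first row holds column names; the function names are hypothetical, and column names such as "SNP Name" or "Allele1 - Forward" depend on the export settings, so check the actual file before relying on them.

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator


def iter_final_report(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row of a tab-delimited report with a [Data] section."""
    it = iter(lines)  # one shared iterator, so the header scan and csv keep position
    for line in it:
        if line.strip() == "[Data]":
            break
    else:
        raise ValueError("no [Data] section found")
    # The first line after [Data] is taken as the column header row.
    yield from csv.DictReader(it, delimiter="\t")


def read_final_report(path: str) -> Iterator[Dict[str, str]]:
    """Stream rows from a gzipped report, e.g. the *_FinalReport.txt.gz supplementary file."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from iter_final_report(handle)
```

Streaming rows rather than loading the whole file keeps memory use low for large array exports.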
