GEO Accession viewer

Series GSE117010

Status

Public on Aug 31, 2018

Title

Comparison of variant calling pipelines using Illumina CanineHD BeadChip array as the truth dataset

Organism

Canis lupus familiaris

Experiment type

Genome variation profiling by SNP array
SNP genotyping by SNP array

Summary

Next-generation sequencing platforms have become essential tools for understanding DNA in a wide range of contexts. Their success relies heavily on the accuracy, sensitivity and specificity of the methods used to discern differences between the reference genome and the genomes under investigation. Here we compare the relative performance of five popular single nucleotide variant callers, with and without their associated recommended hard filtering criteria. We compare: FreeBayes; the Genome Analysis Toolkit’s Haplotype Caller and Unified Genotyper; SAMtools; and VarScan. We tailor this comparison to smaller projects with modest sample numbers (n = 10) and coverage (~10X) to fill a current gap in the literature. Other comparison studies are generally applicable only to larger projects in model species, where there is access to large amounts of sequencing data and curated callsets for base and variant quality score recalibration. We estimated the accuracy, sensitivity and specificity of each pipeline from the rate and number of genotypes concordant with the “truth” dataset for 10 canine samples. The truth dataset was defined as the genotypes obtained from the CanineHD BeadChip array. Whole genome sequencing was performed on the Illumina HiSeq2000 or HiSeq2500 platform as 100-101 base pair, paired-end reads to an average sample coverage of 10.3X. Except for GATK Haplotype Caller, applying the recommended hard filters did not improve genotype concordance at the tested levels of minimum coverage. The default VarScan pipeline with no additional filters applied (VarScan uses SAMtools mpileup, without base alignment quality computation) generally outperformed the other callers in terms of accuracy, sensitivity and specificity.
The results of this study demonstrate that hard filtering of variant calls from low-powered genome studies can impair the accuracy, sensitivity and specificity of callsets, and they provide some benchmark performance metrics across a range of low coverage levels.
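
The concordance measure mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the dictionary layout, the toy SNP identifiers and the choice to skip sites with a no-call in either dataset are all assumptions made for illustration. The sketch counts sites where a pipeline's genotype matches the array genotype, ignoring allele order, and reports that count alongside the rate.

```python
from typing import Dict, Optional, Tuple

# A genotype is a pair of alleles; None marks a no-call (assumed convention).
Genotype = Optional[Tuple[str, str]]


def normalise(gt: Tuple[str, str]) -> Tuple[str, ...]:
    """Sort alleles so that ('A', 'G') and ('G', 'A') compare as equal."""
    return tuple(sorted(gt))


def concordance(
    truth: Dict[str, Genotype], calls: Dict[str, Genotype]
) -> Tuple[int, int, float]:
    """Return (concordant sites, compared sites, concordance rate).

    Only sites genotyped in BOTH datasets are compared; this exclusion of
    no-calls is an assumption, not something stated in the record.
    """
    compared = 0
    concordant = 0
    for site, truth_gt in truth.items():
        call_gt = calls.get(site)
        if truth_gt is None or call_gt is None:
            continue  # skip sites missing from either dataset
        compared += 1
        if normalise(truth_gt) == normalise(call_gt):
            concordant += 1
    rate = concordant / compared if compared else float("nan")
    return concordant, compared, rate


# Hypothetical toy data: array genotypes (truth) versus one pipeline's calls.
truth = {"snp1": ("A", "G"), "snp2": ("C", "C"), "snp3": ("T", "T"), "snp4": None}
calls = {"snp1": ("G", "A"), "snp2": ("C", "T"), "snp3": ("T", "T"), "snp4": ("A", "A")}

result = concordance(truth, calls)
print(result)
```

In this toy example three sites are comparable and two of them agree. Whether no-calls should instead count against a pipeline depends on how sensitivity is defined, which this record does not specify.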

Overall design

We whole-genome sequenced 12 individual samples and genotyped them on the Illumina CanineHD BeadChip array. After quality control, we used the CanineHD BeadChip genotypes as the truth dataset and measured the concordance of each variant calling pipeline with it. This series contains only the 'truth' BeadChip dataset. The whole genome sequence project is available in BioProject: PRJNA477886.

Contributor(s)

Wade CM, Chew T, Haase B, Willet CE

Submission date

Jul 12, 2018

Last update date

Sep 01, 2018

Contact name

Tracy Chew

Organization name

University of Sydney

Department

Sydney Informatics Hub

Street address

Merewether Building (H04) City Rd & Butlin Ave

City

Darlington

State/province

NSW

ZIP/Postal code

2008

Country

Australia

Platforms (1)

GPL20953

CanineHD BeadChip

Samples (7)

GSM3267316	USCF1292
GSM3267317	USCF1293
GSM3267318	USCF1294

Relations

BioProject

PRJNA480852

Download family	Format
SOFT formatted family file(s)	SOFT
MINiML formatted family file(s)	MINiML
Series Matrix File(s)	TXT

Supplementary file	Size	Download	File type/resource
GSE117010_RAW.tar	17.4 Mb	(http)(custom)	TAR
GSE117010_USCF_30JUN18_FinalReport.txt.gz	13.4 Mb	(ftp)(http)	TXT
GSE117010_processed_data.txt.gz	741.3 Kb	(ftp)(http)	TXT
Processed data are available on Series record