dbVar FAQs

This page will be periodically updated to include frequently asked questions (FAQs).

What is 'dbVar'?
How does dbVar differ from the Database of Genomic Variants (DGV)?
What is ‘structural variation’?
What types of structural variation data does dbVar accept?
Does it matter how I detected the structural variation?
Can I submit structural variation data from any organism?
Can I submit structural variation data from clinical or cancer studies?
Does dbVar distinguish between pathogenic and benign variants?
Does dbVar accept genotype data?
Will I get a unique dbVar ID to use in publications?
What is the difference between the dbVar accessions?
Why is the data in dbVar sometimes different from that provided in the publication?
Why do some dbVar variants have more than one location?
What's the difference between an nsv and an nssv?
Can I download the whole database?
How do I know if a given variant(s) in dbVar is real (or of high quality)?
How do I find data from a given region or gene?
Is there a way to integrate dbVar data with dbSNP data?
Can I include any clinical information in my data submission?
My submission contains sensitive private clinical information. How does dbVar guarantee the information will remain private?
What is the smallest size structural variant dbVar accepts?
My study has identified novel insertion sequences and I don't have a genomic coordinate for these. How can I submit these to dbVar?
The number of "Other Calls in this Sample" in the Variant Call Information table of the Variant Page is different from the number of variants that are returned when I do a dbVar search for the same study and sample ID. What's going on?
How does dbVar place data submitted on one assembly (e.g., GRCh37) on other assemblies (e.g., GRCh38)?
How do I submit to dbVar?
How should I cite dbVar?

What is 'dbVar'?

dbVar is the NCBI database of human genomic structural variation. For information on how to navigate dbVar see the dbVar Help page.

How does dbVar differ from the Database of Genomic Variants (DGV)?

DGV has been a useful resource for the human genetics community, collecting and curating structural variation data from human samples. DGV, dbVar, and dbVar's European counterpart DGVa all contain data from healthy control human samples; dbVar and DGVa also include clinically relevant structural variation data from ClinVar.

What is ‘structural variation’?

Structural variation (SV) is generally defined as any region of DNA involved in inversions and balanced translocations, or in genomic imbalances (insertions and deletions), which are commonly referred to as copy number variants (CNVs). For more information see the Overview of Structural Variation page.

What types of structural variation data does dbVar accept?

dbVar is a structural variation database designed to store data on variant DNA ≥ 1 bp in size. Practically speaking, we recommend submitting variation data that is > 50 bp to dbVar and variation data that is ≤ 50 bp to dbSNP. We can accept diverse types of events, including inversions, insertions and translocations. We discourage the submission of somatic and cancer-related variants, as they tend to be complex and sample-specific, and are more appropriately stored in custom databases.
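
The size recommendation above can be expressed as a simple rule. The following is a minimal Python sketch, not an NCBI tool; the function name recommended_archive is hypothetical, and it encodes only the > 50 bp (dbVar) versus ≤ 50 bp (dbSNP) recommendation stated in this answer.

```python
def recommended_archive(size_bp: int) -> str:
    """Return the archive recommended for a variant of size_bp base pairs.

    Hypothetical helper: it encodes only the > 50 bp / <= 50 bp recommendation.
    """
    if size_bp < 1:
        # dbVar stores variant DNA of at least 1 bp.
        raise ValueError("variant size must be at least 1 bp")
    return "dbVar" if size_bp > 50 else "dbSNP"


if __name__ == "__main__":
    for size in (1, 50, 51, 12_000):
        print(size, recommended_archive(size))
```

Run as a script, this prints 1 dbSNP, 50 dbSNP, 51 dbVar and 12000 dbVar, one per line. Note that the boundary case (exactly 50 bp) is the one most worth checking against current submission guidance.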

Does it matter how I detected the structural variation?

dbVar accepts submissions based on analysis of whole-genome or whole-exome next-generation sequencing (NGS) (including paired-end mapping, split-read pair, and read depth) as well as microarray-based experiments including array-CGH and SNP genotyping. A full list of supported methods and analyses can be found on the dbVar Help page.

Can I submit structural variation data from any organism?

Beginning September 1, 2017, dbVar stopped accepting submissions from non-human organisms. Non-human SV data can be submitted to DGVa.

Can I submit structural variation data from clinical or cancer studies?

No. All clinically relevant structural variation should be submitted to ClinVar or dbGaP. We discourage the submission of somatic and cancer-related SV to dbVar.

Does dbVar distinguish between pathogenic and benign variants?

Yes, dbVar will clearly mark and allow searching for variants that are known to be pathogenic, providing links to OMIM when available.

Does dbVar accept genotype data?

Yes, dbVar can accept genotype data.

Will I get a unique dbVar ID to use in publications?

Yes, dbVar will provide a unique accession number for each study, each submitted variant region and each supporting level variant.

What is the difference between the dbVar accessions?

dbVar collaborates with DGVa at EBI and with JVar-SV at DDBJ to accession genomic structural variants. Accessions prefixed with 'n' have been processed by NCBI (dbVar), those prefixed with 'e' by EBI (DGVa), and those prefixed with 'd' by DDBJ (JVar-SV). NCBI, EBI, and DDBJ provide three levels of accessions:

(n|e|d)std: the study id - this identifies a submitted study

(n|e|d)sv: the structural variant id - this identifies the submitted region of variation

(n|e|d)ssv: the supporting structural variant id - this identifies the supporting regions of variation (often sample-specific) that were used to call the submitted region of variation
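
The prefix scheme can be decoded mechanically. Below is a minimal Python sketch, assuming an accession is one archive letter (n, e or d), a level code (std, sv or ssv) and a run of digits, as in nstd166, nsv4136077 and nssv15910013. The function name and regular expression are illustrative assumptions, not an official validator, and versioned or otherwise decorated identifiers are not handled.

```python
import re

# Archive letter -> archive, per the prefixes described above.
ARCHIVES = {"n": "NCBI (dbVar)", "e": "EBI (DGVa)", "d": "DDBJ (JVar-SV)"}
# Level code -> accession level.
LEVELS = {"std": "study", "sv": "variant region", "ssv": "supporting variant"}

# Assumed shape: archive letter + level code + digits, e.g. "nsv4136077".
ACCESSION_RE = re.compile(r"([ned])(std|ssv|sv)(\d+)")


def parse_accession(accession: str) -> tuple[str, str]:
    """Return (archive, level) for a structural variation accession."""
    match = ACCESSION_RE.fullmatch(accession.strip())
    if match is None:
        raise ValueError(f"unrecognized accession: {accession!r}")
    letter, level, _digits = match.groups()
    return ARCHIVES[letter], LEVELS[level]


if __name__ == "__main__":
    for acc in ("nstd166", "nsv4136077", "nssv15910013"):
        print(acc, parse_accession(acc))
```

For example, parse_accession("nssv15910013") returns ('NCBI (dbVar)', 'supporting variant').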

Why is the data in dbVar sometimes different from that provided in the publication?

When studies are submitted to dbVar after publication, our quality control checks during loading may reveal errors. In these cases, submitters are contacted and the errors corrected. In other cases, a submitter may detect errors before submitting or may decide to edit their data. These changes are often documented in the study record.

Why do some dbVar variants have more than one location?

There are two reasons:

When a submitter gives us data on an assembly obtained from UCSC, we translate this into native assembly coordinates and map sequences (chromosomes and unplaced/unlocalized sequences) to their accession.versions. UCSC concatenates unplaced/unlocalized sequences into pseudo-scaffold objects they call chr_random. In some cases, submitters have provided data that cross gaps on the 'chr_random' sequences, meaning that the feature actually maps to two different, unrelated sequences.

More than one location may also be provided if the variant is the result of a transposition event. In this case, coordinates from both the donor and recipient sites are provided, to retain as much information about the variant event as possible.

What's the difference between an nsv and an nssv?

nsv and nssv are accession prefixes for variant regions and variant calls (or instances), respectively. Typically, one or more variant instances (nssv – variant calls based directly on experimental evidence) are merged into a variant region (nsv – a pair of start-stop coordinates reflecting the submitters’ assertion of the region of the genome that is affected by the variant instances). The ‘n’ preceding sv or ssv indicates that the variants were submitted to NCBI (dbVar). esv and essv represent the same variant entities, but those that were submitted to EBI (DGVa); similarly, dsv and dssv were submitted to DDBJ (JVar-SV).

Please see Overview of Structural Variation for more information.

Can I download the whole database?

Yes. All data are available on our FTP site, on a per-study and per-assembly basis.

How do I know if a given variant(s) in dbVar is real (or of high quality)?

dbVar is an archive. We report variants as they are submitted to us, usually in association with a peer-reviewed publication. Responsibility for data reproducibility lies with the submitting investigator. dbVar encourages, but does not require, validation of all variants by at least one alternative method, and we provide a mechanism to include validation results in the submission template. Any submitted validation data are presented as an integral part of the study data. To be listed as “validated”, a variant must have been confirmed by at least one additional independent method. If, as a consumer of dbVar data, you have concerns regarding a particular variant or data set, we recommend that you contact the relevant submitter for more supporting information.

How do I find data from a given region or gene?

You can perform searches using gene names. The results that are returned will include all variants that overlap the gene, and the studies with which the variants are associated.
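
If you work with variant coordinates locally (for example, data downloaded from the FTP site), you can apply the same "overlaps the gene" rule yourself. Below is a minimal Python sketch. The Interval record and the gene and variant names and coordinates are hypothetical, and it assumes 1-based, inclusive coordinates on the same assembly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A named genomic interval; 1-based, inclusive coordinates are assumed."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def variants_in_region(variants: list[Interval], region: Interval) -> list[Interval]:
    """Keep the variants that overlap the region (e.g. a gene)."""
    return [v for v in variants if overlaps(v, region)]


if __name__ == "__main__":
    gene = Interval("GENE_X", "chr1", 1_000, 5_000)  # hypothetical gene
    variants = [
        Interval("var_a", "chr1", 500, 1_200),    # overlaps the gene start
        Interval("var_b", "chr1", 2_000, 3_000),  # lies inside the gene
        Interval("var_c", "chr1", 5_001, 9_000),  # starts just after the gene end
        Interval("var_d", "chr2", 1_000, 5_000),  # different chromosome
    ]
    print([v.name for v in variants_in_region(variants, gene)])  # ['var_a', 'var_b']
```

A variant that only partially overlaps the gene (var_a) is still returned, which matches the "all variants that overlap the gene" wording above.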

Is there a way to integrate dbVar data with dbSNP data?

Yes, Variation Viewer provides some functionality in this regard. Alternatively, you can integrate the data manually with available bioinformatics tools.

Can I include any clinical information in my data submission?

No. Clinically relevant structural variation should be submitted to ClinVar.

My submission contains sensitive private clinical information. How does dbVar guarantee the information will remain private?

Submissions with sensitive patient information should be submitted to dbGaP.

What is the smallest size structural variant dbVar accepts?

There are no size restrictions on structural variation data. We recommend that variants smaller than 50 bp be submitted to dbSNP, but we will accept variants as small as a single base pair as long as the variant is an insertion or deletion, not a single nucleotide change.

My study has identified novel insertion sequences and I don't have a genomic coordinate for these. How can I submit these to dbVar?

If you have novel insertion sequence data, please submit it first as a WGS Project. This will give all of your novel insertion sequences unique identifiers that can then be tracked. You can then submit your data to dbVar and these novel insertion sequences can reference the sequence identifiers obtained from the WGS submission. With stable sequence identifiers, we may be able to map the sequence to updated assemblies and obtain a chromosome context for this sequence.

The number of "Other Calls in this Sample" in the Variant Call Information table of the Variant Page is different from the number of variants that are returned when I do a dbVar search for the same study and sample ID. What's going on?

The number in the table on the Variant Page indicates the number of Variant Calls (SSVs) in the sample. A search for the study and sample ID returns the number of Variant Regions (SVs) after similar calls have been merged into regions.
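
A toy example makes the difference concrete: several calls can share one region, so counting calls and counting regions give different numbers. Below is a minimal Python sketch with placeholder IDs. It assumes each call has already been assigned to a region and does not reproduce how dbVar or the submitter merges calls.

```python
def count_calls_and_regions(calls: list[tuple[str, str]]) -> tuple[int, int]:
    """Given (call_id, region_id) pairs for one sample, return (calls, distinct regions)."""
    distinct_regions = {region_id for _call_id, region_id in calls}
    return len(calls), len(distinct_regions)


if __name__ == "__main__":
    # Hypothetical calls for one sample; IDs are placeholders, not real accessions.
    sample_calls = [
        ("call_1", "region_A"),
        ("call_2", "region_A"),
        ("call_3", "region_B"),
        ("call_4", "region_C"),
        ("call_5", "region_C"),
    ]
    n_calls, n_regions = count_calls_and_regions(sample_calls)
    print(f"{n_calls} calls (SSVs) vs {n_regions} regions (SVs)")  # 5 calls (SSVs) vs 3 regions (SVs)
```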

How does dbVar place data submitted on one assembly (e.g., GRCh37) on other assemblies (e.g., GRCh38)?

dbVar uses in-house remapping software to map variants between assemblies. All variants reported in human-based studies are automatically remapped to both GRCh37 and GRCh38.

How do I submit to dbVar?

We encourage you to submit data to dbVar by completing one of the templates we provide (Excel, tab-delimited, XML) or as a VCF file, and emailing it to dbvar@ncbi.nlm.nih.gov. Please see dbVar Submission Information for more information.

How should I cite dbVar?

Please reference the following publication when citing dbVar:

Lappalainen I, Lopez J, Skipper L, Hefferon T, Spalding JD, Garner J, Chen C, Maguire M, Corbett M, Zhou G, Paschall J, Ananiev V, Flicek P, Church DM. DbVar and DGVa: public archives for genomic structural variation. Nucleic Acids Res. 2013 Jan;41(Database issue):D936-41. doi: 10.1093/nar/gks1213. Epub 2012 Nov 27. PMID: 23193291; PMCID: PMC3531204.

If you wish to reference a specific submission, cite the study accession, e.g., nstd166; if you wish to reference a specific variant, cite the variant region or variant call accession, e.g., nsv4136077 or nssv15910013.