Variation Services

Determining whether two genetic variants are the same poses several challenges. We must navigate competing text file formats for variant representation, reconcile different standards for shifting ambiguous alignments, and deal with continually updated reference sequence models on which the variation calls are made. To help address these problems for the genomic community, we are publishing a set of versioned web services used to compare and group genetic variants.

With these services, we are able to:

  • interconvert between HGVS and VCF formats, with proper left- and right-shifting
  • determine whether two variants are the same, using the same standard applied by ClinVar and dbSNP
  • discover where a variant maps on the current set of RefSeq sequence models
  • retrieve a RefSNP identifier by allele, and retrieve detailed information for that RefSNP as an object

Note: When submitting variants, please use only uppercase letters for nucleotides, for example "NC_000011.9:g.5248232T>A" rather than "NC_000011.9:g.5248232t>a". This limitation will be fixed in a subsequent release.
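Until that fix is released, a pipeline can normalize case before submitting. The following is a minimal sketch, not part of Variation Services: the function name `uppercase_nucleotides` is hypothetical, and it assumes a nucleotide-level HGVS expression (g., c., n. or m.) using the common edit types (substitution, del, ins, dup, inv, delins). Other expressions should be checked by hand.

```python
import re

# HGVS edit keywords stay lowercase. They are listed before the base pattern
# so the regex prefers a keyword wherever one starts.
_TOKEN = re.compile(r"delins|del|ins|dup|inv|[acgtn]+")
_KEYWORDS = {"delins", "del", "ins", "dup", "inv"}
_NUCLEOTIDE_TYPES = {"g", "c", "n", "m"}


def uppercase_nucleotides(hgvs: str) -> str:
    """Return the HGVS expression with nucleotide letters uppercased.

    Hypothetical helper: the accession, the type prefix (e.g. "g.") and the
    edit keywords are left untouched; protein and RNA expressions are
    returned unchanged.
    """
    accession, colon, rest = hgvs.partition(":")
    kind, dot, description = rest.partition(".")
    if not colon or not dot or kind not in _NUCLEOTIDE_TYPES:
        return hgvs

    def fix(match: re.Match) -> str:
        token = match.group(0)
        return token if token in _KEYWORDS else token.upper()

    return f"{accession}:{kind}.{_TOKEN.sub(fix, description)}"


if __name__ == "__main__":
    print(uppercase_nucleotides("NC_000011.9:g.5248232t>a"))
```

Expressions that are already uppercase pass through unchanged, so the helper can be applied to every input.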

Variation Services use a common data model described as Sequence Position Deletion Insertion (SPDI).
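As an illustration of the four SPDI fields, the sketch below parses and re-serializes an SPDI expression. It assumes the colon-separated text form (sequence:position:deletion:insertion) with a 0-based position, as described in the SPDI paper listed under Reference. The class name `Spdi` is hypothetical and not part of the services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spdi:
    """One variant in the Sequence Position Deletion Insertion model."""

    sequence: str   # reference sequence accession with version
    position: int   # assumed 0-based: number of bases before the deletion
    deletion: str   # deleted reference bases
    insertion: str  # inserted bases

    @classmethod
    def parse(cls, text: str) -> "Spdi":
        """Parse an assumed 'sequence:position:deletion:insertion' string."""
        parts = text.split(":")
        if len(parts) != 4:
            raise ValueError(f"expected 4 colon-separated fields, got {len(parts)}")
        sequence, position, deletion, insertion = parts
        return cls(sequence, int(position), deletion, insertion)

    def __str__(self) -> str:
        return f"{self.sequence}:{self.position}:{self.deletion}:{self.insertion}"


if __name__ == "__main__":
    # Under the 0-based assumption, HGVS NC_000011.9:g.5248232T>A (1-based)
    # corresponds to position 5248231 in SPDI.
    variant = Spdi.parse("NC_000011.9:5248231:T:A")
    print(variant.sequence, variant.position, variant.deletion, variant.insertion)
```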

If you are a geneticist who needs a one-off analysis, try using the HGVS expression of your choice on the demo page.

If you are a dataflow engineer, you can incorporate the services into your own data analysis pipelines to group variants in the same way NCBI variation resources do. See the API Documentation, and the section below for tips on using it.

Our initial release (version 0) and every major version thereafter will maintain backward compatibility of the object schema. To learn more, read about what these services do and see them applied to a particular example. Or jump to the services themselves.

Remapping Variants

Remapping (or lifting over) is a process for translating sequence coordinates from one sequence to another. Variant remapping is a specific form of remapping that determines how a variant defined relative to one reference sequence corresponds to a variant described on a different but related reference sequence. Remapping variants is essential to understanding whether two such variants are identical. Two SPDI methods, canonical_representative and all_equivalent_contextual, perform this remapping. dbSNP and ClinVar depend on these methods to group similar submissions into reference variants.
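A request URL for either method might be built as in the sketch below. The base URL and the `/spdi/{spdi}/{method}` path layout are assumptions and should be confirmed against the API Documentation. The function name `remap_url` is hypothetical.

```python
from urllib.parse import quote

# Assumed base URL for version 0 of the services; verify in the API Documentation.
BASE_URL = "https://api.ncbi.nlm.nih.gov/variation/v0"

# The two SPDI remapping methods named in the text above.
REMAP_METHODS = ("canonical_representative", "all_equivalent_contextual")


def remap_url(spdi: str, method: str) -> str:
    """Build a request URL for one of the two SPDI remapping methods.

    The path layout is an assumption. The SPDI expression is percent-encoded
    so that its colons are safe inside a URL path segment.
    """
    if method not in REMAP_METHODS:
        raise ValueError(f"unknown remapping method: {method!r}")
    return f"{BASE_URL}/spdi/{quote(spdi, safe='')}/{method}"


if __name__ == "__main__":
    for name in REMAP_METHODS:
        print(remap_url("NC_000011.9:5248231:T:A", name))
```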

However, there are a few caveats in remapping that you should be aware of when you use these two services.

Tips for using the API Documentation

The API Documentation conforms to the OpenAPI v3 specification.

Services are grouped by the type of input they receive: HGVS, VCF or SPDI. Click a group's green heading to collapse or expand its list of services. For now, all services use the HTTP GET method; click anywhere on the text in a blue row to see the details of that method.

The method details include information about:

  • Implementation notes for the method
  • The schema of the response object, whether the response is successful (Status 200) or not
  • The parameters, with the required parameters in bold, and default values present in the text box
  • "Try it out!" button

Using the default parameters, you can click on the "Try it out!" button to see the constructed request URL as well as the response body. You can copy the request URL into a new browser tab and execute it independently, or use the curl example from a *nix command line.

Best of all, you can edit the parameter data, click on the "Try it out!" button again, and see how the response changes.
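A request URL copied from the "Try it out!" output can also be executed from a script. The sketch below uses only the Python standard library and assumes the response body is JSON. The function names `build_request` and `fetch_json` are hypothetical.

```python
import json
import urllib.request


def build_request(request_url: str) -> urllib.request.Request:
    """Prepare a GET request that asks for a JSON response."""
    return urllib.request.Request(
        request_url,
        headers={"Accept": "application/json"},
        method="GET",
    )


def fetch_json(request_url: str, timeout: float = 30.0):
    """Execute the request and decode the body, assuming it is JSON.

    urlopen raises urllib.error.HTTPError for unsuccessful status codes,
    so callers may want to catch it and inspect the error response.
    """
    with urllib.request.urlopen(build_request(request_url), timeout=timeout) as response:
        return json.load(response)
```

Paste the request URL shown in the API Documentation into `fetch_json` to get the response object as Python dictionaries and lists.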

Variation Services will not expand IUPAC codes to match all the possible encoded bases (https://www.bioinformatics.org/sms/iupac.html). However, it will accept IUPAC codes if they appear in the reference nucleotide or protein sequence to denote an ambiguous sequence due to sequencing artifacts. For instance, suppose the query nucleotide sequence contains the ambiguity code Y, which specifies C or T at a given position. Variation Services will match the reference sequence only if Y is also at the same position in the reference, and will not expand the match to C or T.
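The difference between literal and expanded matching can be illustrated with a short sketch. This is not the services' implementation: the function names are hypothetical, and `expanded_match` is shown only for contrast, as the behavior Variation Services does not provide.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def literal_match(query: str, reference: str) -> bool:
    """Behavior described above: Y matches only a Y in the reference."""
    return query == reference


def expanded_match(query: str, reference: str) -> bool:
    """For contrast only: each query code matches any base it encodes."""
    return len(query) == len(reference) and all(
        ref_base in IUPAC.get(code, "") for code, ref_base in zip(query, reference)
    )


if __name__ == "__main__":
    print(literal_match("AYG", "AYG"))   # reference also has Y at that position
    print(literal_match("AYG", "ACG"))   # Y is not expanded to C
    print(expanded_match("AYG", "ACG"))  # what expansion would have allowed
```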

Resources and Tutorials

Reference

SPDI: Data Model for Variants and Applications at NCBI (PubMed).

Last updated: 2023-09-06T17:47:39Z