Evidence Qualifiers
The evidence qualifiers /experiment and /inference can be used to provide more detail about the support for the annotation. These qualifiers replaced "evidence=experimental" and "evidence=non-experimental", respectively, which are no longer supported.
/experiment
Definition: a brief description of the nature of the experimental evidence that supports the feature identification or assignment.
Value format: "[CATEGORY:]text"
CATEGORY is optional, and is one of these:
- COORDINATES; support for the annotated coordinates
- DESCRIPTION; support for a broad concept of function such as that based on phenotype, genetic approach, biochemical function, pathway information, etc.
- EXISTENCE; support for the known or inferred existence of the product
A PubMed ID (PMID) or DOI can be included within brackets in the text.
Examples:
- /experiment="EXISTENCE:Northern blot"
- /experiment="heterologous expression system of Xenopus laevis oocytes [PMID: 12345678, 10101010, 987654]"
Comment: experimental details should not be included; they would normally be found in the cited publications.
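The value format above can be illustrated with a small parser. This is only a sketch: the function name `parse_experiment` and the returned tuple layout are hypothetical, and it assumes the bracketed citation, when present, sits at the end of the text as in the example above.

```python
import re

# The three optional categories listed above.
CATEGORIES = ("COORDINATES", "DESCRIPTION", "EXISTENCE")


def parse_experiment(value):
    """Split an /experiment value into (category, text, citation).

    Hypothetical helper: category and citation are None when absent.
    """
    category = None
    for name in CATEGORIES:
        prefix = name + ":"
        if value.startswith(prefix):
            category = name
            value = value[len(prefix):]
            break
    # Assumption: a citation is a bracketed span at the very end of the text.
    match = re.search(r"\[([^\]]*)\]\s*$", value)
    if match:
        citation = match.group(1).strip()
        text = value[:match.start()].strip()
    else:
        citation = None
        text = value.strip()
    return category, text, citation
```

For instance, `parse_experiment("EXISTENCE:Northern blot")` separates the category `EXISTENCE` from the free text `Northern blot`, with no citation.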
/inference
Definition: a structured description of non-experimental evidence that supports the feature identification or assignment.
The qualifier allows data providers to point by name to the data resources and tools that were used to identify the parent feature.
Value format: "[CATEGORY:]TYPE[ (same species)][:EVIDENCE_BASIS]"
CATEGORY is optional, and is one of these:
- COORDINATES; support for the annotated coordinates
- DESCRIPTION; support for a broad concept of function such as that based on phenotype, genetic approach, biochemical function, pathway information, etc.
- EXISTENCE; support for the known or inferred existence of the product
TYPE is required and is one of these:
- /inference="similar to AA sequence"
- /inference="similar to DNA sequence"
- /inference="similar to RNA sequence"
- /inference="similar to RNA sequence, mRNA"
- /inference="similar to RNA sequence, EST"
- /inference="similar to RNA sequence, other RNA"
- /inference="profile"
- /inference="nucleotide motif"
- /inference="protein motif"
- /inference="ab initio prediction"
- /inference="alignment"
The optional text "(same species)" can be included when the inference comes from the same species as the entry.
EVIDENCE_BASIS provides reference to a database entry (including accession and version) or an algorithm (including version). The accession.version number of a database record and the version number of an algorithm are separated from the database or algorithm name by a colon, as seen in the examples.
Examples:
- /inference="similar to DNA sequence:INSD:AY411252.1"
- /inference="similar to RNA sequence, mRNA:RefSeq:NM_000041.2"
- /inference="similar to DNA sequence (same species):INSD:AACN010222672.1"
- /inference="profile:tRNAscan:2.1"
- /inference="protein motif:InterPro:IPR001900"
- /inference="ab initio prediction:Genscan:2.0"
- /inference="alignment:Splign:1.0"
- /inference="alignment:Splign:1.26p:RefSeq:NM_000041.2,INSD:BC003557.1"
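As a rough illustration of the value format, the sketch below splits an /inference value into its parts. The function name `parse_inference` and the dictionary keys are hypothetical, and the code assumes that colons appear only as field separators, which holds for the examples above.

```python
CATEGORIES = {"COORDINATES", "DESCRIPTION", "EXISTENCE"}

# TYPE values as listed above.
TYPES = {
    "similar to AA sequence",
    "similar to DNA sequence",
    "similar to RNA sequence",
    "similar to RNA sequence, mRNA",
    "similar to RNA sequence, EST",
    "similar to RNA sequence, other RNA",
    "profile",
    "nucleotide motif",
    "protein motif",
    "ab initio prediction",
    "alignment",
}

SAME_SPECIES = " (same species)"


def parse_inference(value):
    """Split '[CATEGORY:]TYPE[ (same species)][:EVIDENCE_BASIS]' into a dict."""
    parts = value.split(":")
    # The optional CATEGORY, when present, is the first colon-separated field.
    category = parts.pop(0) if parts[0] in CATEGORIES else None
    if not parts:
        raise ValueError("missing TYPE in inference value: %r" % value)
    type_field = parts.pop(0)
    same_species = type_field.endswith(SAME_SPECIES)
    if same_species:
        type_field = type_field[: -len(SAME_SPECIES)]
    if type_field not in TYPES:
        raise ValueError("unrecognized TYPE: %r" % type_field)
    # Whatever remains is the EVIDENCE_BASIS, kept as a single string.
    evidence_basis = ":".join(parts) or None
    return {
        "category": category,
        "type": type_field,
        "same_species": same_species,
        "evidence_basis": evidence_basis,
    }
```

Keeping EVIDENCE_BASIS as one string avoids guessing at this stage whether its first field names a database or an algorithm.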
Several things to note about /inference are:
- When citing a GenBank record, use INSD (International Nucleotide Sequence Database).
- When citing a RefSeq record (recognized by the underscore between the letters and the digits), use RefSeq.
- Include the version of the algorithm that was used, and separate the version from the algorithm name with a colon, e.g., Genscan:2.0.
- The EVIDENCE_BASIS can include the accession.version of the records that support the analysis. The format is [ALGORITHM][:EVIDENCE_DBREF[,EVIDENCE_DBREF]*[,...]], as in the example "alignment:Splign:1.26p:RefSeq:NM_000041.2,INSD:BC003557.1".
- Leading and trailing spaces should not be included in resource names.
- The following table presents recommended acronyms for commonly cited resources.
Name of data resource/tool | Recommended acronym |
---|---|
International Nucleotide Sequence Database | INSD |
NCBI Reference Sequence Database | RefSeq |
UniProt Knowledgebase | UniProtKB |
The database of Clusters of Orthologous Groups of proteins | COGs |
The Protein Family Database | PFAM |
NCBI Conserved Domain Database | CDD |
The InterPro Database of Protein Families, Domains and Functional Sites | InterPro |
CATH domain structure database | CATH |
Evidence Code Ontology | ECO |
Digital Object Identifier (citations) | DOI |
PubMed Identifier (citations) | PMID |
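The EVIDENCE_BASIS format described in the notes above can be taken apart with the help of the acronym table. The sketch below is a heuristic, not an official rule: it assumes that a basis whose first field is not one of the recommended acronyms starts with `ALGORITHM:VERSION`. The function name `split_evidence_basis` is hypothetical.

```python
# Recommended acronyms from the table above.
KNOWN_DATABASES = {
    "INSD", "RefSeq", "UniProtKB", "COGs", "PFAM", "CDD",
    "InterPro", "CATH", "ECO", "DOI", "PMID",
}


def split_evidence_basis(basis):
    """Return (algorithm, dbrefs) for an EVIDENCE_BASIS string.

    algorithm is 'NAME:VERSION' or None; dbrefs is a list of
    (database, accession) tuples.
    """
    first = basis.split(":", 1)[0]
    algorithm = None
    refs_text = basis
    if first not in KNOWN_DATABASES:
        # Heuristic assumption: an unknown first field is an algorithm name
        # followed by its version.
        pieces = basis.split(":", 2)
        algorithm = ":".join(pieces[:2])
        refs_text = pieces[2] if len(pieces) > 2 else ""
    dbrefs = []
    for ref in refs_text.split(","):
        if not ref:
            continue
        database, _, accession = ref.partition(":")
        dbrefs.append((database, accession))
    return algorithm, dbrefs
```

Applied to `Splign:1.26p:RefSeq:NM_000041.2,INSD:BC003557.1`, this separates the algorithm `Splign:1.26p` from the two supporting records. A resource that is not in the table would be mistaken for an algorithm, so the set of known databases would need extending in practice.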
Example .tbl file:
In this example the first CDS is predicted by Genscan 2.0, the second CDS was identified by its similarity to EST H22345.1 from the same species, the third CDS was identified because it's similar to GenBank (INSD) record JQ340893.1 and by its InterPro domain IPR001900, and the fourth CDS has experimental expression evidence.
>Feature ExampleSeq
1 100 gene
locus_tag Test_0001
1 100 CDS
product hypothetical protein
protein_id gnl|center_name|Test_0001
inference ab initio prediction:Genscan:2.0
200 300 gene
locus_tag Test_0002
200 300 CDS
product putative helicase
protein_id gnl|center_name|Test_0002
inference similar to RNA sequence, EST (same species):INSD:H22345.1
400 500 gene
locus_tag Test_0003
400 500 CDS
product ribonuclease
protein_id gnl|center_name|Test_0003
inference similar to RNA sequence, mRNA:INSD:JQ340893.1
inference protein motif:InterPro:IPR001900
600 700 gene
locus_tag Test_0004
600 700 CDS
product alcohol dehydrogenase
protein_id gnl|center_name|Test_0004
experiment EXISTENCE:expression of GST fusion protein
The resulting flatfile looks like this:
gene 1..100
/locus_tag="Test_0001"
CDS 1..100
/locus_tag="Test_0001"
/inference="ab initio prediction:Genscan:2.0"
/codon_start=1
/product="hypothetical protein"
/translation="M...."
gene 200..300
/locus_tag="Test_0002"
CDS 200..300
/locus_tag="Test_0002"
/inference="similar to RNA sequence, EST (same
species):INSD:H22345.1"
/codon_start=1
/product="putative helicase"
/translation="M...."
gene 400..500
/locus_tag="Test_0003"
CDS 400..500
/locus_tag="Test_0003"
/inference="protein motif:InterPro:IPR001900"
/inference="similar to RNA sequence, mRNA:INSD:JQ340893.1"
/codon_start=1
/product="ribonuclease"
/translation="M...."
gene 600..700
/locus_tag="Test_0004"
CDS 600..700
/locus_tag="Test_0004"
/experiment="EXISTENCE:expression of GST fusion protein"
/codon_start=1
/product="alcohol dehydrogenase"
/translation="M...."
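Feature table lines like those in the .tbl example above can be generated programmatically. The sketch below assumes the usual five-column, tab-delimited layout (start, stop, and feature key in columns 1 to 3; qualifier key and value in columns 4 and 5), which the whitespace in the example here does not show explicitly. The helper names `tbl_feature` and `flatfile_qualifier` are hypothetical.

```python
def tbl_feature(start, stop, key, qualifiers):
    """Return the .tbl lines for one feature and its qualifiers.

    qualifiers is a list of (name, value) pairs, e.g. ("inference", "...").
    """
    lines = ["%d\t%d\t%s" % (start, stop, key)]
    for name, value in qualifiers:
        # Assumption: qualifier lines leave the first three columns empty.
        lines.append("\t\t\t%s\t%s" % (name, value))
    return lines


def flatfile_qualifier(name, value):
    """Render a qualifier the way it appears in the flatfile, unwrapped."""
    return '/%s="%s"' % (name, value)


cds = tbl_feature(
    1, 100, "CDS",
    [
        ("product", "hypothetical protein"),
        ("inference", "ab initio prediction:Genscan:2.0"),
    ],
)
print("\n".join(cds))
print(flatfile_qualifier("inference", "ab initio prediction:Genscan:2.0"))
```

The flatfile helper does not reproduce the line wrapping seen in the flatfile output above; it only shows how the qualifier name and value map onto the `/name="value"` form.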