Genomes Selected for RefSeq Annotation

Some eukaryotic genome assemblies are annotated using the NCBI Eukaryotic Genome Annotation Pipeline (EGAP) and are included in RefSeq. They are chosen using the following criteria:

  • Taxonomic scope:

    • In scope: Vertebrates, higher plants, arthropods, and some other invertebrates.

    • Out of scope: Fungi, nematodes, and protozoans.

  • Assembly quality:

    • Contiguity: Chromosome-level assemblies and assemblies with high contig and scaffold N50 values are preferred. Assemblies with a contig N50 below 50 Kb are excluded.

    • Sequence accuracy: Genome assemblies with high counts of indels or base substitutions are excluded if these substantially affect EGAP’s ability to produce a quality annotation.

  • Availability of transcriptomics data in SRA: EGAP is highly dependent on experimental evidence, so only genomes of species with publicly available RNA-Seq data in SRA are annotated.

  • Best for the species: Only one genome per species is in RefSeq at any given time. The best-quality genome (see “Assembly quality” above) or a genome of high value to the scientific community is chosen.

  • Community interest/requests: We prioritize the annotation of genomes for which we receive requests.
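The contig N50 cutoff in the contiguity criterion above can be checked with a short calculation. The sketch below is illustrative only and is not part of EGAP: it uses the standard definition of N50 (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length) and assumes that 50 Kb means 50,000 bp. The names `n50`, `passes_contig_n50`, and `MIN_CONTIG_N50` are hypothetical.

```python
# Assumption: "50 Kb" is read as 50,000 bp.
MIN_CONTIG_N50 = 50_000


def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def passes_contig_n50(contig_lengths):
    """True if the contig N50 is not below the assumed 50 Kb cutoff."""
    return n50(contig_lengths) >= MIN_CONTIG_N50


# Made-up contig lengths, for illustration only.
good = [80_000, 70_000, 50_000, 40_000, 30_000, 20_000]
poor = [40_000, 30_000, 20_000, 10_000]
print(n50(good), passes_contig_n50(good))
print(n50(poor), passes_contig_n50(poor))
```

In practice the contig lengths would come from the assembly's sequence records or its published assembly statistics. This check covers only the hard exclusion; the preference for higher N50 values and chromosome-level assemblies is a relative ranking that it does not model.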

Note: Only genomes that are public in an INSDC database (GenBank, ENA, or DDBJ) are considered.

Please feel free to contact us with any questions, concerns, or suggestions!

Generated April 19, 2024