RefSeq Collaborators and data sources

The RefSeq project is ambitious in scope and we actively welcome opportunities to work with other groups to provide this collection. We value the information our collaborators contribute, which ranges from completely annotated genomes and advice to improve the sequence or annotation of individual RefSeq records to information about official nomenclature and function.

In addition to the significant information collected by collaboration, numerous NCBI staff are involved in database support, programmatic support, and curation.

We collaborate with many groups including:

Alliance of Genome Resources (AGR): A consortium of 7 model organism databases (MODs) and the Gene Ontology (GO) Consortium provides genomic information across species to facilitate comparative analysis
Chicken Gene Nomenclature Consortium (CGNC): provides official nomenclature for chicken genes
Consensus CDS (CCDS) Project: consistent annotation of the human and mouse genomes is supported by a collaboration between NCBI, the Wellcome Trust Sanger Institute (WTSI) and the University of California, Santa Cruz (UCSC).
Cytochrome P450: Dr. Nelson curates gene content and representative sequences for this gene family.
Echinobase: resource for functional genomics data of echinoderm species.
EMBL's European Bioinformatics Institute (EMBL-EBI): collaborates on the Matched Annotation from the NCBI and EMBL-EBI (MANE) project
FlyBase: FlyBase provides the Drosophila melanogaster RefSeq collection.
GENCODE: collaborates on the Matched Annotation from the NCBI and EMBL-EBI (MANE) project
Gene Ontology Consortium: provides consistent descriptions of gene products across databases
HUGO Gene Nomenclature Committee: provides official nomenclature for human genes and curates gene content and representative sequences.
Human Gene Mutation Database: contributed to the initial set of human RefSeq records
Human Protein Reference Database (HPRD): curated proteomic information pertaining to human proteins
IMGT: International Immunogenetics Information System
Microbial Genomes: Microbial genomes are submitted to GenBank by several groups; we would like to acknowledge that their efforts add significant value to the RefSeq collection as we mine for experimentally supported data. NCBI collaborates with some groups to improve our Prokaryotic genome annotation pipeline, or to provide additional information for the genome, genes, or protein products.
miRBase - the microRNA database: this is the primary data source for vertebrate RefSeq and Gene records of this type of small RNA molecule.
Mouse Genome Informatics: MGI provides official nomenclature for mouse genes and curates gene content and representative sequences.
OMIM: Catalog of Human Genes and Genetic Disorders
Pseudogene.org: one source of pseudogene content represented in RefSeq and Gene.
Rat Genome Database: RGD provides official nomenclature for rat genes and identifies genes and representative sequences.
SGD: Saccharomyces Genome Database provides the annotated RefSeq records.
SwissProt/UniProt: NCBI and UniProt collaborate to provide cross-linking between protein datasets.
The Arabidopsis Information Resource: TAIR provides the Arabidopsis thaliana RefSeq collection.
VectorBase: the source of genome annotation data represented in RefSeq and Gene for some of the invertebrate organisms that are vectors of human disease.
Vertebrate Gene Nomenclature Committee (VGNC): provides official nomenclature for genes in vertebrate species that currently lack a nomenclature committee
VEuPathDB: Eukaryotic Pathogen, Vector and Host Informatics Resources
Viral Genome Advisors: the viral RefSeq collection is curated via an international collaboration and panel of viral advisors
WormBase: WormBase provides the Caenorhabditis elegans (nematode) RefSeq collection.
Xenbase: Xenbase provides official nomenclature for Xenopus species.
Zebrafish Model Organism Database (ZFIN): provides official nomenclature for zebrafish genes and curates gene content and representative sequences.

In addition, numerous individuals have made valuable contributions by helping to curate data for specific genes, gene families, or organisms. While it is impossible to list them all here, their assistance is very much appreciated.
