RefSeq Collaborators and data sources
The RefSeq project is ambitious in scope and we actively welcome opportunities to work with other groups to provide this collection. We value the information our collaborators contribute, which ranges from completely annotated genomes to advice on improving the sequence or annotation of individual RefSeq records, official nomenclature, and information about function.
In addition to the significant information collected through collaboration, numerous NCBI staff are involved in database support, programmatic support, and curation.
We collaborate with many groups including:
- Consensus CDS (CCDS) Project
- consistent annotation of the human and mouse genomes is supported by a collaboration between NCBI, the Wellcome Trust Sanger Institute (WTSI) and the University of California, Santa Cruz (UCSC).
- Cytochrome P450
- Dr. Nelson curates gene content and representative sequences for this gene family.
- FlyBase
- FlyBase provides the Drosophila melanogaster RefSeq collection.
- Human Gene Mutation Database
- contributed to the initial set of human RefSeq records.
- HUGO Gene Nomenclature Committee
- provides official nomenclature for human genes and curates gene content and representative sequences.
- IMGT
- International Immunogenetics Information System
- Microbial Genomes
- Microbial genomes are submitted to GenBank by several groups; we would like to acknowledge that their efforts add significant value to the RefSeq collection as we mine for experimentally supported data. NCBI collaborates with some groups to improve our Prokaryotic genome annotation pipeline, or to provide additional information for the genome, genes, or protein products.
- miRBase - the microRNA database
- this is the primary data source for vertebrate RefSeq and Gene records of this type of small RNA molecule.
- Mouse Genome Informatics
- MGI provides official nomenclature for mouse genes and curates gene content and representative sequences.
- Pseudogene.org
- one source of pseudogene content represented in RefSeq and Gene.
- Rat Genome Database
- RGD provides official nomenclature for rat genes and identifies genes and representative sequences.
- SGD
- Saccharomyces Genome Database provides the annotated RefSeq records.
- SwissProt/UniProt
- NCBI and UniProt collaborate to provide cross-linking between protein datasets.
- The Arabidopsis Information Resource
- TAIR provides the Arabidopsis thaliana RefSeq collection.
- VectorBase
- the source of genome annotation data represented in RefSeq and Gene for some of the invertebrate organisms that are vectors of human disease.
- Viral Genome Advisors
- the viral RefSeq collection is curated via an international collaboration and panel of viral advisors.
- WormBase
- WormBase provides the Caenorhabditis elegans (nematode) RefSeq collection.
- Zebrafish Model Organism Database (ZFIN)
- provides official nomenclature for zebrafish genes and curates gene content and representative sequences.
In addition, numerous individuals have made valuable contributions by helping to curate data for specific genes, gene families, or organisms. While it is impossible to list them all here, their assistance is very much appreciated.