
How to Submit Data for Real-Time Analysis

For New Submitters!

If you have genomic data from cultured pathogen isolates, BEFORE DOING ANYTHING ELSE please contact us at pd-help@ncbi.nlm.nih.gov. This applies whether or not you have submitted data to NCBI before, and whether you are starting a new project or continuing an existing one. Please be prepared to answer the following questions:

  • What type of pathogen(s) are you sequencing?
  • Do you have Illumina sequencing data? If you sequenced your isolates with another technology, you can submit the assembled genomes directly to GenBank (see below).
  • Are you willing to submit and make the data publicly available?
  • How many isolates are you planning on sequencing and submitting in the next year?
  • What is the timeline for the data submissions to start?

Reasons to submit pathogen data include:

  • To contribute information on pathogen sequences that may help discover sources of contamination or solve outbreaks more quickly
  • To provide valuable real-time information about the relationship of an isolate to other isolates and outbreaks
  • To enhance the set of pathogen genomes that can be used by the scientific community
  • To supply information on the set of resistance genes present in a pathogen

The NCBI Pathogen Detection system is built on the foundation of open data. Data are intended to be submitted and released to the public immediately. Currently four major foodborne pathogens (Campylobacter, Escherichia coli and Shigella, Listeria, and Salmonella) are being analyzed in real time as participating public health agencies submit the sequences. Pathogens are also being analyzed for antimicrobial resistance.

For accounts of actual projects and submissions, please see Success Stories.

Typical submitters include:

  • Public health organizations (typically public health labs gathering and characterizing isolates)
  • Hospitals and medical service networks interested in antibiotic resistance
  • Researchers studying bacterial evolution, population dynamics, or disease outbreaks (retrospective or prospective)
  • Repositories wishing to further characterize their isolate holdings through sequencing

The best time to submit is as soon as the sequencing run has finished. In addition, existing records can be updated, and additional data such as antibiograms and NGS read sets can be added to existing BioSample submissions.

The NCBI Submission Portal supports three submission modes:

  • Interactive web forms (good for submissions containing only a small number of isolates)
  • Combination web forms and Excel/tab-delimited files (appropriate for large numbers of isolates)
  • Fully autonomous XML-based document exchange (appropriate for automated systems)

The data files in a submission can be large. They can be uploaded to NCBI by FTP, HTTP, or the Aspera client (at no cost to the submitter). NGS reads are typically uploaded in FASTQ format, although other formats are also supported. Files may be bundled into tar archives and compressed with gzip.
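As a minimal sketch, assuming Python 3 and only its standard library, the following bundles a set of FASTQ files into one gzip-compressed tar archive before upload. The function names and file names are hypothetical, and the transfer itself (FTP, HTTP, or Aspera) is not shown.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_reads(fastq_paths, archive_path):
    """Pack read files into one gzip-compressed tar archive (.tar.gz).

    Each file is stored under its base name, so the local directory
    structure is not carried into the archive.
    """
    archive_path = Path(archive_path)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for path in map(Path, fastq_paths):
            tar.add(str(path), arcname=path.name)
    return archive_path


def list_members(archive_path):
    """Return the sorted member names of a .tar.gz archive."""
    with tarfile.open(str(archive_path), "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    # Demo with two tiny placeholder FASTQ files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        reads = []
        for name in ("isolate1_R1.fastq", "isolate1_R2.fastq"):
            path = tmp / name
            path.write_text("@read1\nACGT\n+\nIIII\n")
            reads.append(path)
        archive = bundle_reads(reads, tmp / "isolate1_reads.tar.gz")
        print(list_members(archive))
```

Storing files under their base names keeps the archive independent of where the files happened to live on the sequencing workstation.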

The Pathogen system accepts genomic data from ILLUMINA platform sequencing of cultured microbial organisms. In particular, four major foodborne pathogens are currently being analyzed in real time as participating public health agencies submit sequences. Please check the Organism Groups page for a complete list of currently supported organisms.

The Pathogen system at NCBI requires three basic elements:

  • A record in the BioProject database that describes the project or initiative.
  • For each pathogen sequenced, a record in the BioSample database with the isolate metadata (such as collection_date).
  • For each pathogen, the raw sequence data, submitted to the Sequence Read Archive (SRA) database.

The three elements can be submitted simultaneously or sequentially.
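The relationship among the three elements can be sketched as a simple local pre-submission checklist. This is a minimal Python illustration, not an NCBI tool: the class and function names are hypothetical, and collection_date is the only BioSample attribute checked because it is the one named above; the real packages require more fields.

```python
from dataclasses import dataclass, field


@dataclass
class Isolate:
    """One sequenced pathogen isolate (hypothetical local record)."""
    sample_name: str
    metadata: dict = field(default_factory=dict)   # BioSample attributes
    read_files: list = field(default_factory=list)  # files destined for SRA


@dataclass
class Submission:
    """Hypothetical grouping of the three elements the Pathogen system needs."""
    bioproject: str  # title or accession of the BioProject record
    isolates: list = field(default_factory=list)


def missing_elements(sub):
    """Return a list of problems; an empty list means nothing obvious is missing."""
    problems = []
    if not sub.bioproject:
        problems.append("no BioProject")
    for iso in sub.isolates:
        if not iso.metadata.get("collection_date"):
            problems.append(f"{iso.sample_name}: no collection_date in BioSample metadata")
        if not iso.read_files:
            problems.append(f"{iso.sample_name}: no SRA read files")
    return problems


if __name__ == "__main__":
    sub = Submission(
        bioproject="Example surveillance project",
        isolates=[
            Isolate("iso-001", {"collection_date": "2023-05-01"}, ["iso-001_R1.fastq.gz"]),
            Isolate("iso-002"),
        ],
    )
    for problem in missing_elements(sub):
        print(problem)
```

Because the elements may be submitted simultaneously or sequentially, a check like this is most useful just before the final step, when all three should be present.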

Projects

See: BioProject homepage

A BioProject record is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. It provides data consumers a single place to find the diverse data generated for that project.

BioProjects can also be organized in a hierarchy to describe several related projects or initiatives. An example of an umbrella BioProject is the FDA GenomeTrakr BioProject for Salmonella (PRJNA183844). This project organizes multiple subprojects, one for each submitting group, which are linked to the sample and sequence metadata.

An example BioProject with links to sequence data is the NY State/Wadsworth Center project for Salmonella (PRJNA183850).

If you have any questions about BioProject submissions write to genomeprj@ncbi.nlm.nih.gov.

Samples

See: BioSample home page

The BioSample database contains descriptions of biological source materials used in experimental assays.

Different templates (packages) are used in BioSample submission to capture the minimal set of data fields, required or optional, that usefully describe the biological source material. For the Pathogen Detection project there is a pathogen template (Pathogen affecting public health) that describes the sample, including whether the isolate came from a clinical or an environmental/food/other source, and when and where the isolate was obtained. A newer package, the One Health Enteric package, has been introduced as a One Health-compatible metadata package for genomic surveillance of enteric microbial organisms. It offers more extensive support for specifying the host and isolation source.

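For pathogen submissions, please use one of these package templates:

  • Pathogen: clinical or host-associated; version 1.0
  • Pathogen: environmental/food/other; version 1.0
  • OneHealthEnteric 1.0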

An example BioSample record from NY State/Wadsworth that uses the package Pathogen: environmental/food/other; version 1.0 is: SAMN02777761

An example BioSample record from PA Dept of Health that uses the package Pathogen: clinical or host-associated; version 1.0 is: SAMN02645749

An example BioSample record from a European AMR surveillance project that uses package OneHealthEnteric 1.0 is: SAMN31087410
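For batch submissions using a tab-delimited file, the isolate metadata can be generated from a script. The sketch below is a minimal Python illustration: the column list is a hypothetical subset of BioSample attributes chosen for the example, not the full set any package requires, so download the actual package template from the Submission Portal and use its columns instead.

```python
import csv
import io

# Hypothetical subset of BioSample attribute columns for illustration only;
# the package template downloaded from the Submission Portal is authoritative.
COLUMNS = ["sample_name", "organism", "strain", "collection_date",
           "geo_loc_name", "isolation_source"]


def biosample_tsv(samples):
    """Render a list of sample dicts as tab-delimited text, one row per isolate.

    Attributes missing from a dict are left as empty cells rather than
    guessed; keys not in COLUMNS are ignored.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n", restval="",
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(samples)
    return buf.getvalue()


if __name__ == "__main__":
    print(biosample_tsv([
        {"sample_name": "iso-001", "organism": "Salmonella enterica",
         "collection_date": "2023-05-01", "geo_loc_name": "USA: New York"},
    ]))
```

Leaving unknown values blank, rather than filling in placeholders, lets the Submission Portal's validation flag any required field that is still missing.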

For more information about how to submit to BioSamples, see https://submit.ncbi.nlm.nih.gov/subs/biosample or write to biosamplehelp@ncbi.nlm.nih.gov.

Sequence Reads

A table-based submission page supports batch BioSample and SRA submissions and is intended for users of desktop sequencers. This method is not intended for large submission volumes.

The SRA submission portal page is at: https://submit.ncbi.nlm.nih.gov/subs/sra.

The SRA home page is at https://www.ncbi.nlm.nih.gov/Traces/sra.

For more information about how to submit Next Generation Sequencing reads to the SRA, or about transferring large files of NGS sequencing data, see https://submit.ncbi.nlm.nih.gov/subs/sra.
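When preparing a batch SRA submission, each isolate's paired-end read files must be matched to the right sample. The sketch below groups file names by sample, assuming the common Illumina bcl2fastq-style naming pattern shown in the code comment and a single lane per sample; both are assumptions, so adapt the regular expression to your own naming scheme. How the resulting pairs are entered into the SRA metadata table is not shown.

```python
import re
from collections import defaultdict

# Assumed Illumina-style naming: <sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz
FASTQ_NAME = re.compile(r"^(?P<sample>.+?)_S\d+_L\d{3}_R(?P<read>[12])_001\.fastq\.gz$")


def pair_reads(filenames):
    """Group paired-end FASTQ file names by sample name.

    Returns (pairs, unmatched): pairs maps sample -> (R1, R2); unmatched lists
    files that do not fit the assumed pattern or whose mate is missing.
    Assumes one lane per sample (a second lane would overwrite the first).
    """
    by_sample = defaultdict(dict)
    unmatched = []
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            unmatched.append(name)
            continue
        by_sample[match["sample"]][match["read"]] = name
    pairs = {}
    for sample, reads in by_sample.items():
        if "1" in reads and "2" in reads:
            pairs[sample] = (reads["1"], reads["2"])
        else:
            unmatched.extend(reads.values())
    return pairs, unmatched


if __name__ == "__main__":
    pairs, leftovers = pair_reads([
        "iso-001_S1_L001_R1_001.fastq.gz",
        "iso-001_S1_L001_R2_001.fastq.gz",
        "iso-002_S2_L001_R1_001.fastq.gz",
    ])
    print(pairs)
    print(leftovers)
```

Reporting unmatched files instead of silently dropping them makes it easier to catch a missing mate before upload.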

If you have any questions about SRA submissions write to sra@ncbi.nlm.nih.gov.

NCBI is building the National Database of Antibiotic Resistant Organisms (NDARO), a database that links genome sequence data to antimicrobial susceptibility test (AST) results.

The critical difference between BioSample submissions for standard pathogens and antibiotic-resistant organisms is that the latter category will include phenotypic (AST) data.

An example of a BioSample record with AST data is SAMN01163409, a clinical isolate of Enterobacter cloacae.

AST data can be supplied as a separate table at submission time along with the isolate metadata via the Submission Portal.

There are two templates: one for standard MIC-based antibiograms for most bacteria, and one specifically for Mycobacteria without MICs.

Specifications for the antibiogram submission format, along with controlled vocabularies for various fields, can be found at:

  • https://www.ncbi.nlm.nih.gov/biosample/docs/antibiogram/ (most organisms)
  • https://www.ncbi.nlm.nih.gov/biosample/docs/antibiogram-myco/ (Mycobacteria, without MICs)
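Before uploading, a submitter might check each row of a MIC-based antibiogram table locally. The sketch below is a minimal Python illustration that assumes the column names antibiotic, resistance_phenotype, measurement_sign, and measurement, plus a subset of vocabulary values; these are assumptions that may not match the current specification exactly, so confirm them against the pages listed above. The Submission Portal's own validation remains authoritative.

```python
# Assumed subset of the antibiogram controlled vocabularies; check the
# specification pages for the authoritative column names and values.
ALLOWED_SIGNS = {"<", "<=", "=", ">", ">="}
ALLOWED_PHENOTYPES = {"resistant", "intermediate", "susceptible",
                      "susceptible-dose dependent", "nonsusceptible",
                      "not defined"}


def check_mic_row(row):
    """Return a list of problems found in one MIC-based antibiogram row (a dict)."""
    problems = []
    if not row.get("antibiotic"):
        problems.append("antibiotic is empty")
    if row.get("resistance_phenotype") not in ALLOWED_PHENOTYPES:
        problems.append(f"unexpected resistance_phenotype: {row.get('resistance_phenotype')!r}")
    if row.get("measurement_sign") not in ALLOWED_SIGNS:
        problems.append(f"unexpected measurement_sign: {row.get('measurement_sign')!r}")
    try:
        float(row.get("measurement", ""))
    except (TypeError, ValueError):
        problems.append(f"measurement is not numeric: {row.get('measurement')!r}")
    return problems


if __name__ == "__main__":
    row = {"antibiotic": "ampicillin", "resistance_phenotype": "resistant",
           "measurement_sign": ">", "measurement": "32"}
    print(check_mic_row(row))
```

Collecting every problem in a row, rather than stopping at the first, gives a complete list to fix in one pass.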

Download the template for adding MIC-based AST data to new or existing BioSample submissions, or the template for adding AST data to new or existing Mycobacterial BioSample submissions.

Many scientists submit genome assemblies to GenBank. For pathogen genomes, NCBI also encourages submission of the sequence reads: the raw data are valuable for assessing genotype confidence, evaluating assembly quality for SNP analysis, and detecting contamination events. If you submit both SRA data and assembled genomes, please use the same BioSample record for both submissions. Genomes that are submitted to GenBank through normal channels and publicly released are automatically included in the pathogen analyses for the relevant organisms; no additional steps are needed.

  • NCBI encourages scientists who submit assembled pathogen genomes to use the BioSample Pathogen template to describe the isolates.
  • NCBI encourages scientists who submit assembled pathogen genomes to also submit the raw sequence reads to SRA.

For more information about how to submit draft assemblies to the GenBank WGS division, see https://www.ncbi.nlm.nih.gov/genbank/wgs.submit/. For more information about how to submit complete genomes with or without annotation, see https://www.ncbi.nlm.nih.gov/genbank/genomesubmit.

You can also write to GenBank directly at genomes@ncbi.nlm.nih.gov.

NCBI assigns alleles for certain beta-lactamase families. For information on beta-lactamase submissions, see https://www.ncbi.nlm.nih.gov/pathogens/submit-beta-lactamase/

The NCBI Submission Portal provides an interface to submit a variety of different data types. While the Submission Portal will report errors for invalid submission attempts, submitters are responsible for the content and accuracy of their records, and for ensuring that sufficient information has been provided to allow users to fully interpret their study.

General overview for NCBI submissions.

Primary sequencing data submissions.

You can also write to the NCBI helpdesk at:

info@ncbi.nlm.nih.gov.