How to Submit Data for Real-Time Analysis

For New Submitters!

If you have genomic pathogen data from cultured isolates, please contact us at pd-help@ncbi.nlm.nih.gov BEFORE DOING ANYTHING ELSE. This applies whether you are a new or returning submitter, whether you are starting a new project or continuing an existing one, and whether or not you have submitted data to NCBI before. Please be prepared to answer the following questions:

What type of pathogen(s) are you sequencing?
Do you have Illumina sequencing data? If you sequenced your isolates using another technology, you can submit the assembled genomes directly to GenBank (see below).
Are you willing to submit and make the data publicly available?
How many isolates are you planning on sequencing and submitting in the next year?
What is the timeline for the data submissions to start?

Why submit?

To contribute information on pathogen sequences that may help discover sources of contamination or help in solving outbreaks more quickly
To provide valuable real-time information about the relationship of an isolate to other isolates and outbreaks
To enhance the set of pathogen genomes that can be used by the scientific community
To supply information on the set of resistance genes present in a pathogen

The NCBI Pathogen Detection system is built on the foundation of open data. Data are intended to be submitted and released to the public immediately. Currently four major foodborne pathogens (Campylobacter, Escherichia coli and Shigella, Listeria, and Salmonella) are being analyzed in real time as participating public health agencies submit the sequences. Pathogens are also being analyzed for antimicrobial resistance.

For accounts of actual projects and submissions, please see Success Stories

Who may submit?

Public health organizations (typically public health labs gathering and characterizing isolates)
Hospitals and medical service networks interested in antibiotic resistance
Researchers studying bacterial evolution, population dynamics, or disease outbreaks (retrospective or prospective)
Repositories wishing to further characterize their isolate holdings through sequencing

When to submit

The best time to submit is as soon as the sequencing run has finished on the sequencer. In addition, existing records can be updated and additional data such as antibiograms and NGS read sets can be added to BioSample submissions.

How to submit

The NCBI Submission Portal supports three submission modes:

Interactive web forms (good for submissions containing only a small number of isolates)
Combination web forms and Excel/tab-delimited files (appropriate for large numbers of isolates)
Fully autonomous XML-based document exchange (appropriate for automated systems)
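
As a rough illustration of the tab-delimited route, the sketch below serializes a batch of isolate records to tab-delimited text with Python's standard csv module. The column names and sample values are assumptions chosen for illustration (only collection_date is mentioned on this page); the authoritative column set comes from the BioSample package template you download from the Submission Portal.

```python
import csv
import io

# Hypothetical column names for illustration; take the real ones from the
# BioSample package template for your submission.
COLUMNS = ["sample_name", "organism", "strain", "collection_date"]


def isolates_to_tsv(isolates):
    """Serialize isolate dicts to tab-delimited text, one row per isolate."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for isolate in isolates:
        # Missing fields are written as empty cells rather than raising.
        writer.writerow({column: isolate.get(column, "") for column in COLUMNS})
    return buffer.getvalue()


# Made-up example records.
example = [
    {"sample_name": "lab-0001", "organism": "Salmonella enterica",
     "strain": "LAB-0001", "collection_date": "2023-05-14"},
    {"sample_name": "lab-0002", "organism": "Listeria monocytogenes",
     "strain": "LAB-0002", "collection_date": "2023-05-15"},
]
print(isolates_to_tsv(example))
```

Generating the table from a laboratory database in this way keeps large batches consistent, but the resulting file should still be checked against the template before upload.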

The data files in a submission can be large. These can be uploaded to NCBI by FTP, HTTP, or the Aspera client (at no cost to the submitter). NGS reads are typically uploaded in FASTQ format, although other formats are supported as well. Files may be archived with tar and compressed with gzip.
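
Packing read files into a gzip-compressed tar archive can be done with any standard tool. The following is a minimal sketch using Python's standard tarfile module; the function name, file names, and placeholder read contents are hypothetical, and the upload step itself is not shown.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_reads(fastq_paths, archive_path):
    """Pack read files into one gzip-compressed tar archive.

    Returns the member names stored in the archive as a quick sanity check.
    """
    with tarfile.open(str(archive_path), "w:gz") as archive:
        for path in fastq_paths:
            path = Path(path)
            # Store only the file name so no local directory layout leaks in.
            archive.add(str(path), arcname=path.name)
    with tarfile.open(str(archive_path), "r:gz") as archive:
        return archive.getnames()


# Demonstration with tiny placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    reads = []
    for name in ("isolate1_R1.fastq", "isolate1_R2.fastq"):
        read_file = Path(workdir) / name
        read_file.write_text("@read1\nACGT\n+\nIIII\n")
        reads.append(read_file)
    print(bundle_reads(reads, Path(workdir) / "isolate1_reads.tar.gz"))
```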

What to submit

The Pathogen system accepts genomic data from the ILLUMINA platform sequencing of cultured microbial organisms. In particular four major foodborne pathogens are currently being analyzed in real time as participating public health agencies submit the sequences. Please check the Organism Groups page for a complete list of the currently supported organisms.

The Pathogen system at NCBI requires three basic elements:

A record in the BioProject database that describes the project or initiative.
For each pathogen sequenced, a record in the BioSample database with the isolate metadata (such as collection_date)
For each pathogen, the raw sequence data, submitted to the Sequence Read Archive (SRA) database

The three elements can be submitted simultaneously or sequentially.

Projects

See: BioProject homepage

A BioProject record is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. It provides data consumers a single place to find the diverse data generated for that project.

BioProjects can also be organized in a hierarchy to describe several related projects or initiatives. An example of an umbrella BioProject is the FDA GenomeTrakr BioProject for Salmonella (PRJNA183844). This project organizes multiple subprojects that are connected to the sample and sequence metadata, with one subproject for each group submitting data.

An example BioProject with links to sequence data is the NY State/Wadsworth Center project for Salmonella (PRJNA183850).

If you have any questions about BioProject submissions, write to genomeprj@ncbi.nlm.nih.gov.

Samples

See: BioSample home page

The BioSample database contains descriptions of biological source materials used in experimental assays.

Different templates are used in BioSample submission to capture the minimal set of data fields (required or optional) that are useful descriptors of the biological source material. For the Pathogen Detection project there is a pathogen template (Pathogen affecting public health) that is used to describe the sample, including differentiation of isolates from clinical vs. environmental/food/other sources as well as information on when and where the isolate was obtained. A newer package, the One Health Enteric Package, has been introduced as a One Health-compatible metadata package for genomic surveillance of enteric microbial organisms. This package has more support for host/isolation source specification.

For pathogen submissions, please use one of these package templates:

An example BioSample record from NY State/Wadsworth that uses the package Pathogen: environmental/food/other; version 1.0 is: SAMN02777761

An example BioSample record from PA Dept of Health that uses the package Pathogen: clinical or host-associated; version 1.0 is: SAMN02645749

An example BioSample record from a European AMR surveillance project that uses package OneHealthEnteric 1.0 is: SAMN31087410

For more information about how to submit to BioSamples, see https://submit.ncbi.nlm.nih.gov/subs/biosample or write to biosamplehelp@ncbi.nlm.nih.gov.

Additional requirements

For the Pathogen Detection system, BioSample data must also observe these requirements:

Be sure to include a strain or isolate attribute value in your BioSample submission. The value should be informative, not a placeholder such as "missing" or "not applicable". Only one of these fields needs to be filled in. While the BioSample resource allows submissions with neither strain nor isolate values, a unique identifier for each isolate is critical for distinguishing isolates from one another and is often needed for scientific publication. Moreover, Pathogen Detection cannot deposit the isolate's assembly into GenBank without a unique identifier. Consequently, final AMR results and downloadable AMR sequences cannot be computed and served for such isolates.
Please identify isolates to at least a binomial species name. Non-binomial names like Enterobacter cloacae sp. xxxx or Enterobacter cloacae species complex sp. xxxx are not supported in Pathogen Detection. This is because Pathogen Detection performs analysis based on the assumption that isolates are already fully identified. Pathogen Detection is not a tool that can be used for bacterial identification. Isolate identification to the subspecies or strain level is also acceptable provided an entry for that subspecies or strain already exists in NCBI Taxonomy. An example of such an identification would be Salmonella enterica subsp. enterica serovar Newport.
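
A local pre-submission check along these lines can catch both problems before upload. The sketch below is an illustration only: the placeholder list, field names, and function names are assumptions, the name test is a simple text heuristic, and it does not consult NCBI Taxonomy, so a record that passes here may still be rejected by NCBI.

```python
import re
from collections import Counter

# Values treated as uninformative; this list is an assumption, not an NCBI rule set.
PLACEHOLDERS = {"", "missing", "not applicable", "not collected",
                "not provided", "n/a", "na", "unknown"}


def is_informative(value):
    """True when a value is present and is not a known placeholder."""
    return value is not None and value.strip().lower() not in PLACEHOLDERS


def check_isolate(record):
    """Return a list of problems found in one isolate record (empty if none)."""
    problems = []
    if not (is_informative(record.get("strain"))
            or is_informative(record.get("isolate"))):
        problems.append("needs an informative strain or isolate value")
    organism = (record.get("organism") or "").strip()
    if len(organism.split()) < 2:
        problems.append("organism must be at least a binomial species name")
    elif re.search(r"\bsp\.", organism):
        # Matches a standalone "sp." but not "subsp.".
        problems.append("unnamed 'sp.' organisms are not supported")
    return problems


def identifier(record):
    """Return the first informative strain or isolate value, or None."""
    for field in ("strain", "isolate"):
        if is_informative(record.get(field)):
            return record[field].strip()
    return None


def duplicate_identifiers(records):
    """Return identifiers that are used by more than one record."""
    counts = Counter(identifier(record) for record in records)
    return sorted(name for name, n in counts.items()
                  if name is not None and n > 1)


# Made-up example records.
batch = [
    {"organism": "Salmonella enterica subsp. enterica serovar Newport",
     "strain": "LAB-0001"},
    {"organism": "Enterobacter cloacae sp. xxxx", "strain": "missing"},
]
for row in batch:
    print(row["organism"], check_isolate(row))
```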

Sequence Reads

There is a submission page for table-based submissions that supports batch submission of BioSample/SRA data; it is intended for users of desktop sequencers. This method is not intended for large volumes.

The SRA submission portal page is at: https://submit.ncbi.nlm.nih.gov/subs/sra.

The SRA home page is at sra.

For more information about how to submit Next Generation Sequencing reads to the SRA, or about transferring large files of NGS sequencing data, see https://submit.ncbi.nlm.nih.gov/subs/sra.

If you have any questions about SRA submissions, write to sra@ncbi.nlm.nih.gov.

Antimicrobial susceptibility test (AST) data - antibiograms

NCBI is building a database linking genome sequence data to antimicrobial susceptibility tests: the National Database of Antibiotic Resistant Organisms (NDARO).

The critical difference between BioSample submissions for standard pathogens and antibiotic-resistant organisms is that the latter category will include phenotypic (AST) data.

An example of a BioSample record with AST data is SAMN01163409, a clinical isolate of Enterobacter cloacae. The AST data can be viewed in the Antibiotic Susceptibility Test (AST) Browser; the example data for SAMN01163409 can be found here.

AST data can be supplied as a separate table at submission time along with the isolate metadata via the Submission Portal.

There are two templates: one for standard MIC-based antibiograms for most bacteria, and one specifically for Mycobacteria without MICs.

Specifications for the antibiogram submission format, along with controlled vocabularies for various fields, can be found at:

antibiogram for most organisms
antibiogram-myco without MICs for Mycobacteria

Download the template for adding MIC-based AST data to new or existing BioSample submissions or adding AST data to new or existing Mycobacterial BioSample submissions.

Assembled Genome Submissions

Many scientists submit genome assemblies to GenBank. For pathogen genome sequences, NCBI encourages the submission of the sequence reads. There is a great deal of value in having the raw data to assess genotype confidence and assembly quality for SNP evaluation, and to detect contamination events. For those who are submitting both SRA data and assembled genomes, please use the same BioSample record for both submissions. Genomes that are submitted to GenBank through normal channels and publicly released are automatically included in the pathogen analyses for the organisms in question, and no additional steps need to be taken.

NCBI encourages scientists who submit assembled pathogen genomes to use the BioSample Pathogen template to describe the isolates
NCBI encourages scientists who submit assembled pathogen genomes to also submit the raw sequence reads to SRA

For more information about how to submit draft assemblies to the GenBank WGS division, see WGS Submit. For more information about how to submit complete genomes with or without annotation, see /genbank/genomesubmit.

You can also write to GenBank directly at genomes@ncbi.nlm.nih.gov.

Beta-lactamase Gene/Protein Submissions

NCBI assigns alleles for certain beta-lactamase families. For information on beta-lactamase submissions, see How to Request New Alleles for Beta-Lactamase, MCR, and Qnr Genes.

More about NCBI Submissions

The NCBI Submission Portal provides an interface to submit a variety of different data types. While the Submission Portal will report errors for invalid submission attempts, submitters are responsible for the content and accuracy of their records, and for ensuring that sufficient information has been provided to allow users to fully interpret their study.

General overview for NCBI submissions.

Primary sequencing data submissions.

You can also write to the NCBI helpdesk at:

info@ncbi.nlm.nih.gov.