
About the NCBI Pathogen Detection system

The NCBI Pathogen Detection Project is a centralized system that integrates sequence data for bacterial and fungal pathogens.

NCBI Pathogen Detection integrates bacterial and fungal pathogen genomic sequences from numerous ongoing surveillance and research efforts, with samples drawn from food, environmental sources such as water or production facilities, and patients. Foodborne, hospital-acquired, and other clinically significant infectious pathogens are included.

The system provides two major automated real-time analyses:

  1. It quickly clusters related pathogen genome sequences to identify potential transmission chains, helping public health scientists investigate disease outbreaks.
  2. As part of the National Database of Antibiotic Resistant Organisms (NDARO), NCBI screens genomic sequences with AMRFinderPlus to identify antimicrobial resistance, stress response, and virulence genes in bacterial genomes. This enables scientists to track the spread of resistance genes and to understand the relationships among antimicrobial resistance, stress response, and virulence.
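For readers who want to run a similar screen on their own assemblies, the sketch below builds an AMRFinderPlus command line from Python. It is a minimal illustration, not the NCBI pipeline's own invocation: it assumes the `amrfinder` executable and its database are installed and on `PATH`, and the helper names (`build_amrfinder_command`, `run_amrfinder`) and file names are hypothetical. The flags used (`-n` for a nucleotide FASTA, `--plus` for stress response and virulence genes, `-O` for organism-specific checks, `-o` for the output table) should be confirmed with `amrfinder --help` for the installed version.

```python
import shutil
import subprocess
from typing import List, Optional


def build_amrfinder_command(
    nucleotide_fasta: str,
    organism: Optional[str] = None,
    include_plus: bool = True,
    output_tsv: Optional[str] = None,
) -> List[str]:
    """Assemble (but do not run) an AMRFinderPlus command for a nucleotide assembly."""
    cmd = ["amrfinder", "-n", nucleotide_fasta]
    if include_plus:
        # --plus extends the screen beyond AMR genes to stress response and virulence genes
        cmd.append("--plus")
    if organism is not None:
        # -O turns on organism-specific screening; valid names depend on the installed version
        cmd.extend(["-O", organism])
    if output_tsv is not None:
        # -o writes the tab-delimited report to a file instead of stdout
        cmd.extend(["-o", output_tsv])
    return cmd


def run_amrfinder(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run a prepared command, failing early if the executable is missing from PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} executable not found on PATH")
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

For example, `build_amrfinder_command("isolate.fasta", organism="Salmonella")` yields `["amrfinder", "-n", "isolate.fasta", "--plus", "-O", "Salmonella"]`. Keeping command construction separate from execution makes the arguments easy to inspect or log before anything runs.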

A number of public health agencies and researchers in the US and internationally collect samples from clinical cases, the environment, food products, and industrial production facilities to support active, real-time surveillance of pathogens, including foodborne disease. They sequence the samples and submit the data to NCBI, which compares the sequences with others in its database, including all genomes in GenBank, to identify closely related sequences. The aim is to find closely or clonally related isolates that can aid outbreak investigations. For example, the FDA, CDC, and USDA link isolates from food and the environment to isolates associated with human illness to support traceback investigations and outbreak response.

The NCBI Pathogen Detection Isolates Browser is a web-based portal that integrates the genomic sequence, metadata, antibiotic susceptibility and resistance gene information, and the SNP cluster information.

The NCBI Pathogen Detection Project also analyzes the assemblies in its database in real time for known antimicrobial resistance (AMR) genes and other genes of interest, and it maintains software and databases to facilitate monitoring and research, including the National Database of Antibiotic Resistant Organisms.

Our public health partners initially focused on sequencing and analyzing the four bacterial groups that are the major causes of foodborne illness in the US: Salmonella, Escherichia coli and Shigella, and Listeria. NCBI Pathogen Detection has since expanded its analyses to over 50 taxa, such as Klebsiella pneumoniae, Staphylococcus aureus, and Streptococcus pneumoniae.

Additional pathogens are added as capacity permits. The Organism Groups page lists the current set of organisms that are being tracked.

NCBI has developed a pipeline that assembles short-read Illumina sequence data and analyzes it in the context of all other sequences in the system, both for genetic relatedness and for AMR surveillance. The pipeline compares each assembly to all other assemblies from the same taxonomic group, using wgMLST or k-mer comparisons, to identify clusters of closely related isolates. Within each cluster, SNPs are called and a phylogenetic tree is inferred. The phylogenetic tree for each SNP cluster is available on the FTP site as well as in the NCBI Pathogen Detection Isolates Browser. Results of the gene-content analysis of assemblies are available in the NCBI Pathogen Detection Isolates Browser and in the Microbial Browser for the Identification of Genetic and Genomic Elements. Assemblies and annotations generated by the system are deposited in GenBank and made publicly available. Details of the analysis system will be published at a future date.
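The general idea of grouping isolates by k-mer similarity and then counting SNP differences within a group can be sketched in a few lines. This is an illustrative toy, not NCBI's implementation (whose details are not described here): it uses exact k-mer sets with Jaccard similarity, single-linkage grouping at a hypothetical `min_similarity` threshold, and a plain mismatch count over sequences assumed to be already aligned. Production pipelines work from whole assemblies or mapped reads and filter SNPs before inferring trees. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def kmer_set(seq: str, k: int = 21) -> Set[str]:
    """All distinct k-mers (length-k substrings) of a sequence, uppercased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared k-mers divided by total distinct k-mers (1.0 means identical content)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def single_linkage_clusters(
    genomes: Dict[str, str], k: int, min_similarity: float
) -> List[List[str]]:
    """Group isolates linked by any pair at or above the threshold (union-find)."""
    names = sorted(genomes)
    kmers = {n: kmer_set(genomes[n], k) for n in names}
    parent = {n: n for n in names}

    def find(x: str) -> str:
        # Follow parent pointers to the group root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if jaccard(kmers[a], kmers[b]) >= min_similarity:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count mismatches between two aligned, equal-length sequences, ignoring gaps and Ns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x != y and x not in "-N" and y not in "-N"
    )
```

With toy sequences `iso1 = "ACGTACGTTGCA"`, `iso2 = "ACGTACGTTGCT"` and `iso3 = "TTTTGGGGCCCC"`, `k=4` and `min_similarity=0.5`, iso1 and iso2 share 7 of 9 distinct 4-mers (Jaccard ≈ 0.78) and land in one group while iso3 forms its own, and `snp_distance` reports one difference between iso1 and iso2. Single linkage means an isolate joins a group if it is close to any member, so chains of related isolates end up together.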

To support public health monitoring and antimicrobial resistance research, NCBI also maintains databases and software to identify AMR and other genes relevant to public health. Sequences associated with these databases are available in the Pathogen Detection Reference Gene Catalog. The HMMs and the hierarchical relationships used for naming genes and identifying their functions are available in the Reference HMM Catalog and the Reference Gene Hierarchy, respectively.

NCBI has also taken over from the Lahey Clinic as the allele-assigning authority for many beta-lactamase genes and some additional gene families. See our allele request submission page for more information on allele submissions, and the Reference Gene Catalog for existing allele assignments. The AMRFinderPlus software is open source and publicly available; see our National Database of Antibiotic Resistant Organisms (NDARO) page for links to all of our publicly available AMR resources.

Please see the Pathogens Help page for more detailed information on how to use these resources.

Several U.S. health agencies and international partners are currently contributing pathogen sequence data for real-time analysis.

Centers for Disease Control and Prevention (CDC)

Food and Drug Administration (FDA)

U.S. Department of Agriculture (USDA)

Public Health England (PHE)

Other Associations

Many state public health laboratories and other organizations are also contributing data. A separate page lists the major contributors (note that not every submitter of every genome is listed on that page).

Association of Public Health Laboratories (APHL)

Global Microbial Identifier (GMI) is a grassroots effort to build a global system of DNA genome databases for microbial and infectious disease identification and diagnostics. Projects flagged with the 'GMI' keyword can be found with this BioProject search: /bioproject/?term=GMI[keyword]
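The same keyword search can also be run programmatically through NCBI's Entrez E-utilities ESearch endpoint. This is a minimal sketch using only the Python standard library; the helper names are hypothetical, `retmax` (the maximum number of IDs returned) is set to an arbitrary 20, and the network call is subject to NCBI's E-utilities usage limits. ESearch returns numeric BioProject UIDs rather than PRJNA-style accessions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public ESearch endpoint of NCBI's Entrez E-utilities
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def bioproject_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL against the BioProject database, requesting JSON output."""
    params = {"db": "bioproject", "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-escapes the brackets in field-tagged terms such as GMI[keyword]
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def search_bioproject_ids(term: str, retmax: int = 20) -> list:
    """Fetch matching BioProject UIDs (requires network access)."""
    with urlopen(bioproject_search_url(term, retmax)) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]
```

Calling `search_bioproject_ids("GMI[keyword]")` returns up to 20 UIDs for GMI-flagged projects; building the URL in a separate function keeps the query testable without network access.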

If you have additional questions or wish to contact the NCBI Pathogen Detection team, please send an email to: