
Isolates Browser data at Google Cloud Platform

ALPHA RELEASE -- This is under active development, and while we strive to maintain correctness, results may at times be unstable, unavailable, or incorrect. Please contact us by email at pd-help@ncbi.nlm.nih.gov before relying on this data for production analyses.

What data is available on the Google Cloud?

For a list of all resources, see Pathogen Detection Resources at Google Cloud Platform.

Data from the Pathogen Detection Isolates Browser is now available on Google Cloud Platform (GCP) in two Google BigQuery tables: ncbi-pathogen-detect.pdbrowser.isolates and ncbi-pathogen-detect.pdbrowser.isolate_exceptions. These tables include all fields available in the web browser and can be searched using Google Standard SQL instead of the Solr query language. BigQuery also permits programmatic access and more complex queries, and it lets you download result sets larger than the 100,000-row limit of the Isolates Browser web download. NCBI is piloting this in BigQuery to help users take advantage of elastic scaling and parallel query execution. BigQuery has a large collection of client libraries that you can use within your workflow, and you can also query it from a web browser as described below.
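As a sketch of how a large result can be retrieved without the web download limit, the statement below uses BigQuery's EXPORT DATA statement to write a query result to Cloud Storage as sharded CSV files. The bucket path gs://your-bucket/... is a hypothetical placeholder for a bucket you own, and the column list is only an example. CSV cannot hold arrays of structs, so this sketch exports scalar columns only; the complex fields would need a nested format such as JSON or Parquet. Query and export costs are billed to your own project.

```sql
-- Sketch: export a large result set to Cloud Storage instead of
-- downloading it from the web interface.
-- ASSUMPTION: gs://your-bucket/ is a hypothetical bucket you can write to.
EXPORT DATA
  OPTIONS (
    uri = 'gs://your-bucket/isolates/salmonella-*.csv',  -- one '*' lets BigQuery shard the output
    format = 'CSV',
    header = true,
    overwrite = true
  )
AS
SELECT target_acc, biosample_acc, asm_acc, erd_group  -- scalar columns only; CSV cannot hold arrays of structs
FROM `ncbi-pathogen-detect.pdbrowser.isolates`
WHERE taxgroup_name LIKE 'Salmonella%';
```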


Update Frequency

The isolates and isolate_exceptions tables in Google Cloud BigQuery are updated daily, so their contents may not agree exactly with what the Isolates Browser shows at a given moment. If you see unexpected discrepancies, please let us know by emailing pd-help@ncbi.nlm.nih.gov.
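Because the copies are refreshed on a daily schedule, it can help to check when each table was last modified before comparing it with the web interface. A minimal sketch, assuming your account can read the dataset's __TABLES__ metadata table (a general BigQuery feature, not something documented by NCBI):

```sql
-- Sketch: when were the BigQuery copies last refreshed?
-- __TABLES__ is BigQuery's per-dataset metadata table; last_modified_time
-- is stored as milliseconds since the Unix epoch.
SELECT
  table_id,
  TIMESTAMP_MILLIS(last_modified_time) AS last_modified_utc,
  row_count
FROM `ncbi-pathogen-detect.pdbrowser.__TABLES__`
WHERE table_id IN ('isolates', 'isolate_exceptions')
ORDER BY table_id;
```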

Getting started with BigQuery

Our Getting started with BigQuery page has instructions on how to run queries with BigQuery.

ncbi-pathogen-detect.pdbrowser.isolates

This table corresponds to the main table of the Pathogen Detection Isolates Browser. Detailed information on the contents of its fields is included in the documentation for the Isolates Browser web interface. Some fields (AMR_genotypes, virulence_genotypes, stress_genotypes, and AST_phenotypes) are complex and contain arrays of structs. The computed_types field contains a single struct. See the examples below for ideas on how to search these fields using Google Standard SQL.
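To see the exact subfields and types of these complex columns, you can query BigQuery's INFORMATION_SCHEMA.COLUMN_FIELD_PATHS view for the dataset. This is a sketch: it assumes the view is readable for this public dataset, and it compares lowercased column names so that it does not depend on how the names are capitalized in the stored schema.

```sql
-- Sketch: list every subfield (field_path) and its type for the complex
-- columns of the isolates table.
SELECT column_name, field_path, data_type
FROM `ncbi-pathogen-detect.pdbrowser.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
WHERE table_name = 'isolates'
  -- compare lowercased names so capitalization in the stored schema does not matter
  AND LOWER(column_name) IN ('amr_genotypes', 'virulence_genotypes', 'stress_genotypes',
                             'ast_phenotypes', 'computed_types')
ORDER BY column_name, field_path;
```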

ncbi-pathogen-detect.pdbrowser.isolate_exceptions

The isolate_exceptions table contains information on isolates that failed to process in our system because of quality control (QC) failures. See the Exceptions table documentation for the Isolates Browser web interface for more details.

Complex fields

The AMR_genotypes, stress_genotypes, virulence_genotypes, and AST_phenotypes fields are arrays of structs, which require syntax (such as UNNEST) that you might not be familiar with from other SQL-based databases. The computed_types field is a single struct whose subfields are accessed by appending the subfield name to the field name with a dot, for example computed_types.serotype. See the examples below for how to search based on these fields.
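The difference between the two kinds of complex field can be seen in a single query: the array field is flattened with UNNEST, giving one output row per array element, while the struct field is read with dot notation. This sketch reuses the BioSample accession from the first example search on this page and relies only on the element and serotype subfields used elsewhere on this page.

```sql
-- Sketch: one output row per AMR element of a single isolate.
SELECT
  i.target_acc,
  i.computed_types.serotype AS serotype,  -- STRUCT: dot notation
  g.element                               -- ARRAY of STRUCTs: flattened by UNNEST
FROM `ncbi-pathogen-detect.pdbrowser.isolates` AS i
CROSS JOIN UNNEST(i.AMR_genotypes) AS g
WHERE i.biosample_acc = 'SAMN08848639'
ORDER BY g.element;
```

CROSS JOIN UNNEST drops isolates whose AMR_genotypes array is empty. Writing LEFT JOIN UNNEST(i.AMR_genotypes) AS g instead keeps them, with NULL in the element column.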

Linking to MicroBIGG-E data

NCBI Pathogen Detection also provides MicroBIGG-E data in Google Cloud BigQuery, and the ncbi-pathogen-detect.pdbrowser.microbigge table can be linked to the ncbi-pathogen-detect.pdbrowser.isolates table. The two tables have several fields in common, but we generally recommend joining them on the target_acc field. See below for examples.
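A minimal sketch of such a join, collecting the distinct MicroBIGG-E subclass values found for one isolate into a single array. It uses only the target_acc and subclass columns of the microbigge table, which also appear in the last example on this page; the BioSample accession is borrowed from the first example search.

```sql
-- Sketch: distinct MicroBIGG-E subclasses for one isolate, joined on target_acc.
SELECT
  i.target_acc,
  i.biosample_acc,
  -- DISTINCT removes repeats; IGNORE NULLS is needed because a BigQuery
  -- array in a query result cannot contain NULL elements.
  ARRAY_AGG(DISTINCT m.subclass IGNORE NULLS ORDER BY m.subclass) AS microbigge_subclasses
FROM `ncbi-pathogen-detect.pdbrowser.isolates` AS i
JOIN `ncbi-pathogen-detect.pdbrowser.microbigge` AS m
  ON i.target_acc = m.target_acc
WHERE i.biosample_acc = 'SAMN08848639'
GROUP BY i.target_acc, i.biosample_acc;
```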

Example searches

Find the AMR genes from an isolate of interest

SELECT biosample_acc, asm_acc, target_acc, mindiff, AMR_genotypes,
    stress_genotypes, virulence_genotypes, computed_types
FROM `ncbi-pathogen-detect.pdbrowser.isolates`
WHERE biosample_acc = 'SAMN08848639'
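If you only need the gene symbols rather than the full structs, a correlated ARRAY subquery can reduce AMR_genotypes to a sorted list of element values, and ARRAY_TO_STRING can turn that list into one comma-separated string, which is easier to export as CSV. A sketch, assuming element is the subfield that holds the gene symbol, as in the examples below:

```sql
-- Sketch: gene symbols only, as an array and as one delimited string.
SELECT
  biosample_acc,
  target_acc,
  ARRAY(
    SELECT g.element
    FROM UNNEST(AMR_genotypes) AS g
    WHERE g.element IS NOT NULL  -- result arrays may not contain NULLs
    ORDER BY g.element
  ) AS amr_elements,
  ARRAY_TO_STRING(
    ARRAY(
      SELECT g.element
      FROM UNNEST(AMR_genotypes) AS g
      WHERE g.element IS NOT NULL
      ORDER BY g.element
    ),
    ', '
  ) AS amr_elements_text
FROM `ncbi-pathogen-detect.pdbrowser.isolates`
WHERE biosample_acc = 'SAMN08848639';
```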

Find an isolate in isolate_exceptions

SELECT *
FROM `ncbi-pathogen-detect.pdbrowser.isolate_exceptions`
WHERE biosample_acc = 'SAMN16279178'
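If you are not sure which table an isolate ended up in, one query can check both. This sketch uses a BigQuery scripting variable (acc, a name chosen here for illustration) and selects only biosample_acc, since that is the one column the examples on this page show in both tables.

```sql
-- Sketch: report which table(s) a BioSample accession appears in.
DECLARE acc STRING DEFAULT 'SAMN16279178';  -- accession from the example above

SELECT 'isolates' AS source_table, biosample_acc
FROM `ncbi-pathogen-detect.pdbrowser.isolates`
WHERE biosample_acc = acc
UNION ALL
SELECT 'isolate_exceptions' AS source_table, biosample_acc
FROM `ncbi-pathogen-detect.pdbrowser.isolate_exceptions`
WHERE biosample_acc = acc;
```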

Find all Salmonella Newport isolates

SELECT target_acc, biosample_acc, erd_group, computed_types
FROM `ncbi-pathogen-detect.pdbrowser.isolates`
WHERE computed_types.serotype = 'Newport'
    AND taxgroup_name LIKE 'Salmonella%'
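Struct and array filters can be combined in one WHERE clause. As an illustration only, this sketch narrows the Newport search to isolates with at least one AMR element whose symbol starts with blaCMY; that gene family is an arbitrary example chosen here, not a recommendation.

```sql
-- Sketch: Newport isolates that also carry a blaCMY element
-- (blaCMY is an arbitrary, illustrative choice of gene family).
SELECT i.target_acc, i.biosample_acc, i.erd_group
FROM `ncbi-pathogen-detect.pdbrowser.isolates` AS i
WHERE i.computed_types.serotype = 'Newport'  -- STRUCT subfield
  AND i.taxgroup_name LIKE 'Salmonella%'
  AND EXISTS (                                -- at least one matching array element
    SELECT 1
    FROM UNNEST(i.AMR_genotypes) AS g
    WHERE g.element LIKE 'blaCMY%'
  );
```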

Find the most common SNP clusters for Salmonella Newport isolates

SELECT erd_group, count(*) num_newport_isolates
FROM `ncbi-pathogen-detect.pdbrowser.isolates`
WHERE computed_types.serotype = 'Newport'
    AND taxgroup_name LIKE 'Salmonella%'
GROUP BY erd_group
ORDER BY num_newport_isolates DESC
LIMIT 5
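To put these counts in context, a window function over the grouped result can add each cluster's share of all Newport isolates. A sketch: the percentages are computed before LIMIT is applied, so they are relative to every Newport isolate matched, and isolates without a SNP cluster are counted together in a single group for whatever erd_group value they carry.

```sql
-- Sketch: top Newport SNP clusters with their share of all Newport isolates.
SELECT
  erd_group,
  COUNT(*) AS num_newport_isolates,
  -- SUM(COUNT(*)) OVER () is the total across all groups; it is evaluated
  -- before LIMIT, so the share is relative to every Newport isolate matched.
  ROUND(100 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct_of_newport
FROM `ncbi-pathogen-detect.pdbrowser.isolates`
WHERE computed_types.serotype = 'Newport'
  AND taxgroup_name LIKE 'Salmonella%'
GROUP BY erd_group
ORDER BY num_newport_isolates DESC
LIMIT 5;
```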

Find isolates that are carbapenem resistant but don't have a KPC or NDM beta-lactamase

SELECT target_acc 
FROM `ncbi-pathogen-detect.pdbrowser.isolates` isolates
WHERE
(SELECT COUNT(1)
  FROM UNNEST(isolates.AST_phenotypes) 
  WHERE antibiotic LIKE '%penem' AND phenotype = 'resistant'
) >= 1
AND
(SELECT COUNT(1)
  FROM UNNEST(isolates.AMR_genotypes)
  WHERE element LIKE 'blaKPC%' OR element LIKE 'blaNDM%'
) = 0
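The same filter can be written with EXISTS and NOT EXISTS, which states the intent ("at least one" and "none") directly instead of comparing a COUNT with 1 or 0. A sketch that should return the same rows as the query above; the aliases ast and g are arbitrary.

```sql
-- Sketch: same logic as the COUNT-based query, using EXISTS / NOT EXISTS.
SELECT target_acc
FROM `ncbi-pathogen-detect.pdbrowser.isolates` AS isolates
WHERE EXISTS (      -- at least one resistant carbapenem AST result
    SELECT 1
    FROM UNNEST(isolates.AST_phenotypes) AS ast
    WHERE ast.antibiotic LIKE '%penem' AND ast.phenotype = 'resistant'
  )
  AND NOT EXISTS (  -- no KPC or NDM element
    SELECT 1
    FROM UNNEST(isolates.AMR_genotypes) AS g
    WHERE g.element LIKE 'blaKPC%' OR g.element LIKE 'blaNDM%'
  );
```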

Find all SNP clusters where more than 75% of isolates have a blaKPC gene

SELECT kpc_count.erd_group, round(kpc_count.num / erd_size.num * 100) as pct_with_kpc, erd_size.num as snp_cluster_size
FROM
  (SELECT i1.erd_group, count(*) num
      FROM `ncbi-pathogen-detect.pdbrowser.isolates` i1
      WHERE (
        (SELECT COUNT(1)
        FROM UNNEST(AMR_genotypes) AS AMR_genotypes
        WHERE element LIKE 'blaKPC%'
        ) >= 1
      )
      GROUP BY erd_group
  ) kpc_count
  LEFT JOIN
    (SELECT erd_group, count(*) num
      FROM `ncbi-pathogen-detect.pdbrowser.isolates`
      GROUP BY erd_group
    ) erd_size
  ON kpc_count.erd_group = erd_size.erd_group
WHERE
  kpc_count.num / erd_size.num > 0.75
ORDER BY
  pct_with_kpc DESC, snp_cluster_size DESC
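The two subqueries above each scan the isolates table. An alternative sketch flags KPC carriage once per isolate in a CTE and then computes both the cluster size and the KPC share in a single GROUP BY, using COUNTIF. The filter erd_group IS NOT NULL is added to mirror the original join, which cannot match rows whose erd_group is NULL. This is a reformulation for readability, not an NCBI-provided query.

```sql
-- Sketch: single-pass version of the blaKPC cluster query.
WITH flagged AS (
  SELECT
    erd_group,
    -- TRUE if any AMR element of this isolate is a blaKPC allele
    EXISTS (
      SELECT 1
      FROM UNNEST(AMR_genotypes) AS g
      WHERE g.element LIKE 'blaKPC%'
    ) AS has_kpc
  FROM `ncbi-pathogen-detect.pdbrowser.isolates`
  WHERE erd_group IS NOT NULL  -- the original join never matches NULL groups
)
SELECT
  erd_group,
  ROUND(COUNTIF(has_kpc) / COUNT(*) * 100) AS pct_with_kpc,
  COUNT(*) AS snp_cluster_size
FROM flagged
GROUP BY erd_group
HAVING COUNTIF(has_kpc) / COUNT(*) > 0.75
ORDER BY pct_with_kpc DESC, snp_cluster_size DESC;
```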

Join with the MicroBIGG-E table to find isolates that are carbapenem resistant but don't have a known carbapenem resistance gene or allele

SELECT isolates.target_acc,
  ARRAY(select AS STRUCT antibiotic, phenotype from UNNEST(AST_phenotypes) WHERE  antibiotic LIKE "%penem") AST
FROM `ncbi-pathogen-detect.pdbrowser.isolates` isolates
LEFT JOIN `ncbi-pathogen-detect.pdbrowser.microbigge` microbigge
  ON isolates.target_acc = microbigge.target_acc
  AND microbigge.subclass = 'CARBAPENEM' -- Only carbapenem genes / point mutations
WHERE
  (SELECT count(1) FROM unnest(AST_phenotypes) AS ast
    WHERE antibiotic like "%penem" AND phenotype = 'resistant') >= 1
  AND isolates.amrfinderplus_version IS NOT NULL -- AMRFinderPlus was run on this target
  AND isolates.asm_acc IS NOT NULL -- AMRFinderPlus results should be in MicroBIGG-E because assembly is public
  AND microbigge.subclass IS NULL -- There are no rows in MicroBIGG-E with subclass = CARBAPENEM
ORDER BY isolates.target_acc
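To see how common this situation is, a summary version can count carbapenem-resistant isolates with and without a MicroBIGG-E carbapenem hit, applying the same AMRFinderPlus and assembly filters as the query above. In this sketch the MicroBIGG-E side is reduced to distinct target_acc values first, so that isolates with several carbapenem hits are not counted more than once; the CTE names resistant and carb_hits are illustrative.

```sql
-- Sketch: how many carbapenem-resistant isolates have a known carbapenem determinant?
WITH resistant AS (
  SELECT target_acc
  FROM `ncbi-pathogen-detect.pdbrowser.isolates`
  WHERE EXISTS (
      SELECT 1
      FROM UNNEST(AST_phenotypes) AS ast
      WHERE ast.antibiotic LIKE '%penem' AND ast.phenotype = 'resistant'
    )
    AND amrfinderplus_version IS NOT NULL  -- AMRFinderPlus was run on this target
    AND asm_acc IS NOT NULL                -- public assembly, so results should be in MicroBIGG-E
),
carb_hits AS (
  -- one row per target with at least one carbapenem gene or point mutation
  SELECT DISTINCT target_acc
  FROM `ncbi-pathogen-detect.pdbrowser.microbigge`
  WHERE subclass = 'CARBAPENEM'
)
SELECT
  COUNTIF(c.target_acc IS NOT NULL) AS with_known_determinant,
  COUNTIF(c.target_acc IS NULL) AS without_known_determinant
FROM resistant AS r
LEFT JOIN carb_hits AS c
  ON r.target_acc = c.target_acc;
```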