Reference HMM Catalog Documentation

Beta Release

What is the Reference HMM Catalog?

The Pathogen Detection Reference HMM Catalog is a web-based portal to our highly curated database of reference hidden Markov models (HMMs) used by AMRFinderPlus in concert with gene sequences in the Pathogen Detection Reference Gene Catalog to identify antimicrobial resistance (AMR) genes as well as some stress resistance and virulence genes. This is a highly curated subset of the HMMs included in the NCBI Protein Family Models database.

Every row in the Pathogen Detection Reference HMM Catalog is an individual HMM. Details including the seed alignment and HMM profile are available by clicking on the HMM accession in the table. The information in the Reference HMM Catalog, including seed alignments and profiles, is also available on our Reference HMM FTP site.

Scope: The Reference HMM Catalog includes two data subsets:

  1. "Core": this is a more narrowly curated AMR-specific subset of genes and proteins that are considered more likely to be informative about AMR phenotype.
  2. "Plus": this subset includes genes related to biocide and stress resistance, general efflux, virulence, or antigenicity, as well as AMR genes whose presence or absence is not as likely to be informative about phenotype.

Relationships among the Reference HMM Catalog and other Pathogen Detection Browsers

  • NCBI Pathogen Detection offers four table-based browsers that give easy web-based access to the results of our analysis and to the databases we curate. The browsers are related resources and are integrated with each other.

  • The main similarities between the resources are their shared search engine and similar search techniques.

  • The main differences between the resources are the scope of data being searched, the set of data fields (and the filters based on those fields) that are available for searching, and the columns that are shown in the display of search results.
    • Every row in the Reference HMM Catalog is an HMM with a curated cutoff to identify a gene family.

Relationship between the Reference HMM Catalog and the Reference Gene Catalog

The Pathogen Detection Reference HMM Catalog, Pathogen Detection Reference Gene Hierarchy, and Pathogen Detection Reference Gene Catalog are used by AMRFinderPlus in concert to identify genes. In general, the HMMs are used to avoid incorrect annotations and to identify more distant functional relatives of genes in the Reference Gene Catalog. This includes the discovery of novel AMR genes such as fosA7. The Reference Gene Hierarchy provides the higher-level organization that integrates the protein sequences in the Pathogen Detection Reference Gene Catalog with the HMMs in the Pathogen Detection Reference HMM Catalog.

Relationship between the Reference HMM Catalog and MicroBIGG-E

MicroBIGG-E (described here) contains AMRFinderPlus results including the most specific HMM that matches with a score above the curated cutoff. Not all results will have HMM hits, and not all HMM hits above cutoff will be to the most specific HMM. Clicking the MicroBIGG-E link in the Reference HMM Browser will show all genetic elements in MicroBIGG-E for which that was the most specific HMM that had a match scoring above TC1. Note that HMMs are only searched when AMRFinderPlus analysis type is COMBINED. No HMMs are searched against nucleotide sequence.

Where to access the Pathogen Detection Reference HMM Catalog

The Pathogen Detection Reference HMM Catalog and the Pathogen Detection Reference Gene Catalog are accessible from a link on the right margin of the Pathogen Detection Project home page, from the AMR landing page, and from the AMR Resources page.

You can also access the Pathogen Detection Reference HMM Catalog directly from the links below:

Search tips for the Pathogen Detection Reference HMM Catalog

Data fields in the Pathogen Detection Reference HMM Catalog

The data fields listed below have been indexed by the Pathogen Detection project and are therefore directly searchable using the advanced search techniques described in the Isolates Browser help, because both browsers use the Solr query language. Note that the data field names and values are case sensitive, as described in the Isolates Browser help.

Each data field reflects an available column in the Pathogen Detection Reference HMM Catalog web interface. The output section of this document describes the use of filters as an alternate way of searching through the data.

Please note: in the list of available data fields below:

  • The term shown in the regular font is the display name (column header) shown by the Pathogen Detection Reference HMM Catalog web interface. The term shown in (italics) is the name of the corresponding data field if you want to search that field directly.
  • For example, one data field is listed as: Gene symbol (gene_symbol) (with an underscore instead of a space). This is the case-sensitive string you should use if you want to search the data field directly using the query box.
  • Brief italicized search examples are also provided for some of the data fields, showing how to query the field directly. The values represent text strings exactly as they appear in the data fields, including upper case and lower case letters and special characters such as hyphens. The data field names are case sensitive.
Accession (hmm_accession)
MicroBIGG-E link
HMM description (hmm_description)
Length (hmm_length)
TC1 (TC1)
TC2 (TC2)
Scope (scope)
Type (type)
Subtype (subtype)
Class (class)
Subclass (subclass)
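
As a minimal sketch of the display-name and field-name pairing listed above, the Python mapping below records each column header with the case-sensitive field name given in this document. The helper name searchable_field is hypothetical and is not part of any NCBI tool. "MicroBIGG-E link" is mapped to None because no field name is listed for it.

```python
# Display names (column headers) and field names exactly as listed in this document.
# "MicroBIGG-E link" has no field name listed, so it maps to None.
DISPLAY_TO_FIELD = {
    "Accession": "hmm_accession",
    "MicroBIGG-E link": None,
    "HMM description": "hmm_description",
    "Length": "hmm_length",
    "TC1": "TC1",
    "TC2": "TC2",
    "Scope": "scope",
    "Type": "type",
    "Subtype": "subtype",
    "Class": "class",
    "Subclass": "subclass",
}


def searchable_field(display_name: str) -> str:
    """Return the case-sensitive field name for a column header (hypothetical helper)."""
    field = DISPLAY_TO_FIELD[display_name]  # raises KeyError for unknown headers
    if field is None:
        raise ValueError(f"{display_name!r} has no searchable field name listed")
    return field


print(searchable_field("Accession"))
```

Because field names are case sensitive, the mapping keeps TC1 and TC2 in upper case and the remaining names in lower case, as they appear in the list.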

Accession (hmm_accession)

The accession of this HMM. Clicking the HMM accession will take you to the HMM page in the Protein Family Models database. From that page you can download the HMM itself and get additional information including the curated cutoffs, the seed alignment, and RefSeq sequences identified by this HMM.

Data field names and values are case sensitive, as shown in the examples below. Use quotes to search for phrases. Additional sections of this document provide tips about search terms that contain special characters (such as parentheses, hyphens, and apostrophes), and about the use of wildcards (such as the asterisk or question mark).

Examples:
  • To search this field directly, enter a query such as:    hmm_accession:searchterm
  • Search for:    hmm_accession:NF000053.2
    to show information about the Hidden Markov Model with accession NF000053.2 (trimethoprim-resistant dihydrofolate reductase DfrA12).
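
The field:searchterm pattern above can be sketched in Python as a small string builder. The function name field_query is hypothetical, and the rule of quoting only values that contain whitespace is an assumption based on the advice to use quotes for phrases. It does not handle the special characters or wildcards covered elsewhere in this documentation.

```python
def field_query(field: str, value: str) -> str:
    """Build a 'field:value' query string (hypothetical helper).

    Assumption: a value containing whitespace is treated as a phrase and
    wrapped in double quotes; other values are passed through unchanged.
    """
    if any(ch.isspace() for ch in value):
        return f'{field}:"{value}"'
    return f"{field}:{value}"


# Single-term query, matching the example in this section.
print(field_query("hmm_accession", "NF000053.2"))

# Phrase query; the phrase is only an illustration of quoting.
print(field_query("hmm_description", "dihydrofolate reductase"))
```

The resulting strings are meant to be pasted into the query box. Field names and values must be supplied with their exact capitalization.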

MicroBIGG-E link

An icon will appear in this field if there are entries in MicroBIGG-E that have this HMM as the most specific HMM hit above the curated trusted cutoff (TC1).

HMM description (hmm_description)

The name of the Hidden Markov Model (HMM) that hits this element (if any).

Data field names and values are case sensitive. Use quotes to search for phrases. Additional sections of this document provide tips about search terms that contain special characters (such as parentheses, hyphens, and apostrophes), and about the use of wildcards (such as the asterisk or question mark).

Length (hmm_length)

The number of amino-acid positions in the HMM seed alignment.

TC1 (TC1)

Trusted cutoff 1, the per-sequence reporting threshold. This is a curated value designed to identify members of this "family" and function.

TC2 (TC2)

Trusted cutoff 2, the per-domain reporting threshold. This is a curated value designed to identify members of this "family" and function.
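
To illustrate how the two trusted cutoffs might be applied to a single hit, here is a minimal Python sketch. It assumes that a hit is kept only when its full-sequence score meets TC1 and its best domain score meets TC2, and that the comparison is inclusive. That follows the usual reading of per-sequence and per-domain reporting thresholds, but it is an assumption rather than a description of the AMRFinderPlus code. The class, the function, and the numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HmmCutoffs:
    tc1: float  # trusted cutoff 1: per-sequence reporting threshold
    tc2: float  # trusted cutoff 2: per-domain reporting threshold


def passes_cutoffs(sequence_score: float, best_domain_score: float,
                   cutoffs: HmmCutoffs) -> bool:
    """Return True if both scores reach their cutoffs (assumed inclusive)."""
    return sequence_score >= cutoffs.tc1 and best_domain_score >= cutoffs.tc2


# Made-up cutoff and score values, for illustration only.
example = HmmCutoffs(tc1=350.0, tc2=350.0)
print(passes_cutoffs(400.0, 395.0, example))  # both scores reach the cutoffs
print(passes_cutoffs(400.0, 120.0, example))  # domain score is below TC2
```

The actual curated cutoffs for an HMM are shown in the TC1 and TC2 columns of the catalog.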

Scope (scope)

This field specifies the data subset to which an allele or gene belongs. The value can be core (highly curated, AMR-specific genes and point mutations), plus (genes related to biocide and stress resistance, general efflux, virulence, or antigenicity), or non-reportable (HMMs that specifically identify genes that AMRFinderPlus will not report, within a broader HMM-identified family that is reported).

Data field names and values are case sensitive. In this case, both the data field name and the value are written in all lower case, as shown in the example below.

Examples:
  • To search this field directly, enter a query such as:    scope:searchterm
  • Search for:    scope:plus
    to show the HMMs in the "plus" subset of the Pathogen Detection Reference HMM Catalog. That subset includes genes related to biocide and stress resistance, general efflux, virulence, or antigenicity.

Type (type)

Classification for the type of gene found, such as AMR, STRESS, or VIRULENCE. A more detailed description of the type and subtype fields is available on the AMRFinderPlus wiki.

(In general, type and subtype refer to the category of gene or genetic element, while class and subclass refer to a phenotype associated with the genetic element.)

Data field names and values are case sensitive, and the values for this data field are written in upper case, as shown in the example below.

Examples:
  • To search this field directly, enter a query such as:    type:searchterm
  • Search for:    type:STRESS
    to show genes that confer stress resistance.

  • As an alternative method for retrieving those genes, you can open the "Filters" function of the Reference HMM Catalog and check the box for the desired Type. The Filters function will then refresh itself to show the subtype values that are available for the type you have selected, enabling you to further narrow your search results, if desired. For example, the subtype values under STRESS currently include BIOCIDE, METAL, and ACID. (As noted below, filters are generated on the fly and reflect the attributes of the data that you are currently viewing.)

Subtype (subtype)

A more specific type for the element, if available. Otherwise the contents will be identical to Type. A more detailed description of the type and subtype fields is available on the AMRFinderPlus wiki.

(In general, type and subtype refer to the category of gene or genetic element, while class and subclass refer to a phenotype associated with the genetic element.)

Examples:
  • To search this field directly, enter a query such as:    subtype:searchterm
  • Search for:    subtype:METAL
    to show genes that contribute to metal resistance.
  • As an alternative method for retrieving those genes, you can open the "Filters" function of the Reference HMM Catalog and check the box for the desired Type. The Filters function will then refresh itself to show the subtype values that are available for the type you have selected, enabling you to further narrow your search results, if desired. For example, the subtype values under STRESS currently include BIOCIDE, METAL, and ACID. (As noted below, filters are generated on the fly and reflect the attributes of the data that you are currently viewing.)

Class (class)

Resistance target for genes of type AMR or STRESS, or typing information for some virulence genes.

This data field also appears in the Pathogen Detection Reference Gene Catalog; a description of Class and examples of queries for that field appear in the Reference Gene Catalog data fields help section. A more detailed description of the class and subclass fields is available on the AMRFinderPlus wiki.

(In general, type and subtype refer to the category of gene or genetic element, while class and subclass refer to a phenotype associated with the genetic element.)

Subclass (subclass)

Where it is known, "subclass" provides a more specific definition of the particular antibiotics or classes of stressors that are affected by the genes identified by this HMM (e.g., that are resisted by the gene). While most subclass designations are self-explanatory, a few have particular meanings. Specifically, "CEPHALOSPORIN" is equivalent to the Lahey 2be definition; "CARBAPENEM" means the protein has carbapenemase activity, but it might or might not confer resistance to other beta-lactams; "QUATERNARY AMMONIUM" refers to quaternary ammonium compounds. In addition, stx subtypes (e.g., STX2E) and intimin subtypes (e.g., ALPHA) are defined for Shiga toxin proteins (class of STX1 or STX2) and intimins (class of INTIMIN), respectively. Where the phenotypic information is incomplete, contradictory, or unclear, the "Class" value is used for the "Subclass" value.

This data field also appears in the Pathogen Detection Reference Gene Catalog; a description of Subclass and examples of queries for that field appear in the Reference Gene Catalog data fields help section.

(In general, type and subtype refer to the category of gene or genetic element, while class and subclass refer to a phenotype associated with the genetic element.)

Examples:
  • To search this field directly, enter a query such as:    subclass:searchterm
  • Search for:    subclass:CARBAPENEM
    to show HMMs that identify genes that contribute to carbapenem resistance.
  • As an alternative method for retrieving those genes, you can open the "Filters" function of the Reference HMM Catalog and check the box for the desired class. You can search through the available classes by using the Search field at the top of the filter box. Note that searches are case sensitive, so to identify QUINOLONE resistance HMMs you could type QUINOLONE in the Search field, and the Filters function will refresh itself to show the subclass values that contain that substring (currently QUINOLONE and PHENICOL/QUINOLONE). (As noted below, filters are generated on the fly and reflect the attributes of the data that you are currently viewing.)

Output

Tabular list of HMMs

  • Upon opening the Pathogen Detection Reference HMM Catalog, a table displays data for all HMMs that are currently in the catalog.
  • Every row in the Pathogen Detection Reference HMM Catalog display is a reference HMM.
  • The rows can be sorted by clicking on column headers, filtered by clicking on the filters bar, or searched using basic and advanced search techniques.

Filters to refine results

Filters are activated by clicking on the bar labelled "Filters" just under the search box. This allows you to facet or subset the data in a variety of ways, and therefore can be used to refine your results, whether you have done a basic or advanced search.

The filter menu allows all data fields in the column chooser to be filtered. By default, each filter displays the top 100 terms (based on the number of rows retrieved by a term).

  • A Boolean "OR" is applied if multiple items are checked in the same filter field. This way you can choose multiple values in the same filter.
  • A Boolean "AND" is applied if you select items in multiple different filter fields (e.g., Scope, Class).
  • If you prefer to apply a Boolean "AND" to multiple terms in the same filter field, you can enter a Solr query.
  • Filters are generated "on the fly" for a given dataset.
    • The choices listed in the "Filters" panels reflect the attributes of the HMMs that you are currently viewing in the browser.
    • By default, only the top 100 terms (based on the number of rows retrieved by a term) are shown.
    • Numbers of rows for each filter term are displayed to the right of that term.
    • The total number of values in the filter is displayed at the bottom of the filter tab.
  • The list of values within each filter tab can be searched using the controls at the top of the tab. This can reveal values not in the top 100.
    • Text fields can be searched by typing exact substrings of the values in the field.
    • Numeric fields have ranges that can be selected using the check buttons and ranges listed.
    • Date fields can be searched using date ranges with some commonly used presets listed as buttons.
  • The search box can be reset with the reset button beside the search box. The entire filter can be removed with the 'X' at the top right corner.
  • Filters can be collapsed if more than one is shown with the double left hand arrow at the bottom left, and opened again after collapse with the double right hand arrow on collapsed tabs. Each tab is labeled with the filter name in the left margin.
  • Clicking the filter bar again will collapse the filter and show a Solr query string that can be used in the search box.
  • Note that the filters match exact strings, so capitalization and punctuation will be matched against. Use multiple synonyms where needed in fields that don't have a controlled vocabulary.
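
The filter behaviour described in the list above (OR within one filter field, AND across fields, exact string matching, and per-term row counts limited to the top terms) can be sketched on in-memory data. The Python below is an illustration of that logic only, not the browser's implementation. The function names and the four sample rows are hypothetical.

```python
from collections import Counter


def apply_filters(rows, selections):
    """Keep rows that match every filter field (AND) and, within each field,
    any one of the checked values (OR). Matching is on exact strings."""
    return [
        row for row in rows
        if all(row.get(field) in values for field, values in selections.items())
    ]


def facet_counts(rows, field, top=100):
    """Return up to `top` (value, row_count) pairs, most frequent first."""
    return Counter(row[field] for row in rows).most_common(top)


# Hypothetical rows using field names and values that appear in this document.
rows = [
    {"scope": "core", "type": "AMR", "subtype": "AMR"},
    {"scope": "plus", "type": "STRESS", "subtype": "METAL"},
    {"scope": "plus", "type": "STRESS", "subtype": "BIOCIDE"},
    {"scope": "plus", "type": "VIRULENCE", "subtype": "VIRULENCE"},
]

# Scope = plus AND (Type = STRESS OR Type = VIRULENCE)
selected = apply_filters(rows, {"scope": {"plus"}, "type": {"STRESS", "VIRULENCE"}})
print(len(selected))                   # 3
print(facet_counts(selected, "type"))  # [('STRESS', 2), ('VIRULENCE', 1)]
```

Because matching is exact, a selection of "stress" in lower case would match none of these rows, which mirrors the note on capitalization above.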

Data Retention Policy

The Reference Gene Hierarchy web interface only shows the most recent release of the AMRFinderPlus database, but all previous releases are retained on our FTP site. See Reference data retention for details.