Taxonomy report

General information about a taxonomic identifier

The downloaded taxonomy data package contains a taxonomy data report in JSON Lines format in the file:

ncbi_dataset/data/taxonomy_report.jsonl

Each line of the taxonomy data report file is a hierarchical JSON object that represents a single taxonomy record. The schema of the taxonomy record is defined in the tables below where each row describes a single field in the report or a sub-structure, which is a collection of fields. The outermost structure of the report is TaxonomyNode.
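Because the report is JSON Lines, it can be read one record at a time instead of being loaded as a single JSON document. The following is a minimal sketch, not an official client: the helper name `iter_taxonomy_records` is hypothetical, and the `taxonomy` and `query` keys are taken from the sample report shown on this page.

```python
import json
from typing import Iterable, Iterator


def iter_taxonomy_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, such as a trailing newline at end of file
            yield json.loads(line)


# Stand-in for the contents of the report file: one JSON object per line.
sample_lines = [
    '{"taxonomy": {"taxId": 9606, "rank": "SPECIES"}, "query": ["9606"]}',
    "",
]
records = list(iter_taxonomy_records(sample_lines))
print(records[0]["taxonomy"]["taxId"])
```

To read a real data package, an open file handle can be passed in place of the list, for example `open("ncbi_dataset/data/taxonomy_report.jsonl", encoding="utf-8")`.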

Table fields that include a Table Field Mnemonic can be used with the dataformat command-line tool's --fields option. Refer to the dataformat CLI tool reference to see how you can use this tool to transform taxonomy data reports from JSON Lines to tabular formats.

Sample report

{
  "taxonomy": {
    "taxId": 9606,
    "rank": "SPECIES",
    "currentScientificName": {
      "name": "Homo sapiens",
      "authority": "Linnaeus, 1758"
    },
    "curatorCommonName": "human",
    "groupName": "primates",
    "classification": {
      "superkingdom": {
        "name": "Eukaryota",
        "id": 2759
      },
      "kingdom": {
        "name": "Metazoa",
        "id": 33208
      },
      "phylum": {
        "name": "Chordata",
        "id": 7711
      },
      "class": {
        "name": "Mammalia",
        "id": 40674
      },
      "order": {
        "name": "Primates",
        "id": 9443
      },
      "family": {
        "name": "Hominidae",
        "id": 9604
      },
      "genus": {
        "name": "Homo",
        "id": 9605
      },
      "species": {
        "name": "Homo sapiens",
        "id": 9606
      }
    },
    "parents": [
      1,
      131567,
      2759,
      33154,
      33208,
      6072,
      33213,
      33511,
      7711,
      89593,
      7742,
      7776,
      117570,
      117571,
      8287,
      1338369,
      32523,
      32524,
      40674,
      32525,
      9347,
      1437010,
      314146,
      9443,
      376913,
      314293,
      9526,
      314295,
      9604,
      207598,
      9605
    ],
    "children": [
      741158,
      63221
    ],
    "counts": [
      {
        "type": "COUNT_TYPE_ASSEMBLY",
        "count": 1133
      },
      {
        "type": "COUNT_TYPE_GENE",
        "count": 193454
      },
      {
        "type": "COUNT_TYPE_tRNA",
        "count": 701
      },
      {
        "type": "COUNT_TYPE_rRNA",
        "count": 785
      },
      {
        "type": "COUNT_TYPE_snRNA",
        "count": 167
      },
      {
        "type": "COUNT_TYPE_scRNA",
        "count": 4
      },
      {
        "type": "COUNT_TYPE_snoRNA",
        "count": 1201
      },
      {
        "type": "COUNT_TYPE_PROTEIN_CODING",
        "count": 20634
      },
      {
        "type": "COUNT_TYPE_ncRNA",
        "count": 22103
      },
      {
        "type": "COUNT_TYPE_BIOLOGICAL_REGION",
        "count": 128261
      },
      {
        "type": "COUNT_TYPE_OTHER",
        "count": 844
      }
    ],
    "genomicMoltype": "dsDNA"
  },
  "query": [
    "9606"
  ]
}

TaxonomyNode Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
| --- | --- | --- | --- | --- | --- |
| taxId | coming soon | coming soon | uint32 | NCBI Taxonomy identifier | 9606 |
| rank | | | RankType | The taxonomic rank of the taxonomic node. | kingdom |
| currentScientificName | | | NameAndAuthority | The currently accepted name chosen out of all synonyms for the taxonomic node. | Wickerhamiella versatilis (Etchells & T.A. Bell) de Vega & Lachance, 2017 |
| basionym | | | NameAndAuthority | The originally described name, no longer in use. Attached to the type material and species description. | Brettanomyces versatilis Etchells & T.A. Bell, 1950 |
| curatorCommonName | coming soon | coming soon | string | The canonical common name. | sweet orange |
| groupName | coming soon | coming soon | string | A common name describing large, well-known taxa. | even-toed ungulates |
| hasTypeMaterial | coming soon | coming soon | bool | A boolean that indicates whether or not type material is available for the species. | |
| classification | | | Classification | A subset of parent nodes including well-established ranks. | |
| parents (repeated) | coming soon | coming soon | uint32 | Taxids of all parents, ordered from most specific (immediate parent), to most general. | |
| children (repeated) | coming soon | coming soon | uint32 | Taxids of children. | |
| counts (repeated) | | | TaxonomyNode.CountByType | | |
| genomicMoltype | coming soon | coming soon | string | Genomic molecule type (dsDNA, ssDNA, ssDNA(-), ssRNA) | |

Classification Structure

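| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
| --- | --- | --- | --- | --- | --- |
| superkingdom | | | TaxData | | |
| kingdom | | | TaxData | | |
| phylum | | | TaxData | | |
| class | | | TaxData | | |
| order | | | TaxData | | |
| family | | | TaxData | | |
| genus | | | TaxData | | |
| species | | | TaxData | | |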
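Each Classification field holds a TaxData object, so a readable lineage can be assembled by walking the rank fields in a fixed order. This is an illustrative sketch only: the `lineage` helper and the `RANK_FIELDS` constant are hypothetical names, and the code assumes that a rank missing from a record is simply absent from the JSON.

```python
# Rank fields of the Classification structure, from most general to most specific.
RANK_FIELDS = (
    "superkingdom", "kingdom", "phylum", "class",
    "order", "family", "genus", "species",
)


def lineage(classification: dict) -> list:
    """Return (rank, name, id) tuples for every rank present in the record."""
    entries = []
    for rank in RANK_FIELDS:
        node = classification.get(rank)
        if node:  # assumption: ranks that do not apply are omitted from the JSON
            entries.append((rank, node["name"], node["id"]))
    return entries


# Trimmed copy of the classification object from the sample report.
classification = {
    "superkingdom": {"name": "Eukaryota", "id": 2759},
    "class": {"name": "Mammalia", "id": 40674},
    "species": {"name": "Homo sapiens", "id": 9606},
}
path = " > ".join(name for _, name, _ in lineage(classification))
print(path)
```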

NameAndAuthority Structure

Name and authority object.

Contains information on the taxonomic node’s name, authority, publications, basionym, synonyms, etc.

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
| --- | --- | --- | --- | --- | --- |
| name | coming soon | coming soon | string | This could be the scientific name, common name, synonym, etc. depending on the context. | |
| authority | coming soon | coming soon | string | The authority that this name was created by. The authority is typically represented by the author(s) name and the year in which it was published. | |
| typeStrains (repeated) | | | TaxonomyTypeMaterial | Any type materials for this entry. | |
| curatorSynonym | coming soon | coming soon | string | The primary synonym of the scientific name. | Leptosphaeria maculans |
| homotypicSynonyms (repeated) | | | NameAndAuthority | Names generated after the basionym (e.g. by moving it to a different genus), but sharing the same type. Usually these are the results of genus changes. Also known as objective synonym, nomenclatural synonym. | Candida versatilis (Etchells & T.A. Bell) S.A. Mey. & Yarrow, 1978 |
| heterotypicSynonyms (repeated) | | | NameAndAuthority | List of heterotypic synonyms associated with this entry. | |
| otherSynonyms (repeated) | | | NameAndAuthority | List of other (not listed as heterotypic or homotypic) synonyms associated with this entry. | |
| informalNames (repeated) | coming soon | coming soon | string | List of informal names for the entry. | cow, spider |
| basionym | | | NameAndAuthority | The originally described name, no longer in use. Attached to the type material and species description. | Brettanomyces versatilis Etchells & T.A. Bell, 1950 |
| publications (repeated) | | | NameAndAuthority.Publication | Contains a list of publication objects related to this species. | |
| notes (repeated) | | | NameAndAuthority.Note | Contains a list of note objects related to this species. | |
| formal | coming soon | coming soon | bool | Indicates whether the name is formal (i.e. compliant) | |
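The three synonym fields are all repeated NameAndAuthority objects, so they can be gathered with one loop. The sketch below is an assumption-laden illustration: `collect_synonym_names` is a hypothetical helper, and the record fragment is hand-built from the example values in the table, with the name and authority split the way the sample report splits "Homo sapiens" and "Linnaeus, 1758".

```python
SYNONYM_FIELDS = ("homotypicSynonyms", "heterotypicSynonyms", "otherSynonyms")


def collect_synonym_names(name_and_authority: dict) -> list:
    """Return the names found in the repeated synonym fields, in field order."""
    names = []
    for field in SYNONYM_FIELDS:
        # Assumption: an empty repeated field is omitted from the JSON entirely.
        for synonym in name_and_authority.get(field, []):
            name = synonym.get("name")
            if name:
                names.append(name)
    return names


# Hypothetical fragment; real records may arrange these values differently.
current_name = {
    "name": "Wickerhamiella versatilis",
    "authority": "(Etchells & T.A. Bell) de Vega & Lachance, 2017",
    "homotypicSynonyms": [
        {
            "name": "Candida versatilis",
            "authority": "(Etchells & T.A. Bell) S.A. Mey. & Yarrow, 1978",
        }
    ],
}
synonyms = collect_synonym_names(current_name)
print(synonyms)
```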

NameAndAuthority.Note Structure

Note object

Contains information related to this specific entry.

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
| --- | --- | --- | --- | --- | --- |
| name | coming soon | coming soon | string | Name of the notation. | |
| note | coming soon | coming soon | string | Note text. | |
| noteClassifier | | | NameAndAuthority.NoteClassifier | Note classification | |

NameAndAuthority.Publication Structure

Publication object

Contains information about the publication such as the name and the citation.

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
| --- | --- | --- | --- | --- | --- |
| name | coming soon | coming soon | string | Name of the publication (article, book, etc.). | |
| citation | coming soon | coming soon | string | Citation to the publication. | |

TaxData Structure

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
| --- | --- | --- | --- | --- | --- |
| name | coming soon | coming soon | string | Taxonomic name | |
| id | coming soon | coming soon | uint32 | NCBI Taxonomy identifier | |

TaxonomyNode.CountByType Structure

Counts of various attributes, summed up for ranks above species.

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
| --- | --- | --- | --- | --- | --- |
| type | | | CountType | | |
| count | coming soon | coming soon | uint32 | | |
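Since `counts` is a repeated list of type and count pairs, turning it into a dictionary keyed by CountType name makes lookups simpler. This is a minimal sketch with a hypothetical helper name, `counts_by_type`. It treats a missing `count` as zero on the assumption that the JSON serializer may omit default values.

```python
def counts_by_type(taxonomy: dict) -> dict:
    """Map each CountType name to its count for one taxonomy record."""
    return {
        entry["type"]: entry.get("count", 0)  # assumption: omitted count means 0
        for entry in taxonomy.get("counts", [])
    }


# Two entries copied from the sample report.
taxonomy = {
    "counts": [
        {"type": "COUNT_TYPE_ASSEMBLY", "count": 1133},
        {"type": "COUNT_TYPE_GENE", "count": 193454},
    ]
}
counts = counts_by_type(taxonomy)
print(counts.get("COUNT_TYPE_GENE", 0))
```

Using `dict.get` with a default on the result avoids a `KeyError` for count types that a given record does not report.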

TaxonomyTypeMaterial Structure

Type Material object.

Metadata pertaining to the original voucher used to describe the species.

| Field | Table Field Mnemonic | Table Column Name | Type | Description | Examples |
| --- | --- | --- | --- | --- | --- |
| typeStrainName | coming soon | coming soon | string | The strain name of the type material. | ATCC:43971 |
| typeStrainId | coming soon | coming soon | string | The strain ID of the type material. | ATCC |
| bioCollectionId | coming soon | coming soon | string | The biocollection ID of the type material. | 4278 |
| bioCollectionName | coming soon | coming soon | string | The biocollection name of the type material. | ATCC |
| collectionType (repeated) | | | CollectionType | Type of collection for the type material. | collection_culture_collection |
| typeClass | coming soon | coming soon | string | Type material classification. | type strain |

CollectionType Enumeration

| Name | Number | Description |
| --- | --- | --- |
| no_collection_type | 0 | |
| collection_culture_collection | 1 | |
| specimen_voucher | 2 | |

CountType Enumeration

| Name | Number | Description |
| --- | --- | --- |
| COUNT_TYPE_UNSPECIFIED | 0 | |
| COUNT_TYPE_ASSEMBLY | 1 | |
| COUNT_TYPE_GENE | 2 | |
| COUNT_TYPE_tRNA | 3 | |
| COUNT_TYPE_rRNA | 4 | |
| COUNT_TYPE_snRNA | 5 | |
| COUNT_TYPE_scRNA | 6 | |
| COUNT_TYPE_snoRNA | 7 | |
| COUNT_TYPE_PROTEIN_CODING | 8 | |
| COUNT_TYPE_PSEUDO | 9 | |
| COUNT_TYPE_TRANSPOSON | 10 | |
| COUNT_TYPE_miscRNA | 11 | |
| COUNT_TYPE_ncRNA | 12 | |
| COUNT_TYPE_BIOLOGICAL_REGION | 13 | |
| COUNT_TYPE_OTHER | 14 | |

NameAndAuthority.NoteClassifier Enumeration

Class of authority

Indicates whether the authority has any special classification, such as having been effectively and validly published or having been included in an approved list.

| Name | Number | Description |
| --- | --- | --- |
| no_authority_classifier | 0 | No specific classification. |
| effective_name | 1 | Has been effectively and validly published (i.e. in the “International Code of Nomenclature of Prokaryotes”). |
| nomen_approbbatum | 2 | Has been included in an approved list (such as the “Approved List of Bacterial Names”). |
| ictv_accepted | 3 | Has been ICTV accepted |

RankType Enumeration

Rank level

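| Name | Number | Description |
| --- | --- | --- |
| NO_RANK | 0 | |
| SUPERKINGDOM | 1 | |
| KINGDOM | 2 | |
| SUBKINGDOM | 3 | |
| SUPERPHYLUM | 4 | |
| SUBPHYLUM | 5 | |
| PHYLUM | 6 | |
| CLADE | 31 | |
| SUPERCLASS | 7 | |
| CLASS | 8 | |
| SUBCLASS | 9 | |
| INFRACLASS | 10 | |
| COHORT | 11 | |
| SUBCOHORT | 12 | |
| SUPERORDER | 13 | |
| ORDER | 14 | |
| SUBORDER | 15 | |
| INFRAORDER | 16 | |
| PARVORDER | 17 | |
| SUPERFAMILY | 18 | |
| FAMILY | 19 | |
| SUBFAMILY | 20 | |
| GENUS | 21 | |
| SUBGENUS | 22 | |
| SPECIES_GROUP | 23 | |
| SPECIES_SUBGROUP | 24 | |
| SPECIES | 25 | |
| SUBSPECIES | 26 | |
| TRIBE | 27 | |
| SUBTRIBE | 28 | |
| FORMA | 29 | |
| VARIETAS | 30 | |
| STRAIN | 320 | |
| SECTION | 330 | |
| SUBSECTION | 340 | |
| PATHOGROUP | 350 | |
| SUBVARIETY | 360 | |
| GENOTYPE | 370 | |
| SEROTYPE | 380 | |
| ISOLATE | 390 | |
| MORPH | 400 | |
| SERIES | 410 | |
| FORMA_SPECIALIS | 420 | |
| SEROGROUP | 430 | |
| BIOTYPE | 440 | |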

Scalar Value Types

| Protocol buffers type | Notes | C++ | Python | Java | Go |
| --- | --- | --- | --- | --- | --- |
| double | | double | float | double | float64 |
| float | | float | float | float | float32 |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | int/long | long | int64 |
| uint32 | Uses variable-length encoding. | uint32 | int/long | int | uint32 |
| uint64 | Uses variable-length encoding. | uint64 | int/long | long | uint64 |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | int/long | long | int64 |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | int/long | long | uint64 |
| sfixed32 | Always four bytes. | int32 | int | int | int32 |
| sfixed64 | Always eight bytes. | int64 | int/long | long | int64 |
| bool | | bool | boolean | boolean | bool |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | str/unicode | String | string |
| bytes | May contain any arbitrary sequence of bytes. | string | str | ByteString | []byte |
Generated May 1, 2024