BioC API for PMC

PubMed Central Open Access in BioC format (access to PubMed articles is covered on a separate page)

All PubMed Central (PMC) Open Access articles are available in the BioC format. This provides a large number of full-text research articles for text mining and information retrieval research. BioC is a simple format designed for straightforward text processing. The articles are available in BioC XML or BioC JSON, in Unicode or ASCII, and can be requested by PubMed ID or PMC ID.

If you use this resource, please cite:

Articles available from this service are in the PMC Open Access Subset and the PMC Author Manuscript Collection. Information about these collections is available on the following pages.

Not all PMC articles are available in these collections. Lists of articles in the collections are available via FTP.

These files are also available in the CSV format. A description of the FTP Service is available from: https://www.ncbi.nlm.nih.gov/pmc/tools/ftp/.

Articles in the BioC API for PMC are usually updated within 24 hours of these files being updated.

Instructions

https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_[format]/[ID]/[encoding]
The parameters are:
  • format: xml or json
  • ID: PubMed ID (such as 17299597) or PMC ID (such as PMC1790863)
  • encoding: unicode or ascii
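
The template above can be filled in programmatically. The following is a minimal Python sketch, not an official client: the helper names build_bioc_url and fetch_bioc, the default arguments, the timeout, and the choice to decode the response as UTF-8 are all assumptions made for illustration. Only the base URL and the allowed parameter values come from this page.

```python
from urllib.request import urlopen

# Base URL and allowed values are taken from the template on this page.
BASE_URL = "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi"
FORMATS = ("xml", "json")
ENCODINGS = ("unicode", "ascii")


def build_bioc_url(article_id, fmt="xml", encoding="unicode"):
    """Fill in the BioC_[format]/[ID]/[encoding] template (hypothetical helper)."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}, got {fmt!r}")
    if encoding not in ENCODINGS:
        raise ValueError(f"encoding must be one of {ENCODINGS}, got {encoding!r}")
    # Accept either a PubMed ID (e.g. 17299597) or a PMC ID (e.g. "PMC1790863").
    article_id = str(article_id).strip()
    if not article_id:
        raise ValueError("article_id must not be empty")
    return f"{BASE_URL}/BioC_{fmt}/{article_id}/{encoding}"


def fetch_bioc(article_id, fmt="xml", encoding="unicode", timeout=30):
    """Download one article and return the response body as text.

    Assumption: the body is decoded as UTF-8, which also covers the
    ASCII variant because ASCII is a subset of UTF-8.
    """
    url = build_bioc_url(article_id, fmt, encoding)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, build_bioc_url("PMC1790863", "json", "ascii") returns the JSON, ASCII variant of the URL for that PMC ID, and fetch_bioc(17299597) would request the article as Unicode BioC XML. Validating the parameters before the request keeps typos from turning into confusing server responses.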

Sample URL:
https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_xml/17299597/unicode

Same article in ASCII:
https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_xml/17299597/ascii
No Unicode-to-ASCII translation is perfect, but we have found this one useful.

JSON instead of XML:
https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/17299597/unicode
BioC JSON follows the same structure as BioC XML.
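
Because the JSON mirrors the XML structure, the text of an article can be reached by walking from the collection to its documents and then to their passages. The Python sketch below illustrates this under stated assumptions: the key names "documents", "passages", "offset", and "text" are assumed to match the corresponding BioC XML elements, the helper iter_passages is hypothetical, and the sample data is hand-written rather than real service output. The handling of a list-wrapped response is a defensive guess, not documented behavior.

```python
import json


def iter_passages(bioc_json):
    """Yield (document id, passage offset, passage text) tuples.

    Accepts a JSON string or an already parsed object. Key names are
    assumed to mirror the BioC XML element names.
    """
    data = json.loads(bioc_json) if isinstance(bioc_json, str) else bioc_json
    # Assumption: tolerate a response that wraps the collection in a list.
    collections = data if isinstance(data, list) else [data]
    for collection in collections:
        for document in collection.get("documents", []):
            for passage in document.get("passages", []):
                yield document.get("id"), passage.get("offset"), passage.get("text", "")


# Hand-written sample in the assumed shape; not real output from the service.
SAMPLE = """
{
  "source": "example",
  "documents": [
    {
      "id": "0000000",
      "passages": [
        {"offset": 0, "text": "A made-up title"},
        {"offset": 16, "text": "A made-up abstract."}
      ]
    }
  ]
}
"""

if __name__ == "__main__":
    for doc_id, offset, text in iter_passages(SAMPLE):
        print(doc_id, offset, text)
```

Using .get with defaults means a document without passages, or a passage without text, is skipped or yields an empty string instead of raising an error.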

Using PMC ID instead of PubMed ID:
https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_xml/PMC1790863/unicode

Bulk Download

BioC PMC articles can be downloaded in bulk from the FTP site:
https://ftp.ncbi.nlm.nih.gov/pub/wilbur/BioC-PMC

More information

General information about BioC XML structure:
ftp://ftp.ncbi.nlm.nih.gov/pub/wilbur/BioC-PMC/BioC.dtd

Specific information about BioC-PMC:
ftp://ftp.ncbi.nlm.nih.gov/pub/wilbur/BioC-PMC/pmc.key

Main BioC web page:
http://bioc.sourceforge.net

Caution

If you experience any problems, please share them with us: donald.comeau@nih.gov or zhiyong.lu@nih.gov.