Entrez Programming Utilities Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010-.

Entrez Direct: E-utilities on the Unix Command Line

Authors

Jonathan Kans, PhD, corresponding author. National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health.

Last Update: April 4, 2024.


Getting Started

Introduction

Entrez Direct (EDirect) provides access to the NCBI's suite of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a Unix terminal window. Search terms are entered as command-line arguments. Individual operations are connected with Unix pipes to construct multi-step queries. Selected records can then be retrieved in a variety of formats.

Installation

EDirect will run on Unix and Macintosh computers, and under the Cygwin Unix-emulation environment on Windows PCs. To install the EDirect software, open a terminal window and execute one of the following two commands:

  sh -c "$(curl -fsSL https://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh)"

sh -c "$(wget -q https://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh -O -)"

This will download a number of scripts and several precompiled programs into an "edirect" folder in the user's home directory. It may then print an additional command for updating the PATH environment variable in the user's configuration file. The editing instructions will look something like:

  echo "export PATH=\$HOME/edirect:\$PATH" >> $HOME/.bash_profile

As a convenience, the installation process ends by offering to run the PATH update command for you. Answer "y" and press the Return key if you want it run. If the PATH is already set correctly, or if you prefer to make any editing changes manually, just press Return.

Once installation is complete, run:

  export PATH=${HOME}/edirect:${PATH}

to set the PATH for the current terminal session.
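
To confirm that the programs are installed and can reach the NCBI servers, try a simple search:

  esearch -db pubmed -query "tn3 transposition immunity"

A successful query prints a small ENTREZ_DIRECT message (see Examining Intermediate Results, below) with a non-zero Count value.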

Quick Start

The readme.pdf file included in the edirect folder contains a highly-abridged version of this document. It is intended to convey the most important points in the least amount of time for the new user, while still presenting the minimal essential details. It also covers subtle issues in several Entrez biological databases, demonstrates integration of data from external sources, and has a brief introduction to scripting and programming.

The full documentation gives a much more in-depth exploration of the underlying topics, especially in the Complex Objects section, and in the Additional Examples web page, which is organized by Entrez database. This document also introduces other worthy topics, such as identifier lookup and sequence coordinate conversions, and has a more thorough treatment of automation.

Programmatic Access

EDirect connects to Entrez through the Entrez Programming Utilities interface. It supports searching by indexed terms, looking up precomputed neighbors or links, filtering results by date or category, and downloading record summaries or reports.

Navigation programs (esearch, elink, efilter, and efetch) communicate by means of a small structured message, which can be passed invisibly between operations with a Unix pipe. The message includes the current database, so it does not need to be given as an argument after the first step.

Accessory programs (nquire, transmute, and xtract) can help eliminate the need for writing custom software to answer ad hoc questions. Queries can move seamlessly between EDirect programs and Unix utilities or scripts to perform actions that cannot be accomplished entirely within Entrez.

All EDirect programs are designed to work on large sets of data. They handle many technical details behind the scenes (avoiding the learning curve normally required for E-utilities programming). Intermediate results are either saved on the Entrez history server or instantiated in the hidden message. For best performance, obtain an API Key from NCBI, and place the following line in your .bash_profile and .zshrc configuration files:

  export NCBI_API_KEY=unique_api_key

Unix programs are run by typing the name of the program and then supplying any required or optional arguments on the command line. Argument names are letters or words that start with a dash ("-") character.

Each program has a ‑help command that prints detailed information about available arguments.
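For example, running:

  efetch -help

prints the arguments accepted by efetch.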

Navigation Functions

Esearch performs a new Entrez search using terms in indexed fields. It requires a ‑db argument for the database name and uses ‑query for the search terms. For PubMed, without field qualifiers, the server uses automatic term mapping to compose a search strategy by translating the supplied query:

  esearch -db pubmed -query "selective serotonin reuptake inhibitor"

Search terms can also be qualified with a bracketed field name to match within the specified index:

  esearch -db nuccore -query "insulin [PROT] AND rodents [ORGN]"

Elink looks up precomputed neighbors within a database, or finds associated records in other databases:

  elink -related

elink -target gene

Elink also connects to the NIH Open Citation Collection dataset to find publications that cite the selected PubMed articles, or to follow the reference lists of PubMed records:

  elink -cited

elink -cites

Efilter limits the results of a previous query, with shortcuts that can also be used in esearch:

  efilter -molecule genomic -location chloroplast -country sweden -mindate 1985

Efetch downloads selected records or reports in a style designated by ‑format:

  efetch -format abstract

There is no need to use a script to loop over records in small groups, or write code to retry after a transient network or server failure, or add a time delay between requests. All of those features are already built into the EDirect commands.

Constructing Multi-Step Queries

EDirect allows individual operations to be described separately, combining them into a multi-step query by using the vertical bar ("|") Unix pipe symbol:

  esearch -db pubmed -query "tn3 transposition immunity" | efetch -format medline

Writing Commands on Multiple Lines

A query can be continued on the next line by typing the backslash ("\") Unix escape character immediately before pressing the Return key.

  esearch -db pubmed -query "opsin gene conversion" | \

Continuing the query looks up precomputed neighbors of the original papers, next links to all protein sequences published in the related articles, then limits those to the rodent division of GenBank, and finally retrieves the records in FASTA format:

  elink -related | \
elink -target protein | \
efilter -division rod | \
efetch -format fasta

In most modern versions of Unix the vertical bar pipe symbol also allows the query to continue on the next line, without the need for an additional backslash.

Accessory Programs

Nquire retrieves data from remote servers with URLs constructed from command line arguments:

  nquire -get https://icite.od.nih.gov api/pubs -pmids 2539356 |

Transmute converts a concatenated stream of JSON objects or other structured formats into XML:

  transmute -j2x |

Xtract can use waypoints to navigate a complex XML hierarchy and obtain data values by field name:

  xtract -pattern data -element cited_by |

The resulting output can be post-processed by Unix utilities or scripts:

  fmt -w 1 | sort -V | uniq
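
Assembled into a single pipeline, these fragments retrieve the NIH iCite record for a PubMed article, convert the JSON response to XML, extract the PMIDs of citing papers, and print them one per line, sorted and deduplicated:

  nquire -get https://icite.od.nih.gov api/pubs -pmids 2539356 |
  transmute -j2x |
  xtract -pattern data -element cited_by |
  fmt -w 1 | sort -V | uniq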

Discovery by Navigation

PubMed related articles are calculated by a statistical text retrieval algorithm using the title, abstract, and medical subject headings (MeSH terms). The connections between papers can be used for making discoveries. An example of this is finding the last enzymatic step in the vitamin A biosynthetic pathway.

Lycopene cyclase in plants converts lycopene into β-carotene, the immediate biochemical precursor of vitamin A. An initial search on the enzyme finds 303 articles. Looking up precomputed neighbors returns 18,943 papers, some of which might be expected to discuss other enzymes in the pathway:

  esearch -db pubmed -query "lycopene cyclase" | elink -related |

β-carotene is known to be an essential nutrient, required in the diet of herbivores. This indicates that lycopene cyclase is not present in animals (with a few exceptions caused by horizontal gene transfer), and that the enzyme responsible for converting β-carotene into vitamin A is not present in plants.

Applying this knowledge, by linking the publication neighbors to their associated protein records and then filtering those candidates using the NCBI taxonomy, can help locate the desired enzyme.

Linking from pubmed to the protein database finds 520,222 protein sequences:

  elink -target protein |

Limiting to mice excludes plants, fungi, and bacteria, which eliminates the earlier enzymes:

  efilter -organism mouse -source refseq |

This matches only 26 sequences, which is small enough to examine by retrieving the individual records:

  efetch -format fasta

As anticipated, the results include the enzyme that splits β-carotene into two molecules of retinal:

  ...
>NP_067461.2 beta,beta-carotene 15,15'-dioxygenase isoform 1 [Mus musculus]
MEIIFGQNKKEQLEPVQAKVTGSIPAWLQGTLLRNGPGMHTVGESKYNHWFDGLALLHSFSIRDGEVFYR
SKYLQSDTYIANIEANRIVVSEFGTMAYPDPCKNIFSKAFSYLSHTIPDFTDNCLINIMKCGEDFYATTE
TNYIRKIDPQTLETLEKVDYRKYVAVNLATSHPHYDEAGNVLNMGTSVVDKGRTKYVIFKIPATVPDSKK
...

Retrieving PubMed Reports

Piping PubMed query results to efetch and specifying the "abstract" format:

  esearch -db pubmed -query "lycopene cyclase" |
efetch -format abstract

returns a set of reports that can be read by a person:

  ...
85. PLoS One. 2013;8(3):e58144. doi: 10.1371/journal.pone.0058144. Epub ...

Levels of lycopene β-cyclase 1 modulate carotenoid gene expression and
accumulation in Daucus carota.

Moreno JC(1), Pizarro L, Fuentes P, Handford M, Cifuentes V, Stange C.

Author information:
(1)Departamento de Biología, Facultad de Ciencias, Universidad de Chile,
Santiago, Chile.

Plant carotenoids are synthesized and accumulated in plastids through a
highly regulated pathway. Lycopene β-cyclase (LCYB) is a key enzyme
involved directly in the synthesis of α-carotene and β-carotene through
...

If "medline" format is used instead:

  esearch -db pubmed -query "lycopene cyclase" |
efetch -format medline

the output can be entered into common bibliographic management software packages:

  ...
  PMID- 23555569
  OWN - NLM
  STAT- MEDLINE
  DA  - 20130404
  DCOM- 20130930
  LR  - 20131121
  IS  - 1932-6203 (Electronic)
  IS  - 1932-6203 (Linking)
  VI  - 8
  IP  - 3
  DP  - 2013
  TI  - Levels of lycopene beta-cyclase 1 modulate carotenoid gene expression
        and accumulation in Daucus carota.
  PG  - e58144
  LID - 10.1371/journal.pone.0058144 [doi]
  AB  - Plant carotenoids are synthesized and accumulated in plastids
        through a highly regulated pathway. Lycopene beta-cyclase (LCYB) is a
        key enzyme involved directly in the synthesis of alpha-carotene and
...

Retrieving Sequence Reports

Nucleotide and protein records can be downloaded in FASTA format:

  esearch -db protein -query "lycopene cyclase" |
efetch -format fasta

which consists of a definition line followed by the sequence:

  ...
>gi|735882|gb|AAA81880.1| lycopene cyclase [Arabidopsis thaliana]
MDTLLKTPNKLDFFIPQFHGFERLCSNNPYPSRVRLGVKKRAIKIVSSVVSGSAALLDLVPETKKENLDF
ELPLYDTSKSQVVDLAIVGGGPAGLAVAQQVSEAGLSVCSIDPSPKLIWPNNYGVWVDEFEAMDLLDCLD
TTWSGAVVYVDEGVKKDLSRPYGRVNRKQLKSKMLQKCITNGVKFHQSKVTNVVHEEANSTVVCSDGVKI
QASVVLDATGFSRCLVQYDKPYNPGYQVAYGIIAEVDGHPFDVDKMVFMDWRDKHLDSYPELKERNSKIP
TFLYAMPFSSNRIFLEETSLVARPGLRMEDIQERMAARLKHLGINVKRIEEDERCVIPMGGPLPVLPQRV
VGIGGTAGMVHPSTGYMVARTLAAAPIVANAIVRYLGSPSSNSLRGDQLSAEVWRDLWPIERRRQREFFC
FGMDILLKLDLDATRRFFDAFFDLQPHYWHGFLSSRLFLPELLVFGLSLFSHASNTSRLEIMTKGTVPLA
KMINNLVQDRD
...

Sequence records can also be obtained as GenBank or GenPept flatfiles:

  esearch -db protein -query "lycopene cyclase" |
efetch -format gp

which have features annotating particular regions of the sequence:

  ...
  LOCUS       AAA81880                 501 aa            linear   PLN ...
  DEFINITION  lycopene cyclase [Arabidopsis thaliana].
  ACCESSION   AAA81880
  VERSION     AAA81880.1  GI:735882
  DBSOURCE    locus ATHLYC accession L40176.1
  KEYWORDS    .
  SOURCE      Arabidopsis thaliana (thale cress)
    ORGANISM  Arabidopsis thaliana
              Eukaryota; Viridiplantae; Streptophyta; Embryophyta;
              Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons;
              Brassicales; Brassicaceae; Camelineae; Arabidopsis.
  REFERENCE   1  (residues 1 to 501)
    AUTHORS   Scolnik,P.A. and Bartley,G.E.
    TITLE     Nucleotide sequence of lycopene cyclase (GenBank L40176) from
              Arabidopsis (PGR95-019)
    JOURNAL   Plant Physiol. 108 (3), 1343 (1995)
  ...
  FEATURES             Location/Qualifiers
       source          1..501
                       /organism="Arabidopsis thaliana"
                       /db_xref="taxon:3702"
       Protein         1..501
                       /product="lycopene cyclase"
       transit_peptide 1..80
       mat_peptide     81..501
                       /product="lycopene cyclase"
       CDS             1..501
                       /gene="LYC"
                       /coded_by="L40176.1:2..1507"
  ORIGIN
          1 mdtllktpnk ldffipqfhg ferlcsnnpy psrvrlgvkk raikivssvv sgsaalldlv
         61 petkkenldf elplydtsks qvvdlaivgg gpaglavaqq vseaglsvcs idpspkliwp
        121 nnygvwvdef eamdlldcld ttwsgavvyv degvkkdlsr pygrvnrkql kskmlqkcit
        181 ngvkfhqskv tnvvheeans tvvcsdgvki qasvvldatg fsrclvqydk pynpgyqvay
        241 giiaevdghp fdvdkmvfmd wrdkhldsyp elkernskip tflyampfss nrifleetsl
        301 varpglrmed iqermaarlk hlginvkrie edercvipmg gplpvlpqrv vgiggtagmv
        361 hpstgymvar tlaaapivan aivrylgsps snslrgdqls aevwrdlwpi errrqreffc
        421 fgmdillkld ldatrrffda ffdlqphywh gflssrlflp ellvfglslf shasntsrle
        481 imtkgtvpla kminnlvqdr d
  //
...

Searching and Filtering

Restricting Query Results

The current results can be refined by further term searching in Entrez (useful in the protein database for limiting BLAST neighbors to a taxonomic subset):

  esearch -db pubmed -query "opsin gene conversion" |
elink -related |
efilter -query "tetrachromacy"

Limiting by Date

Results can also be filtered by date. For example, the following statements:

  efilter -days 60 -datetype PDAT

efilter -mindate 2000

efilter -maxdate 1985

efilter -mindate 1990 -maxdate 1999

restrict results to articles published in the previous two months, since the beginning of 2000, through the end of 1985, or in the 1990s, respectively. YYYY/MM and YYYY/MM/DD date formats are also accepted.
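
For example, a full-precision date range (arbitrary dates, shown with the publication date type) would be written as:

  efilter -mindate 2015/01/01 -maxdate 2015/06/30 -datetype PDAT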

Fetch by Identifier

Efetch and elink can take a list of numeric identifiers or accessions in an ‑id argument:

  efetch -db pubmed -id 7252148,1937004 -format xml

efetch -db nuccore -id 1121073309 -format acc

efetch -db protein -id 3OQZ_a -format fasta

efetch -db bioproject -id PRJNA257197 -format docsum

efetch -db pmc -id PMC209839 -format medline

elink -db pubmed -id 2539356 -cites

without the need for a preceding esearch command.

Non-integer accessions will be looked up with an internal search, using the appropriate field for the database:

  esearch -db bioproject -query "PRJNA257197 [PRJA]" |
efetch -format uid |
...

Most databases use the [ACCN] field for identifier lookup, but there are a few exceptions:

  annotinfo      [ASAC]
  assembly       [ASAC]
  bioproject     [PRJA]
  books          [AID]
  clinvar        [VACC]
  gds            [ALL]
  genome         [PRJA]
  geoprofiles    [NAME]
  gtr            [GTRACC]
  mesh           [MHUI]
  nuccore        [ACCN] or [PACC]
  pcsubstance    [SRID]
  snp            [RS] or [SS]

(For ‑db pmc it merely removes any "PMC" prefix from the integer identifier.)

For backward compatibility, esummary is a shortcut for efetch ‑format docsum:

  esummary -db bioproject -id PRJNA257197

esummary -db sra -id SRR5437876

Reading Large Lists of Identifiers

Efetch and elink can also read a large list of identifiers or accessions piped in through stdin:

  cat "file_of_identifiers.txt" |
efetch -db pubmed -format docsum

or from a file indicated by an ‑input argument:

  efetch -input "file_of_identifiers.txt" -db pubmed -format docsum

As mentioned above, there is no need to use a script to split the identifiers into smaller groups or add a time delay between individual requests, since that functionality is already built into EDirect.

Indexed Fields

The einfo command can report the fields and links that are indexed for each database:

  einfo -db protein -fields

This will return a table of field abbreviations and names indexed for proteins:

  ACCN    Accession
  ALL     All Fields
  ASSM    Assembly
  AUTH    Author
  BRD     Breed
  CULT    Cultivar
  DIV     Division
  ECNO    EC/RN Number
  FILT    Filter
  FKEY    Feature key
  ...
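
Similarly, the ‑links argument lists the names of the precomputed links available from a database:

  einfo -db protein -links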

Qualifying Queries by Indexed Field

Query terms in esearch or efilter can be qualified by entering an indexed field abbreviation in brackets. Boolean operators and parentheses can also be used in the query expression for more complex searches.

Commonly-used fields for PubMed queries include:

  [AFFL]    Affiliation           [LANG]    Language
  [ALL]     All Fields            [MAJR]    MeSH Major Topic
  [AUTH]    Author                [SUBH]    MeSH Subheading
  [FAUT]    Author - First        [MESH]    MeSH Terms
  [LAUT]    Author - Last         [PTYP]    Publication Type
  [CRDT]    Date - Create         [WORD]    Text Word
  [PDAT]    Date - Publication    [TITL]    Title
  [FILT]    Filter                [TIAB]    Title/Abstract
  [JOUR]    Journal               [UID]     UID

and a qualified query looks like:

  "Tager HS [AUTH] AND glucagon [TIAB]"

Filters that limit search results to subsets of PubMed include:

  humans [MESH]
pharmacokinetics [MESH]
chemically induced [SUBH]
all child [FILT]
english [FILT]
freetext [FILT]
has abstract [FILT]
historical article [FILT]
randomized controlled trial [FILT]
clinical trial, phase ii [PTYP]
review [PTYP]

Sequence databases are indexed with a different set of search fields, including:

  [ACCN]    Accession       [MLWT]    Molecular Weight
  [ALL]     All Fields      [ORGN]    Organism
  [AUTH]    Author          [PACC]    Primary Accession
  [GPRJ]    BioProject      [PROP]    Properties
  [BIOS]    BioSample       [PROT]    Protein Name
  [ECNO]    EC/RN Number    [SQID]    SeqID String
  [FKEY]    Feature key     [SLEN]    Sequence Length
  [FILT]    Filter          [SUBS]    Substance Name
  [GENE]    Gene Name       [WORD]    Text Word
  [JOUR]    Journal         [TITL]    Title
  [KYWD]    Keyword         [UID]     UID

and a sample query in the protein database is:

  "alcohol dehydrogenase [PROT] NOT (bacteria [ORGN] OR fungi [ORGN])"

Additional examples of subset filters in sequence databases are:

  mammalia [ORGN]
mammalia [ORGN:noexp]
txid40674 [ORGN]
cds [FKEY]
lacz [GENE]
beta galactosidase [PROT]
protein snp [FILT]
reviewed [FILT]
country united kingdom glasgow [TEXT]
biomol genomic [PROP]
dbxref flybase [PROP]
gbdiv phg [PROP]
phylogenetic study [PROP]
sequence from mitochondrion [PROP]
src cultivar [PROP]
srcdb refseq validated [PROP]
150:200 [SLEN]

(The calculated molecular weight [MLWT] field is indexed only for proteins and structures, not nucleotides.)

See efilter ‑help for a list of filter shortcuts available for several Entrez databases.

Examining Intermediate Results

EDirect navigation functions produce a custom XML message with the relevant fields (database, web environment, query key, and record count) that can be read by the next command in the pipeline. EDirect may store intermediate results on the Entrez history server or instantiate them in the XML message.

The results of each step in a query can be examined to confirm expected behavior before adding the next step. The Count field in the ENTREZ_DIRECT object contains the number of records returned by the previous step. A good measure of query success is a reasonable (non-zero) count value. For example:

  esearch -db protein -query "tryptophan synthase alpha chain [PROT]" |
efilter -query "28000:30000 [MLWT]" |
elink -target structure |
efilter -query "0:2 [RESO]"

produces:

  <ENTREZ_DIRECT>
    <Db>structure</Db>
    <WebEnv>MCID_5fac27e119f45d4eca20b0e6</WebEnv>
    <QueryKey>32</QueryKey>
    <Count>58</Count>
    <Step>4</Step>
  </ENTREZ_DIRECT>

with 58 protein structures being within the specified molecular weight range and having the desired (X-ray crystallographic) atomic position resolution.

(The QueryKey value differs from Step because the elink command splits its query into smaller chunks to avoid server truncation limits and timeout errors.)

Combining Independent Queries

Independent esearch, elink, and efilter operations can be performed and then combined at the end by using the history server's "#" convention to indicate query key numbers. (The steps to be combined must be in the same database.) Subsequent esearch commands can take a ‑db argument to override the database piped in from the previous step. (Piping the queries together is necessary for sharing the same history thread.)

Because elink splits a large query into multiple smaller link requests, the new QueryKey value cannot be predicted in advance. The ‑label argument is used to work around this. The label value is prefixed by a "#" symbol and placed in parentheses in the final search. For example, the query:

  esearch -db protein -query "amyloid* [PROT]" |
elink -target pubmed -label prot_cit |
esearch -db gene -query "apo* [GENE]" |
elink -target pubmed -label gene_cit |
esearch -query "(#prot_cit) AND (#gene_cit)" |
efetch -format docsum |
xtract -pattern DocumentSummary -element Id Title

uses truncation searching (entering the beginning of a word followed by an asterisk) to return titles of papers with links to amyloid protein sequence and apolipoprotein gene records:

  23962925    Genome analysis reveals insights into physiology and ...
  23959870    Low levels of copper disrupt brain amyloid-β homeostasis ...
  23371554    Genomic diversity and evolution of the head crest in the ...
  23251661    Novel genetic loci identified for the pathophysiology of ...
  ...

Structured Data

Advantages of XML Format

The ability to obtain Entrez records in structured eXtensible Markup Language (XML) format, and to easily extract the underlying data, allows the user to ask novel questions that are not addressed by existing analysis software.

The advantage of XML is that information is in specific locations in a well-defined data hierarchy. Accessing individual units of data that are fielded by name, such as:

  <PubDate>2013</PubDate>
<Source>PLoS One</Source>
<Volume>8</Volume>
<Issue>3</Issue>
<Pages>e58144</Pages>

requires matching the same general pattern, differing only by the element name. This is much simpler than parsing the units from a long, complex string:

  1. PLoS One. 2013;8(3):e58144 ...

The disadvantage of XML is that data extraction usually requires programming. But EDirect relies on the common pattern of XML value representation to provide a simplified approach to interpreting XML data.

Conversion of XML into Tables

The xtract program uses command-line arguments to direct the selective conversion of data in XML format. It allows record detection, path exploration, element selection, conditional processing, and report formatting to be controlled independently.

The ‑pattern command partitions an XML stream by object name into individual records that are processed separately. Within each record, the ‑element command does an exhaustive, depth-first search to find data content by field name. Explicit paths to objects are not needed.

By default, the ‑pattern argument divides the results into rows, while placement of data into columns is controlled by ‑element, to create a tab-delimited table.

Format Customization

Formatting commands allow extensive customization of the output. The line break between ‑pattern rows is changed with ‑ret, while the tab character between ‑element columns is modified by ‑tab. Multiple instances of the same element are distinguished using ‑sep, which controls their separation independently of the ‑tab command. The following query:

  efetch -db pubmed -id 6271474,6092233,16589597 -format docsum |
xtract -pattern DocumentSummary -sep "|" -element Id PubDate Name

returns a tab-delimited table with individual author names separated by vertical bars:

  6271474     1981            Casadaban MJ|Chou J|Lemaux P|Tu CP|Cohen SN
  6092233     1984 Jul-Aug    Calderon IL|Contopoulou CR|Mortimer RK
  16589597    1954 Dec        Garber ED

The ‑sep value also applies to distinct ‑element arguments that are grouped with commas. This can be used to keep data from multiple related fields in the same column:

  -sep " " -element Initials,LastName

Groups of fields are preceded by the ‑pfx value and followed by the ‑sfx value, both of which are initially empty.

The ‑def command sets a default placeholder to be printed when none of the comma-separated fields in an ‑element clause are present:

  -def "-" -sep " " -element Year,Month,MedlineDate

Repackaging commands (‑wrp, ‑enc, and ‑pkg) wrap extracted data values with bracketed XML tags given only the object name. For example, "‑wrp Word" issues the following formatting instructions:

  -pfx "<Word>" -sep "</Word><Word>" -sfx "</Word>"

and also ensures that data values containing encoded angle brackets, ampersands, quotation marks, or apostrophes remain properly encoded inside the new XML.

Additional commands (‑tag, ‑att, ‑atr, ‑cls, ‑slf, and ‑end) allow generation of XML tags with attributes. Running:

  -tag Item -att type journal -cls -element Source -end Item \
-deq "\n" -tag Item -att type journal -atr name Source -slf

will produce regular and self-closing XML objects, respectively:

  <Item type="journal">J Bacteriol</Item>
<Item type="journal" name="J Bacteriol" />

Element Variants

Derivatives of ‑element were created to eliminate the inconvenience of having to write post-processing scripts to perform otherwise trivial modifications or analyses on extracted data. They are subdivided into several categories. Substitute for ‑element as needed. A representative selection is shown below:

  Positional:       -first, -last, -even, -odd, -backward

  Numeric:          -num, -len, -inc, -dec, -bin, -hex, -bit

  Statistics:       -sum, -acc, -min, -max, -dev, -med

  Averages:         -avg, -geo, -hrm, -rms

  Logarithms:       -lge, -lg2, -log

  Character:        -encode, -upper, -title, -mirror, -alnum

  String:           -basic, -plain, -simple, -author, -journal, -prose

  Text:             -words, -pairs, -letters, -order, -reverse

  Citation:         -year, -month, -date, -page, -auth

  Sequence:         -revcomp, -fasta, -ncbi2na, -molwt, -pentamers

  Translation:      -cds2prot, -gcode, -frame

  Coordinate:       -0-based, -1-based, -ucsc-based

  Variation:        -hgvs

  Frequency:        -histogram

  Expression:       -reg, -exp, -replace

  Substitution:     -transform, -translate

  Indexing:         -aliases, -classify

  Miscellaneous:    -doi, -wct, -trim, -pad, -accession, -numeric

The original ‑element prefix shortcuts, "#" and "%", are redirected to ‑num and ‑len, respectively.

See xtract ‑help for a brief description of each command.
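
For example, substituting ‑num and ‑len for ‑element reports the number of authors and the character length of the abstract in a PubMed record:

  efetch -db pubmed -id 2539356 -format xml |
  xtract -pattern PubmedArticle -num Author -len AbstractText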

Exploration Control

Exploration commands provide fine control over the order in which XML record contents are examined, by separately presenting each instance of the chosen subregion. This limits what subsequent commands "see" at any one time, and allows related fields in an object to be kept together.

In contrast to the simpler DocumentSummary format, records retrieved as PubmedArticle XML:

  efetch -db pubmed -id 1413997 -format xml |

have authors with separate fields for last name and initials:

  <Author>
    <LastName>Mortimer</LastName>
    <Initials>RK</Initials>
  </Author>

Without being given any guidance about context, an ‑element command on initials and last names:

  efetch -db pubmed -id 1413997 -format xml |
xtract -pattern PubmedArticle -element Initials LastName

will explore the current record for each argument in turn, printing all initials followed by all last names:

  RK    CR    JS    Mortimer    Contopoulou    King

Inserting a ‑block command adds another exploration layer between ‑pattern and ‑element, and redirects data exploration to present the authors one at a time:

  efetch -db pubmed -id 1413997 -format xml |
xtract -pattern PubmedArticle -block Author -element Initials LastName

Each time through the loop, the ‑element command only sees the current author's values. This restores the correct association of initials and last names in the output:

  RK    Mortimer    CR    Contopoulou    JS    King

Grouping the two author subfields with a comma, and adjusting the ‑sep and ‑tab values:

  efetch -db pubmed -id 1413997 -format xml |
xtract -pattern PubmedArticle -block Author \
-sep " " -tab ", " -element Initials,LastName

produces a more traditional formatting of author names:

  RK Mortimer, CR Contopoulou, JS King

Sequential Exploration

Multiple ‑block statements can be used in a single xtract to explore different areas of the XML. This limits element extraction to the desired subregions, and allows disambiguation of fields with identical names. For example:

  efetch -db pubmed -id 6092233,4640931,4296474 -format xml |
xtract -pattern PubmedArticle -element MedlineCitation/PMID \
-block PubDate -sep " " -element Year,Month,MedlineDate \
-block AuthorList -num Author -sep "/" -element LastName |
sort-table -k 3,3n -k 4,4f

generates a table that allows easy parsing of author last names, and sorts the results by author count:

  4296474    1968 Apr        1    Friedmann
  4640931    1972 Dec        2    Tager/Steiner
  6092233    1984 Jul-Aug    3    Calderon/Contopoulou/Mortimer

Like ‑element arguments, the individual ‑block statements are executed sequentially, in order of appearance.

Note that "‑element MedlineCitation/PMID" uses the parent / child construct to prevent the display of additional PMID items that might be present later in CommentsCorrections objects.

Note also that the PubDate object can exist either in a structured form:

  <PubDate>
    <Year>1968</Year>
    <Month>Apr</Month>
    <Day>25</Day>
  </PubDate>

(with the Day field frequently absent), or in a string form:

  <PubDate>
    <MedlineDate>1984 Jul-Aug</MedlineDate>
  </PubDate>

but would not contain a mixture of both types, so the directive:

  -element Year,Month,MedlineDate

will only contribute a single column to the output.

Nested Exploration

Exploration command names (‑group, ‑block, and ‑subset) are assigned to a precedence hierarchy:

  -pattern > -group > -block > -subset > -element

and are combined in ranked order to control object iteration at progressively deeper levels in the XML data structure. Each command argument acts as a "nested for-loop" control variable, retaining information about the context, or state of exploration, at its level.

(Hypothetical) census data would need several nested loops to visit each unique address in context:

  -pattern State -group City -block Street -subset Number -element Resident

A nucleotide or protein sequence record can have multiple features. Each feature can have multiple qualifiers. And every qualifier has separate name and value nodes. Exploring this natural data hierarchy, with ‑pattern for the sequence, ‑group for the feature, and ‑block for the qualifier:

  efetch -db nuccore -id NG_008030.1 -format gbc |
xtract -pattern INSDSeq -element INSDSeq_accession-version \
-group INSDFeature -deq "\n\t" -element INSDFeature_key \
-block INSDQualifier -deq "\n\t\t" \
-element INSDQualifier_name INSDQualifier_value

keeps qualifiers, such as gene and product, associated with their parent features, and keeps qualifier names and values together on the same line:

  NG_008030.1
      source
          organism        Homo sapiens
          mol_type        genomic DNA
          db_xref         taxon:9606
      gene
          gene            COL5A1
      mRNA
          gene            COL5A1
          product         collagen type V alpha 1 chain, transcript variant 1
          transcript_id   NM_000093.4
      CDS
          gene            COL5A1
          product         collagen alpha-1(V) chain isoform 1 preproprotein
          protein_id      NP_000084.3
          translation     MDVHTRWKARSALRPGAPLLPPLLLLLLWAPPPSRAAQP...
  ...

Saving Data in Variables

A value can be recorded in a variable and used wherever needed. Variables are created by a hyphen followed by a name consisting of a string of capital letters or digits (e.g., ‑KEY). Variable values are retrieved by placing an ampersand before the variable name (e.g., "&KEY") in an ‑element statement:

  efetch -db nuccore -id NG_008030.1 -format gbc |
xtract -pattern INSDSeq -element INSDSeq_accession-version \
-group INSDFeature -KEY INSDFeature_key \
-block INSDQualifier -deq "\n\t" \
-element "&KEY" INSDQualifier_name INSDQualifier_value

This version prints the feature key on each line before the qualifier name and value, even though the feature key is now outside of the visibility scope (which is the current qualifier):

  NG_008030.1
      source    organism        Homo sapiens
      source    mol_type        genomic DNA
      source    db_xref         taxon:9606
      gene      gene            COL5A1
      mRNA      gene            COL5A1
      mRNA      product         collagen type V alpha 1 chain, transcript variant 1
      mRNA      transcript_id   NM_000093.4
      CDS       gene            COL5A1
      CDS       product         collagen alpha-1(V) chain isoform 1 preproprotein
      CDS       protein_id      NP_000084.3
      CDS       translation     MDVHTRWKARSALRPGAPLLPPLLLLLLWAPPPSRAAQP...
  ...

Variables can be (re)initialized with an explicit literal value inside parentheses:

  -block Author -sep " " -tab "" -element "&COM" Initials,LastName -COM "(, )"

They can also be used as the first argument in a conditional statement:

  -CHR Chromosome -block GenomicInfoType -if "&CHR" -differs-from ChrLoc

Using a double-hyphen (e.g., ‑‑STATS) appends a value to the variable.

In addition, a variable can also save the modified data resulting from an element variant operation. This allows multiple sequential transitions within a single xtract command:

  -END -sum "Start,Length" -MID -avg "Start,&END"

All variables are reset when the next record is processed.

Conditional Execution

Conditional processing commands (‑if, ‑unless, ‑and, ‑or, and ‑else) restrict object exploration by data content. They check to see if the named field is within the scope, and may be used in conjunction with string, numeric, or object constraints to require an additional match by value. For example:

  esearch -db pubmed -query "Havran W [AUTH]" |
efetch -format xml |
xtract -pattern PubmedArticle -if "#Author" -lt 14 \
-block Author -if LastName -is-not Havran \
-sep ", " -tab "\n" -element LastName,Initials[1:1] |
sort-uniq-count-rank

selects papers with fewer than 14 authors and prints a table of the most frequent collaborators, using a range to keep only the first initial so that variants like "Berg, CM" and "Berg, C" are combined:

  34    Witherden, D
  15    Boismenu, R
  12    Jameson, J
  10    Allison, J
  10    Fitch, F
  ...

Numeric constraints can also compare the integer values of two fields. This can be used to find genes that are encoded on the minus strand of a nucleotide sequence:

  -if ChrStart -gt ChrStop

Object constraints will compare the string values of two named fields, and can look for internal inconsistencies between fields whose contents should (in most cases) be identical:

  -if Chromosome -differs-from ChrLoc

The ‑position command restricts presentation of objects by relative location or index number:

  -block Author -position last -sep ", " -element LastName,Initials

Multiple conditions are specified with ‑and and ‑or commands:

  -if @score -equals 1 -or @score -starts-with 0.9

The ‑else command can supply alternative ‑element or ‑lbl instructions to be run if the condition is not satisfied:

  -if MapLocation -element MapLocation -else -lbl "\-"

but setting a default value with ‑def may be more convenient in simple cases.

Parallel ‑if and ‑unless statements can be used to provide a more complex response to alternative conditions that include nested explorations.

Post-processing Functions

Elink ‑cited can perform a reverse citation lookup, thanks to a data service provided by the NIH Open Citation Collection. The extracted author names can be processed by piping to a chain of Unix utilities:

  esearch -db pubmed -query "Beadle GW [AUTH]" |
elink -cited |
efetch -format docsum |
xtract -pattern Author -element Name |
sort -f | uniq -i -c

which produces an alphabetized count of authors who cited the original papers:

  1 Abellan-Schneyder I
  1 Abramowitz M
  1 ABREU LA
  1 ABREU RR
  1 Abril JF
  1 Abächerli E
  1 Achetib N
  1 Adams CM
  2 ADELBERG EA
  1 Adrian AB
  ...

Rather than always having to retype a series of common post-processing instructions, frequently-used combinations of Unix commands can be placed in a function, stored in an alias file (e.g., the user's .bash_profile), and executed by name. For example:

  SortUniqCountRank() {
    grep '.' |
    sort -f |
    uniq -i -c |
    awk '{ n=$1; sub(/[ \t]*[0-9]+[ \t]/, ""); print n "\t" $0 }' |
    sort -t "$(printf '\t')" -k 1,1nr -k 2f
  }
  alias sort-uniq-count-rank='SortUniqCountRank'

(An enhanced version of sort-uniq-count-rank that accepts customization arguments is now included with EDirect as a stand-alone script.)

The raw author names can be passed directly to the sort-uniq-count-rank script:

  esearch -db pubmed -query "Beadle GW [AUTH]" |
elink -cited |
efetch -format docsum |
xtract -pattern Author -element Name |
sort-uniq-count-rank

to produce a tab-delimited ranked list of authors who most often cited the original papers:

  17    Hawley RS
  13    Beadle GW
  13    PERKINS DD
  11    Glass NL
  11    Vécsei L
  10    Toldi J
  9     TATUM EL
  8     Ephrussi B
  8     LEDERBERG J
  ...

Similarly, elink ‑cites uses NIH OCC data to return an article's reference list.

Other scripts for tab-delimited files include sort-table, reorder-columns, and align-columns. Arguments to filter-columns and print-columns should be enclosed in single quotes to protect them from Unix parameter expansion.

Note that EDirect commands can also be used inside Unix functions or scripts.
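
As a minimal sketch (the FirstAuthors name and the final query are arbitrary), a shell function can package a pipeline so that the search terms are supplied as an argument:

  FirstAuthors() {
    esearch -db pubmed -query "$1" |
    efetch -format xml |
    xtract -pattern PubmedArticle -block Author -position first \
      -sep " " -element Initials,LastName
  }

  FirstAuthors "lycopene cyclase"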

Viewing an XML Hierarchy

Piping a PubmedArticle XML object to xtract ‑outline will give an indented overview of the XML hierarchy:

  PubmedArticle
    MedlineCitation
      PMID
      DateCompleted
        Year
        Month
        Day
      ...
      Article
        Journal
          ...
          Title
          ISOAbbreviation
        ArticleTitle
        ...
        Abstract
          AbstractText
        AuthorList
          Author
            LastName
            ForeName
            Initials
            AffiliationInfo
              Affiliation
          Author
          ...

Using xtract ‑synopsis or ‑contour will show the full paths to all nodes or just the terminal (leaf) nodes, respectively. Piping those results to "sort-uniq-count" will produce a table of unique paths.
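
For example, tabulating the unique leaf-node paths in a single PubmedArticle record:

  efetch -db pubmed -id 1413997 -format xml |
  xtract -contour |
  sort-uniq-count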

Code Nesting Comparison

Sketching with indented pseudo code can clarify relative nesting levels. The extraction command:

  xtract -pattern PubmedArticle \
    -block Author -element Initials,LastName \
    -block MeshHeading \
      -if QualifierName \
        -element DescriptorName \
        -subset QualifierName -element QualifierName

where the rank of the argument name controls the nesting depth, could be represented as a computer program in pseudo code by:

  for pat = each PubmedArticle {
    for blk = each pat.Author {
      print blk.Initials blk.LastName
    }
    for blk = each pat.MeSHTerm {
      if blk.Qual is present {
        print blk.MeshName
        for sbs = each blk.Qual {
          print sbs.QualName
        }
      }
    }
  }

where the brace indentation count controls the nesting depth.

Extra arguments are held in reserve to provide additional levels of organization, should the need arise in the future for processing complex, deeply-nested XML data. The exploration commands below ‑pattern, in order of rank, are:

  -path
-division
-group
-branch
-block
-section
-subset
-unit

Starting xtract exploration with ‑block, and expanding with ‑group and ‑subset, leaves additional level names that can be used wherever needed without having to redesign the entire command.

Complex Objects

Author Exploration

What's in a name? That which we call an author by any other name may be a consortium, investigator, or editor:

  <PubmedArticle>
    <MedlineCitation>
      <PMID>99999999</PMID>
      <Article>
        <AuthorList>
          <Author>
            <LastName>Tinker</LastName>
          </Author>
          <Author>
            <LastName>Evers</LastName>
          </Author>
          <Author>
            <LastName>Chance</LastName>
          </Author>
          <Author>
            <CollectiveName>FlyBase Consortium</CollectiveName>
          </Author>
        </AuthorList>
      </Article>
      <InvestigatorList>
        <Investigator>
          <LastName>Alpher</LastName>
        </Investigator>
        <Investigator>
          <LastName>Bethe</LastName>
        </Investigator>
        <Investigator>
          <LastName>Gamow</LastName>
        </Investigator>
      </InvestigatorList>
    </MedlineCitation>
  </PubmedArticle>

Within the record, ‑element exploration on last name:

  xtract -pattern PubmedArticle -element LastName

prints each last name, but does not match the consortium:

  Tinker    Evers    Chance    Alpher    Bethe    Gamow

Limiting to the author list:

  xtract -pattern PubmedArticle -block AuthorList -element LastName

excludes the investigators:

  Tinker    Evers    Chance

Using ‑num on each type of object:

  xtract -pattern PubmedArticle -num Author Investigator LastName CollectiveName

displays the various object counts:

  4    3    6    1

Date Selection

Dates come in all shapes and sizes:

  <PubmedArticle>
    <MedlineCitation>
      <PMID>99999999</PMID>
      <DateCompleted>
        <Year>2011</Year>
      </DateCompleted>
      <DateRevised>
        <Year>2012</Year>
      </DateRevised>
      <Article>
        <Journal>
          <JournalIssue>
            <PubDate>
              <Year>2013</Year>
            </PubDate>
          </JournalIssue>
        </Journal>
        <ArticleDate>
          <Year>2014</Year>
        </ArticleDate>
      </Article>
    </MedlineCitation>
    <PubmedData>
      <History>
        <PubMedPubDate PubStatus="received">
          <Year>2015</Year>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="accepted">
          <Year>2016</Year>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="entrez">
          <Year>2017</Year>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="pubmed">
          <Year>2018</Year>
        </PubMedPubDate>
        <PubMedPubDate PubStatus="medline">
          <Year>2019</Year>
        </PubMedPubDate>
      </History>
    </PubmedData>
  </PubmedArticle>

Within the record, ‑element exploration on the year:

  xtract -pattern PubmedArticle -element Year

finds and prints all nine instances:

  2011    2012    2013    2014    2015    2016    2017    2018    2019

Using ‑block to limit the scope:

  xtract -pattern PubmedArticle -block History -element Year

prints only the five years within the History object:

  2015    2016    2017    2018    2019

Inserting a conditional statement to limit element selection to a date with a specific attribute:

  xtract -pattern PubmedArticle -block History \
-if @PubStatus -equals "pubmed" -element Year

surprisingly still prints all five years within History:

  2015    2016    2017    2018    2019

This is because the ‑if command uses the same exploration logic as ‑element, but is designed to declare success if it finds a match anywhere within the current scope. There is indeed a "pubmed" attribute within History, in one of the five PubMedPubDate child objects, so the test succeeds. Thus, ‑element is given free rein to do its own exploration in History, and prints all five years.

The solution is to explore the individual PubMedPubDate objects:

  xtract -pattern PubmedArticle -block PubMedPubDate \
-if @PubStatus -equals "pubmed" -element Year

This visits each PubMedPubDate separately, with the ‑if test matching only the indicated date type, thus returning only the desired year:

  2018

PMID Extraction

Because of the presence of a CommentsCorrections object:

  <PubmedArticle>
    <MedlineCitation>
      <PMID>99999999</PMID>
      <CommentsCorrectionsList>
        <CommentsCorrections RefType="ErratumFor">
          <PMID>88888888</PMID>
        </CommentsCorrections>
      </CommentsCorrectionsList>
    </MedlineCitation>
  </PubmedArticle>

attempting to print the record's PubMed Identifier:

  xtract -pattern PubmedArticle -element PMID

also returns the PMID of the comment:

  99999999    88888888

Using an exploration command cannot exclude the second instance, because it would need a parent node unique to the first element, and the chain of parents to the first PMID:

  PubmedArticle/MedlineCitation

is a subset of the chain of parents to the second PMID:

  PubmedArticle/MedlineCitation/CommentsCorrectionsList/CommentsCorrections

Although ‑first PMID will work in this particular case, the more general solution is to limit by subpath with the parent / child construct:

  xtract -pattern PubmedArticle -element MedlineCitation/PMID

That would work even if the order of objects were reversed.

Heterogeneous Data

XML objects can contain a heterogeneous mix of components. For example:

  efetch -db pubmed -id 21433338,17247418 -format xml

returns a mixture of book and journal records:

  <PubmedArticleSet>
    <PubmedBookArticle>
      <BookDocument>
      ...
      </PubmedBookData>
    </PubmedBookArticle>
    <PubmedArticle>
      <MedlineCitation>
      ...
      </PubmedData>
    </PubmedArticle>
  </PubmedArticleSet>

The parent / star construct is used to visit the individual components, even though they may have different names. Piping the output to:

  xtract -pattern "PubmedArticleSet/*" -element "*"

separately prints the entirety of each XML component:

  <PubmedBookArticle><BookDocument> ... </PubmedBookData></PubmedBookArticle>
<PubmedArticle><MedlineCitation> ... </PubmedData></PubmedArticle>

Use of the parent / child construct can isolate objects of the same name that differ by their location in the XML hierarchy. For example:

  efetch -db pubmed -id 21433338,17247418 -format xml |
xtract -pattern "PubmedArticleSet/*" \
-group "BookDocument/AuthorList" -tab "\n" -element LastName \
-group "Book/AuthorList" -tab "\n" -element LastName \
-group "Article/AuthorList" -tab "\n" -element LastName

writes separate lines for book/chapter authors, book editors, and article authors:

  Fauci        Desrosiers
  Coffin       Hughes        Varmus
  Lederberg    Cavalli       Lederberg

Simply exploring with individual arguments:

  -group BookDocument -block AuthorList -element LastName

would visit the editors (at BookDocument/Book/AuthorList) as well as the authors (at BookDocument/AuthorList), and print names in order of appearance in the XML:

  Coffin    Hughes    Varmus    Fauci    Desrosiers

(In this particular example the book author lists could be distinguished by using ‑if "@Type" ‑equals authors or ‑if "@Type" ‑equals editors, but exploring by parent / child is a general position-based approach.)

Recursive Definitions

Certain XML objects returned by efetch are recursively defined, including Taxon in ‑db taxonomy and Gene-commentary in ‑db gene. Thus, they can contain nested objects with the same XML tag.

Retrieving a set of taxonomy records:

  efetch -db taxonomy -id 9606,7227 -format xml

produces XML with nested Taxon objects (marked below with line references) for each rank in the taxonomic lineage:

    <TaxaSet>
1     <Taxon>
        <TaxId>9606</TaxId>
        <ScientificName>Homo sapiens</ScientificName>
        ...
        <LineageEx>
2         <Taxon>
            <TaxId>131567</TaxId>
            <ScientificName>cellular organisms</ScientificName>
            <Rank>no rank</Rank>
3         </Taxon>
4         <Taxon>
            <TaxId>2759</TaxId>
            <ScientificName>Eukaryota</ScientificName>
            <Rank>superkingdom</Rank>
5         </Taxon>
          ...
        </LineageEx>
        ...
6     </Taxon>
7     <Taxon>
        <TaxId>7227</TaxId>
        <ScientificName>Drosophila melanogaster</ScientificName>
        ...
8     </Taxon>
    </TaxaSet>

Xtract tracks XML object nesting to determine that the <Taxon> start tag on line 1 is closed by the </Taxon> stop tag on line 6, and not by the first </Taxon> encountered on line 3.

When a recursive object (e.g., Taxon) is given to an exploration command:

  efetch -db taxonomy -id 9606,7227,10090 -format xml |
xtract -pattern Taxon \
-element TaxId ScientificName GenbankCommonName Division

subsequent ‑element commands are blocked from descending into the internal objects, and return information only for the main entries:

  9606     Homo sapiens               human          Primates
  7227     Drosophila melanogaster    fruit fly      Invertebrates
  10090    Mus musculus               house mouse    Rodents

The star / child construct will skip past the outer start tag:

  efetch -db taxonomy -id 9606,7227,10090 -format xml |
xtract -pattern Taxon -block "*/Taxon" \
-tab "\n" -element TaxId,ScientificName

to visit the next level of nested objects individually:

  131567    cellular organisms
  2759      Eukaryota
  33154     Opisthokonta
  ...

Recursive objects can be fully explored with a double star / child construct:

  esearch -db gene -query "DMD [GENE] AND human [ORGN]" |
efetch -format xml |
xtract -pattern Entrezgene -block "**/Gene-commentary" \
-tab "\n" -element Gene-commentary_type@value,Gene-commentary_accession

which visits every child object regardless of nesting depth:

  genomic    NC_000023
  mRNA       XM_006724469
  peptide    XP_006724532
  mRNA       XM_011545467
  peptide    XP_011543769
  ...

Repackaging XML Results

Splitting abstract paragraphs into individual words, while using XML reformatting commands:

  efetch -db pubmed -id 2539356 -format xml |
xtract -stops -rec Rec -pattern PubmedArticle \
-enc Paragraph -wrp Word -words AbstractText

generates:

  ...
  <Paragraph>
    <Word>the</Word>
    <Word>tn3</Word>
    <Word>transposon</Word>
    <Word>inserts</Word>
    ...
    <Word>was</Word>
    <Word>necessary</Word>
    <Word>for</Word>
    <Word>immunity</Word>
  </Paragraph>
...

with the words from each abstract instance encased in a separate parent object. Word counts for each paragraph could then be calculated by piping to:

  xtract -pattern Rec -block Paragraph -num Word

Multi-Step Transformations

Although xtract provides ‑element variants to do simple data manipulation, more complex tasks are sometimes best handled by being broken up into a series of simpler transformations. These are also known as structured data "processing chains".

Document summaries for two bacterial chromosomes:

  efetch -db nuccore -id U00096,CP002956 -format docsum |

contain several individual fields and a complex series of self-closing Stat objects:

  <DocumentSummary>
    <Id>545778205</Id>
    <Caption>U00096</Caption>
    <Title>Escherichia coli str. K-12 substr. MG1655, complete genome</Title>
    <CreateDate>1998/10/13</CreateDate>
    <UpdateDate>2020/09/23</UpdateDate>
    <TaxId>511145</TaxId>
    <Slen>4641652</Slen>
    <Biomol>genomic</Biomol>
    <MolType>dna</MolType>
    <Topology>circular</Topology>
    <Genome>chromosome</Genome>
    <Completeness>complete</Completeness>
    <GeneticCode>11</GeneticCode>
    <Organism>Escherichia coli str. K-12 substr. MG1655</Organism>
    <Strain>K-12</Strain>
    <BioSample>SAMN02604091</BioSample>
    <Statistics>
      <Stat type="Length" count="4641652"/>
      <Stat type="all" count="9198"/>
      <Stat type="cdregion" count="4302"/>
      <Stat type="cdregion" subtype="CDS" count="4285"/>
      <Stat type="cdregion" subtype="CDS/pseudo" count="17"/>
      <Stat type="gene" count="4609"/>
      <Stat type="gene" subtype="Gene" count="4464"/>
      <Stat type="gene" subtype="Gene/pseudo" count="145"/>
      <Stat type="rna" count="187"/>
      <Stat type="rna" subtype="ncRNA" count="79"/>
      <Stat type="rna" subtype="rRNA" count="22"/>
      <Stat type="rna" subtype="tRNA" count="86"/>
      <Stat source="all" type="Length" count="4641652"/>
      <Stat source="all" type="all" count="13500"/>
      <Stat source="all" type="cdregion" count="4302"/>
      <Stat source="all" type="gene" count="4609"/>
      <Stat source="all" type="prot" count="4302"/>
      <Stat source="all" type="rna" count="187"/>
    </Statistics>
    <AccessionVersion>U00096.3</AccessionVersion>
  </DocumentSummary>
  <DocumentSummary>
    <Id>342852136</Id>
    <Caption>CP002956</Caption>
    <Title>Yersinia pestis A1122, complete genome</Title>
    ...

which make extracting the single "best" value for gene count a non-trivial exercise.

In addition to repackaging commands that surround extracted values with XML tags, the ‑element "*" construct prints the entirety of the current scope, including its XML wrapper. Piping the document summaries to:

  xtract -set Set -rec Rec -pattern DocumentSummary \
-block DocumentSummary -pkg Common \
-wrp Accession -element AccessionVersion \
-wrp Organism -element Organism \
-wrp Length -element Slen \
-wrp Title -element Title \
-wrp Date -element CreateDate \
-wrp Biomol -element Biomol \
-wrp MolType -element MolType \
-block Stat -if @type -equals gene -pkg Gene -element "*" \
-block Stat -if @type -equals rna -pkg RNA -element "*" \
-block Stat -if @type -equals cdregion -pkg CDS -element "*" |

encloses several fields in a Common block, and packages statistics on gene, RNA, and coding region features into separate sections of a new XML object:

  ...
  <Rec>
    <Common>
      <Accession>U00096.3</Accession>
      <Organism>Escherichia coli str. K-12 substr. MG1655</Organism>
      <Length>4641652</Length>
      <Title>Escherichia coli str. K-12 substr. MG1655, complete genome</Title>
      <Date>1998/10/13</Date>
      <Biomol>genomic</Biomol>
      <MolType>dna</MolType>
    </Common>
    <Gene>
      <Stat type="gene" count="4609"/>
      <Stat type="gene" subtype="Gene" count="4464"/>
      <Stat type="gene" subtype="Gene/pseudo" count="145"/>
      <Stat source="all" type="gene" count="4609"/>
    </Gene>
    <RNA>
      <Stat type="rna" count="187"/>
      <Stat type="rna" subtype="ncRNA" count="79"/>
      <Stat type="rna" subtype="rRNA" count="22"/>
      <Stat type="rna" subtype="tRNA" count="86"/>
      <Stat source="all" type="rna" count="187"/>
    </RNA>
    <CDS>
      <Stat type="cdregion" count="4302"/>
      <Stat type="cdregion" subtype="CDS" count="4285"/>
      <Stat type="cdregion" subtype="CDS/pseudo" count="17"/>
      <Stat source="all" type="cdregion" count="4302"/>
    </CDS>
  </Rec>
...

With statistics from different types of feature now segregated in their own substructures, total counts for each can be extracted with the ‑first command:

  xtract -set Set -rec Rec -pattern Rec \
-block Common -element "*" \
-block Gene -wrp GeneCount -first Stat@count \
-block RNA -wrp RnaCount -first Stat@count \
-block CDS -wrp CDSCount -first Stat@count |

This rewraps the data into a third XML form containing specific feature counts:

  ...
  <Rec>
    <Common>
      <Accession>U00096.3</Accession>
      <Organism>Escherichia coli str. K-12 substr. MG1655</Organism>
      <Length>4641652</Length>
      <Title>Escherichia coli str. K-12 substr. MG1655, complete genome</Title>
      <Date>1998/10/13</Date>
      <Biomol>genomic</Biomol>
      <MolType>dna</MolType>
    </Common>
    <GeneCount>4609</GeneCount>
    <RnaCount>187</RnaCount>
    <CDSCount>4302</CDSCount>
  </Rec>
...

without requiring extraction commands for the individual elements in the Common block to be repeated at each step.

Assuming the contents are satisfactory, passing the last structured form to:

  xtract \
-head accession organism length gene_count rna_count \
-pattern Rec -def "-" \
-element Accession Organism Length GeneCount RnaCount

produces a tab-delimited table with the desired values:

  accession     organism                 length     gene_count    rna_count
  U00096.3      Escherichia coli ...     4641652    4609          187
  CP002956.1    Yersinia pestis A1122    4553770    4217          86

If a different order of fields is desired after the final xtract has been run, piping to:

  reorder-columns 1 3 5 4

will rearrange the output, including the column headings:

  accession     length     rna_count    gene_count
  U00096.3      4641652    187          4609
  CP002956.1    4553770    86           4217

Sequence Records

NCBI Data Model for Sequence Records

The NCBI data model for sequence records is based on the central dogma of molecular biology. Sequences, including genomic DNA, messenger RNAs, and protein products, are "instantiated" with the actual sequence letters, and are assigned identifiers (e.g., accession numbers) for reference.

Each sequence can have multiple features, which contain information about the biology of a given region, including the transformations involved in gene expression. Each feature can have multiple qualifiers, which store specific details about that feature (e.g., name of the gene, genetic code used for protein translation, accession of the product sequence, cross-references to external databases).

[Figure chapter6-Image001.jpg: the NCBI data model for sequence records]

A gene feature indicates the location of a heritable region of nucleic acid that confers a measurable phenotype. An mRNA feature on genomic DNA represents the exonic and untranslated regions of the message that remain after transcription and splicing. A coding region (CDS) feature has a product reference to the translated protein.

Since messenger RNA sequences are not always submitted with a genomic region, CDS features (which model the travel of ribosomes on transcript molecules) are traditionally annotated on the genomic sequence, with locations that encode the exonic intervals.

A qualifier can be dynamically generated from underlying data for the convenience of the user. Thus, the sequence of a mature peptide may be extracted from the mat_peptide feature's location on the precursor protein and displayed in a /peptide qualifier, even if a mature peptide is not instantiated.

Sequence Records in INSDSeq XML

Sequence records can be retrieved in an XML version of the GenBank or GenPept flatfile. The query:

  efetch -db protein -id 26418308,26418074 -format gpc

returns a set of INSDSeq objects:

  <INSDSet>
    <INSDSeq>
      <INSDSeq_locus>AAN78128</INSDSeq_locus>
      <INSDSeq_length>17</INSDSeq_length>
      <INSDSeq_moltype>AA</INSDSeq_moltype>
      <INSDSeq_topology>linear</INSDSeq_topology>
      <INSDSeq_division>INV</INSDSeq_division>
      <INSDSeq_update-date>03-JAN-2003</INSDSeq_update-date>
      <INSDSeq_create-date>10-DEC-2002</INSDSeq_create-date>
      <INSDSeq_definition>alpha-conotoxin ImI precursor, partial [Conus
      imperialis]</INSDSeq_definition>
      <INSDSeq_primary-accession>AAN78128</INSDSeq_primary-accession>
      <INSDSeq_accession-version>AAN78128.1</INSDSeq_accession-version>
      <INSDSeq_other-seqids>
        <INSDSeqid>gb|AAN78128.1|</INSDSeqid>
        <INSDSeqid>gi|26418308</INSDSeqid>
      </INSDSeq_other-seqids>
      <INSDSeq_source>Conus imperialis</INSDSeq_source>
      <INSDSeq_organism>Conus imperialis</INSDSeq_organism>
      <INSDSeq_taxonomy>Eukaryota; Metazoa; Lophotrochozoa; Mollusca;
      Gastropoda; Caenogastropoda; Hypsogastropoda; Neogastropoda;
      Conoidea; Conidae; Conus</INSDSeq_taxonomy>
      <INSDSeq_references>
        <INSDReference>
        ...

Biological features and qualifiers (shown here in GenPept format):

  FEATURES             Location/Qualifiers
       source          1..17
                       /organism="Conus imperialis"
                       /db_xref="taxon:35631"
                       /country="Philippines"
       Protein         <1..17
                       /product="alpha-conotoxin ImI precursor"
       mat_peptide     5..16
                       /product="alpha-conotoxin ImI"
                       /note="the C-terminal glycine of the precursor is post
                       translationally removed"
                       /calculated_mol_wt=1357
                       /peptide="GCCSDPRCAWRC"
       CDS             1..17
                       /coded_by="AY159318.1:<1..54"
                       /note="nAChR antagonist"

are presented in INSDSeq XML as structured objects:

  ...
  <INSDFeature>
    <INSDFeature_key>mat_peptide</INSDFeature_key>
    <INSDFeature_location>5..16</INSDFeature_location>
    <INSDFeature_intervals>
      <INSDInterval>
        <INSDInterval_from>5</INSDInterval_from>
        <INSDInterval_to>16</INSDInterval_to>
        <INSDInterval_accession>AAN78128.1</INSDInterval_accession>
      </INSDInterval>
    </INSDFeature_intervals>
    <INSDFeature_quals>
      <INSDQualifier>
        <INSDQualifier_name>product</INSDQualifier_name>
        <INSDQualifier_value>alpha-conotoxin ImI</INSDQualifier_value>
      </INSDQualifier>
      <INSDQualifier>
        <INSDQualifier_name>note</INSDQualifier_name>
        <INSDQualifier_value>the C-terminal glycine of the precursor is
          post translationally removed</INSDQualifier_value>
      </INSDQualifier>
      <INSDQualifier>
        <INSDQualifier_name>calculated_mol_wt</INSDQualifier_name>
        <INSDQualifier_value>1357</INSDQualifier_value>
      </INSDQualifier>
      <INSDQualifier>
        <INSDQualifier_name>peptide</INSDQualifier_name>
        <INSDQualifier_value>GCCSDPRCAWRC</INSDQualifier_value>
      </INSDQualifier>
    </INSDFeature_quals>
  </INSDFeature>
  ...

The data hierarchy is easily explored using a ‑pattern {sequence} ‑group {feature} ‑block {qualifier} construct. However, feature and qualifier names are indicated in data values, not XML element tags, and require ‑if and ‑equals to select the desired object and content.
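
For example, a hand-written query to report the product qualifier of each mature peptide feature (the same selection logic that the ‑insd shortcut, described next, generates automatically) might look like:

  efetch -db protein -id 26418308 -format gpc |
  xtract -pattern INSDSeq -element INSDSeq_accession-version \
    -group INSDFeature -if INSDFeature_key -equals mat_peptide \
      -block INSDQualifier -if INSDQualifier_name -equals product \
        -element INSDQualifier_value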

Generating Qualifier Extraction Commands

As a convenience for exploring sequence records, the xtract ‑insd helper function generates the appropriate nested extraction commands from feature and qualifier names on the command line. (Two computed qualifiers, sub_sequence and feat_location, are also supported.)

Running xtract ‑insd in an isolated command prints a new xtract statement that can then be copied, edited if necessary, and pasted into other queries. Running the ‑insd command within a multi-step pipe dynamically executes the automatically-constructed query.

Providing an optional (complete/partial) location indication, a feature key, and then one or more qualifier names:

  xtract -insd complete mat_peptide product peptide

creates a new xtract statement that will produce a table of qualifier values from mature peptide features with complete locations. The statement starts with instructions to record the accession and find features of the indicated type:

  xtract -pattern INSDSeq -ACCN INSDSeq_accession-version -SEQ INSDSeq_sequence \
-group INSDFeature -if INSDFeature_key -equals mat_peptide \
-branch INSDFeature -unless INSDFeature_partial5 -or INSDFeature_partial3 \
-clr -pfx "\n" -element "&ACCN" \

Each qualifier then generates custom extraction code that is appended to the growing query. For example:

  -block INSDQualifier \
-if INSDQualifier_name -equals product \
-element INSDQualifier_value

Snail Venom Peptide Sequences

Incorporating the xtract ‑insd command in a search on cone snail venom:

  esearch -db pubmed -query "conotoxin" |
elink -target protein |
efilter -query "mat_peptide [FKEY]" |
efetch -format gpc |
xtract -insd complete mat_peptide "%peptide" product mol_wt peptide |

prints the accession number, mature peptide length, product name, calculated molecular weight, and amino acid sequence for a sample of neurotoxic peptides:

  AAN78128.1    12    alpha-conotoxin ImI    1357    GCCSDPRCAWRC
  ADB65789.1    20    conotoxin Cal 16       2134    LEMQGCVCNANAKFCCGEGR
  ADB65788.1    20    conotoxin Cal 16       2134    LEMQGCVCNANAKFCCGEGR
  AGO59814.1    32    del13b conotoxin       3462    DCPTSCPTTCANGWECCKGYPCVRQHCSGCNH
  AAO33169.1    16    alpha-conotoxin GIC    1615    GCCSHPACAGNNQHIC
  AAN78279.1    21    conotoxin Vx-II        2252    WIDPSHYCCCGGGCTDDCVNC
  AAF23167.1    31    BeTX toxin             3433    CRAEGTYCENDSQCCLNECCWGGCGHPCRHP
  ABW16858.1    15    marmophin              1915    DWEYHAHPKPNSFWT
...

Piping the results to a series of Unix commands and EDirect scripts:

  grep -i conotoxin |
filter-columns '10 <= $2 && $2 <= 30' |
sort-table -u -k 5 |
sort-table -k 2,2n |
align-columns -

filters by product name, limits the results to a specified range of peptide lengths, removes redundant sequences, sorts the table by peptide length, and aligns the columns for cleaner printing:

  AAN78127.1    12    alpha-conotoxin ImII             1515    ACCSDRRCRWRC
  AAN78128.1    12    alpha-conotoxin ImI              1357    GCCSDPRCAWRC
  ADB43130.1    15    conotoxin Cal 1a                 1750    KCCKRHHGCHPCGRK
  ADB43131.1    15    conotoxin Cal 1b                 1708    LCCKRHHGCHPCGRT
  AAO33169.1    16    alpha-conotoxin GIC              1615    GCCSHPACAGNNQHIC
  ADB43128.1    16    conotoxin Cal 5.1                1829    DPAPCCQHPIETCCRR
  AAD31913.1    18    alpha A conotoxin Tx2            2010    PECCSHPACNVDHPEICR
  ADB43129.1    18    conotoxin Cal 5.2                2008    MIQRSQCCAVKKNCCHVG
  ADB65789.1    20    conotoxin Cal 16                 2134    LEMQGCVCNANAKFCCGEGR
  ADD97803.1    20    conotoxin Cal 1.2                2206    AGCCPTIMYKTGACRTNRCR
  AAD31912.1    21    alpha A conotoxin Tx1            2304    PECCSDPRCNSSHPELCGGRR
  AAN78279.1    21    conotoxin Vx-II                  2252    WIDPSHYCCCGGGCTDDCVNC
  ADB43125.1    22    conotoxin Cal 14.2               2157    GCPADCPNTCDSSNKCSPGFPG
  ADD97802.1    23    conotoxin Cal 6.4                2514    GCWLCLGPNACCRGSVCHDYCPR
  AAD31915.1    24    O-superfamily conotoxin TxO2     2565    CYDSGTSCNTGNQCCSGWCIFVCL
  AAD31916.1    24    O-superfamily conotoxin TxO3     2555    CYDGGTSCDSGIQCCSGWCIFVCF
  AAD31920.1    24    omega conotoxin SVIA mutant 1    2495    CRPSGSPCGVTSICCGRCYRGKCT
  AAD31921.1    24    omega conotoxin SVIA mutant 2    2419    CRPSGSPCGVTSICCGRCSRGKCT
  ABE27006.1    25    conotoxin p114a                  2917    FPRPRICNLACRAGIGHKYPFCHCR
  ABE27007.1    25    conotoxin p114.1                 2645    GPGSAICNMACRLGQGHMYPFCNCN
...

The xtract ‑insdx variant:

  esearch -db protein -query "conotoxin" |
efilter -query "mat_peptide [FKEY]" |
efetch -format gpc |
xtract -insdx complete mat_peptide "%peptide" product mol_wt peptide |
xtract -pattern Rec -select product -contains conotoxin |
xtract -pattern Rec -sort mol_wt

saves the output table directly as XML, with the XML tag names taken from the original qualifier names:

  ... 
<Rec>
<accession>AAO33169.1</accession>
<feature_key>mat_peptide</feature_key>
<peptide_Len>16</peptide_Len>
<product>alpha-conotoxin GIC</product>
<mol_wt>1615</mol_wt>
<peptide>GCCSHPACAGNNQHIC</peptide>
</Rec>
<Rec>
<accession>AIC77099.1</accession>
<feature_key>mat_peptide</feature_key>
<peptide_Len>16</peptide_Len>
<product>conotoxin Im1.2</product>
<mol_wt>1669</mol_wt>
<peptide>GCCSHPACNVNNPHIC</peptide>
</Rec>
...

Qualifier names with prefix shortcuts "#" and "%" are modified to use "_Num" and "_Len" suffixes, respectively.

Missing Qualifiers

For records where a particular qualifier is missing:

  esearch -db protein -query "RAG1 [GENE] AND Mus musculus [ORGN]" |
efetch -format gpc |
xtract -insd source organism strain |
sort-table -u -k 2,3

a dash is inserted as a placeholder:

  P15919.2       Mus musculus               -
  AAO61776.1     Mus musculus               129/Sv
  NP_033045.2    Mus musculus               C57BL/6
  EDL27655.1     Mus musculus               mixed
  BAD69530.1     Mus musculus castaneus     -
  BAD69531.1     Mus musculus domesticus    BALB/c
  BAD69532.1     Mus musculus molossinus    MOA

Sequence Coordinates

Gene Positions

An understanding of sequence coordinate conventions is necessary in order to use gene positions to retrieve the corresponding chromosome subregion with efetch or with the UCSC browser.

Sequence records displayed in GenBank or GenPept formats use a "one-based" coordinate system, with sequence position numbers starting at "1":

    1 catgccattc gttgagttgg aaacaaactt gccggctagc cgcatacccg cggggctgga
   61 gaaccggctg tgtgcggcca cagccaccat cctggacaaa cccgaagacg tgagtgaggg
  121 tcggcgagaa cttgtgggct agggtcggac ctcccaatga cccgttccca tccccaggga
  181 ccccactccc ctggtaacct ctgaccttcc gtgtcctatc ctcccttcct agatcccttc
...

Under this convention, positions refer to the sequence letters themselves:

  C   A   T   G   C   C   A   T   T   C
  1   2   3   4   5   6   7   8   9   10

and the position of the last base or residue is equal to the length of the sequence. The ATG initiation codon above is at positions 2 through 4, inclusive.

For computer programs, however, using "zero-based" coordinates can simplify the arithmetic used for calculations on sequence positions. The ATG codon in the 0-based representation is at positions 1 through 3. (The UCSC browser uses a hybrid, half-open representation, where the start position is 0-based and the stop position is 1-based.)

Software at NCBI will typically convert positions to 0-based coordinates upon input, perform whatever calculations are desired, and then convert the results to a 1-based representation for display. These transformations are done by simply subtracting 1 from the 1-based value or adding 1 to the 0-based value.

Coordinate Conversions

Retrieving the docsum for a particular gene:

  esearch -db gene -query "BRCA2 [GENE] AND human [ORGN]" |
efetch -format docsum |

returns the chromosomal position of that gene in "zero-based" coordinates:

  ...
  <GenomicInfoType>
    <ChrLoc>13</ChrLoc>
    <ChrAccVer>NC_000013.11</ChrAccVer>
    <ChrStart>32315479</ChrStart>
    <ChrStop>32399671</ChrStop>
    <ExonCount>27</ExonCount>
  </GenomicInfoType>
  ...

Piping the document summary to an xtract command using ‑element:

  xtract -pattern GenomicInfoType -element ChrAccVer ChrStart ChrStop

obtains the accession and 0-based coordinate values:

  NC_000013.11    32315479    32399671

Efetch has ‑seq_start and ‑seq_stop arguments to retrieve a gene segment, but these expect the sequence subrange to be in 1-based coordinates.
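
For example, after adding 1 to each of the 0-based values obtained above, the equivalent 1-based request would be:

  efetch -db nuccore -format gb -id NC_000013.11 \
    -seq_start 32315480 -seq_stop 32399672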

To address this problem, two additional efetch arguments, ‑chr_start and ‑chr_stop, were created to allow direct use of the 0-based coordinates:

  efetch -db nuccore -format gb -id NC_000013.11 \
-chr_start 32315479 -chr_stop 32399671

Xtract now has numeric extraction commands to assist with coordinate conversion. Selecting fields with an ‑inc argument:

  xtract -pattern GenomicInfoType -element ChrAccVer -inc ChrStart ChrStop

obtains the accession and 0-based coordinates, then increments the positions to produce 1-based values:

  NC_000013.11    32315480    32399672

EDirect knows the policies for sequence positions in all relevant Entrez databases (e.g., gene, snp, dbvar), and provides additional shortcuts for converting these to other conventions. For example:

  xtract -pattern GenomicInfoType -element ChrAccVer -1-based ChrStart ChrStop

understands that gene docsum ChrStart and ChrStop fields are 0-based, sees that the desired output is 1-based, and translates the command to convert coordinates internally using the ‑inc logic. Similarly:

  -element ChrAccVer -ucsc-based ChrStart ChrStop

leaves the 0-based start value unchanged but increments the original stop value to produce the half-open form that can be passed to the UCSC browser:

  NC_000013.11    32315479    32399672

Gene Records

Genes in a Region

To list all genes between two markers flanking the human X chromosome centromere, first retrieve the protein-coding gene records on that chromosome:

  esearch -db gene -query "Homo sapiens [ORGN] AND X [CHR]" |
efilter -status alive -type coding | efetch -format docsum |

Gene names and chromosomal positions are extracted by piping the records to:

  xtract -pattern DocumentSummary -NAME Name -DESC Description \
-block GenomicInfoType -if ChrLoc -equals X \
-min ChrStart,ChrStop -element "&NAME" "&DESC" |

Exploring each GenomicInfoType is needed because of pseudoautosomal regions at the ends of the X and Y chromosomes:

  ...
  <GenomicInfo>
    <GenomicInfoType>
      <ChrLoc>X</ChrLoc>
      <ChrAccVer>NC_000023.11</ChrAccVer>
      <ChrStart>155997630</ChrStart>
      <ChrStop>156013016</ChrStop>
      <ExonCount>14</ExonCount>
    </GenomicInfoType>
    <GenomicInfoType>
      <ChrLoc>Y</ChrLoc>
      <ChrAccVer>NC_000024.10</ChrAccVer>
      <ChrStart>57184150</ChrStart>
      <ChrStop>57199536</ChrStop>
      <ExonCount>14</ExonCount>
    </GenomicInfoType>
  </GenomicInfo>
  ...

Without limiting to chromosome X, the copy of IL9R near the "q" telomere of chromosome Y would be erroneously placed with genes that are near the X chromosome centromere, shown here in between SPIN2A and ZXDB:

  ...
  57121860    FAAH2     fatty acid amide hydrolase 2
  57133042    SPIN2A    spindlin family member 2A
  57184150    IL9R      interleukin 9 receptor
  57592010    ZXDB      zinc finger X-linked duplicated B
...

With genes restricted to the X chromosome, results can be sorted by position, and then filtered and partitioned:

  sort-table -k 1,1n | cut -f 2- |
grep -v pseudogene | grep -v uncharacterized | grep -v hypothetical |
between-two-genes AMER1 FAAH2

to produce an ordered table of known genes located between the two markers:

  FAAH2      fatty acid amide hydrolase 2
  SPIN2A     spindlin family member 2A
  ZXDB       zinc finger X-linked duplicated B
  NLRP2B     NLR family pyrin domain containing 2B
  ZXDA       zinc finger X-linked duplicated A
  SPIN4      spindlin family member 4
  ARHGEF9    Cdc42 guanine nucleotide exchange factor 9
  AMER1      APC membrane recruitment protein 1

Gene Sequence

Genes encoded on the minus strand of a sequence:

  esearch -db gene -query "DDT [GENE] AND mouse [ORGN]" |
efetch -format docsum |
xtract -pattern GenomicInfoType -element ChrAccVer ChrStart ChrStop |

have coordinates ("zero-based" in docsums) where the start position is greater than the stop:

  NC_000076.6    75773373    75771232

These values can be read into Unix variables by a "while" loop:

  while IFS=$'\t' read acn str stp
  do
    efetch -db nuccore -format gb \
      -id "$acn" -chr_start "$str" -chr_stop "$stp"
  done

The variables can then be used to obtain the reverse-complemented subregion in GenBank format:

  LOCUS       NC_000076               2142 bp    DNA     linear   CON 08-AUG-2019
  DEFINITION  Mus musculus strain C57BL/6J chromosome 10, GRCm38.p6 C57BL/6J.
  ACCESSION   NC_000076 REGION: complement(75771233..75773374)
  ...
       gene            1..2142
                       /gene="Ddt"
       mRNA            join(1..159,462..637,1869..2142)
                       /gene="Ddt"
                       /product="D-dopachrome tautomerase"
                       /transcript_id="NM_010027.1"
       CDS             join(52..159,462..637,1869..1941)
                       /gene="Ddt"
                       /codon_start=1
                       /product="D-dopachrome decarboxylase"
                       /protein_id="NP_034157.1"
                       /translation="MPFVELETNLPASRIPAGLENRLCAATATILDKPEDRVSVTIRP
                       GMTLLMNKSTEPCAHLLVSSIGVVGTAEQNRTHSASFFKFLTEELSLDQDRIVIRFFP
  ...

The reverse complement of a plus-strand sequence range can be selected with the efetch ‑revcomp argument.
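
As a minimal sketch (using the 1-based coordinates shown in the ACCESSION line above):

  efetch -db nuccore -format gb -id NC_000076.6 \
    -seq_start 75771233 -seq_stop 75773374 -revcomp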

External Data

Querying External Services

The nquire program obtains data from RESTful, CGI, or FTP servers, with queries built up from command-line arguments. Paths can be separated into components, which are combined with slashes. Remaining arguments (starting with a dash) are tag/value pairs, with multiple values between tags combined with commas.

For example, a POST request:

  nquire -url http://w1.weather.gov/xml/current_obs/KSFO.xml |
xtract -pattern current_observation -tab "\n" \
-element weather temp_f wind_dir wind_mph

returns the current weather report at the San Francisco airport:

  A Few Clouds
54.0
Southeast
5.8

and a GET query:

  nquire -get http://collections.mnh.si.edu/services/resolver/resolver.php \
-voucher "Birds:321082" |
xtract -pattern Result -tab "\n" -element ScientificName StateProvince Country

returns information on a ruby-throated hummingbird specimen:

  Archilochus colubris
Maryland
United States

while an FTP request:

  nquire -ftp ftp.ncbi.nlm.nih.gov pub/gdp ideogram_9606_GCF_000001305.14_850_V1 |
grep acen | cut -f 1,2,6,7 | awk '/^X\t/'

returns data with the (estimated) sequence coordinates of the human X chromosome centromere (here showing where the p and q arms meet):

  X    p    58100001    61000000
  X    q    61000001    63800000

Nquire can also produce a list of files in an FTP server directory:

  nquire -lst ftp://nlmpubs.nlm.nih.gov online/mesh/MESH_FILES/xmlmesh

or a list of FTP file names preceded by a column with the file sizes:

  nquire -dir ftp.ncbi.nlm.nih.gov gene/DATA

Finally, nquire can download FTP files to the local disk:

  nquire -dwn ftp.nlm.nih.gov online/mesh/MESH_FILES/xmlmesh desc2021.zip

If Aspera Connect is installed, the nquire ‑asp command will provide faster retrieval from NCBI servers:

  nquire -asp ftp.ncbi.nlm.nih.gov pubmed baseline pubmed22n0001.xml.gz

Without Aspera Connect, nquire ‑asp defaults to using the ‑dwn logic.

XML Namespaces

Namespace prefixes are followed by a colon, while a leading colon matches any prefix:

  nquire -url http://webservice.wikipathways.org getPathway -pwId WP455 |
xtract -pattern "ns1:getPathwayResponse" -decode ":gpml" |

The embedded Graphical Pathway Markup Language object can then be processed:

  xtract -pattern Pathway -block Xref \
-if @Database -equals "Entrez Gene" \
-tab "\n" -element @ID

Automatic Xtract Format Conversion

Xtract can now detect and convert input data in JSON, text ASN.1, and GenBank/GenPept flatfile formats. The transmute commands or shortcut scripts, described below, are only needed if you want to inspect the intermediate XML, or to override default conversion settings.

JSON Arrays

Consolidated gene information for human β-globin retrieved from a curated biological database service developed at the Scripps Research Institute:

  nquire -get http://mygene.info/v3 gene 3043 |

contains a multi-dimensional array of exon coordinates in JavaScript Object Notation (JSON) format:

  "position": [
[
5225463,
5225726
],
[
5226576,
5226799
],
[
5226929,
5227071
]
],
"strand": -1,

This can be converted to XML with transmute ‑j2x (or the json2xml shortcut script):

  transmute -j2x |

with the default "‑nest element" argument assigning distinct tag names to each level:

  <position>
    <position_E>5225463</position_E>
    <position_E>5225726</position_E>
  </position>
...

JSON Mixtures

A query for the human green-sensitive opsin gene:

  nquire -get http://mygene.info/v3/gene/2652 |
transmute -j2x |

returns data containing a heterogeneous mixture of objects in the pathway section:

  <pathway>
    <reactome>
      <id>R-HSA-162582</id>
      <name>Signal Transduction</name>
    </reactome>
    ...
    <wikipathways>
      <id>WP455</id>
      <name>GPCRs, Class A Rhodopsin-like</name>
    </wikipathways>
  </pathway>

The parent / star construct is used to visit the individual components of a parent object without needing to explicitly specify their names. For printing, the name of a child object is indicated by a question mark:

  xtract -pattern opt -group "pathway/*" \
-pfc "\n" -element "?,name,id"

This displays a table of pathway database references:

  reactome        Signal Transduction                R-HSA-162582
  reactome        Disease                            R-HSA-1643685
  ...
  reactome        Diseases of the neuronal system    R-HSA-9675143
  wikipathways    GPCRs, Class A Rhodopsin-like      WP455

Xtract ‑path can explore using multi-level object addresses, delimited by periods or slashes:

  xtract -pattern opt -path pathway.wikipathways.id -tab "\n" -element id

Conversion of ASN.1

Similarly to ‑j2x, transmute ‑a2x (or asn2xml) will convert Abstract Syntax Notation 1 (ASN.1) text files to XML.
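
For example, a minimal sketch (assuming a protein record retrieved as text ASN.1 with efetch ‑format asn):

  efetch -db protein -id 26418308 -format asn |
  transmute -a2x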

Tables to XML

Tab-delimited files are easily converted to XML with transmute ‑t2x (or tbl2xml):

  nquire -ftp ftp.ncbi.nlm.nih.gov gene/DATA gene_info.gz |
gunzip -c | grep -v NEWENTRY | cut -f 2,3 |
transmute -t2x -set Set -rec Rec -skip 1 Code Name

This takes a series of command-line arguments with tag names for wrapping the individual columns, and skips the first line of input, which contains header information, to generate a new XML file:

  ...
  <Rec>
    <Code>1246500</Code>
    <Name>repA1</Name>
  </Rec>
  <Rec>
    <Code>1246501</Code>
    <Name>repA2</Name>
  </Rec>
  ...

The transmute ‑t2x ‑header argument will obtain tag names from the first line of the file:

  nquire -ftp ftp.ncbi.nlm.nih.gov gene/DATA gene_info.gz |
gunzip -c | grep -v NEWENTRY | cut -f 2,3 |
transmute -t2x -set Set -rec Rec -header

CSV to XML

Similarly to ‑t2x, transmute ‑c2x (or csv2xml) will convert comma-separated values (CSV) files to XML.
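
For example, a minimal sketch (assuming a hypothetical records.csv file with a header line, and the same ‑set, ‑rec, and ‑header arguments used by ‑t2x):

  cat records.csv |
  transmute -c2x -set Set -rec Rec -header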

GenBank Download

The entire set of GenBank format release files can be downloaded with:

  fls=$( nquire -lst ftp.ncbi.nlm.nih.gov genbank )
  for div in \
    bct con env est gss htc htg inv mam pat \
    phg pln pri rod sts syn tsa una vrl vrt
  do
    echo "$fls" |
    grep ".seq.gz" | grep "gb${div}" |
    sort -V | skip-if-file-exists |
    nquire -asp ftp.ncbi.nlm.nih.gov genbank
  done

Unwanted divisions can be removed from the "for" loop to limit retrieval to specific sequencing classes or taxonomic regions.

GenBank to XML

The most recent GenBank virus release file can also be downloaded from NCBI servers:

  nquire -lst ftp.ncbi.nlm.nih.gov genbank |
grep "^gbvrl" | grep ".seq.gz" | sort -V |
tail -n 1 | skip-if-file-exists |
nquire -asp ftp.ncbi.nlm.nih.gov genbank

GenBank flatfile records can be selected by organism name or taxon identifier, or by presence or absence of an arbitrary text string, with transmute ‑gbf (or filter-genbank):

  gunzip -c *.seq.gz | filter-genbank -taxid 11292 |

Since xtract can now read JSON, ASN.1, and GenBank formats, the filtered flatfiles can be piped to xtract to obtain feature location intervals and underlying sequences of individual coding regions:

  xtract -insd CDS gene product feat_location sub_sequence

without the need for an explicit transmute ‑g2x (or gbf2xml) step.

GenPept to XML

The latest GenPept daily incremental update file can be downloaded:

  nquire -ftp ftp.ncbi.nlm.nih.gov genbank daily-nc Last.File |
sed "s/flat/gnp/g" |
nquire -ftp ftp.ncbi.nlm.nih.gov genbank daily-nc |
gunzip -c | transmute -g2x |

and the extracted INSDSeq XML can be processed in a similar manner:

  xtract -pattern INSDSeq -select INSDQualifier_value -equals "taxon:2697049" |
xtract -insd mat_peptide product sub_sequence

Local PubMed Cache

Fetching data from Entrez works well when a few thousand records are needed, but it does not scale for much larger sets of data, where the time it takes to download becomes a limiting factor.

Recent advances in technology provide an affordable and practical alternative. High-performance NVMe solid-state drives (which eliminate rotational delays for file access and bookkeeping operations) are readily available for purchase. Modern high-capacity file systems, such as APFS (which uses 64-bit inodes) or Ext4 (which can be configured for 100 million inodes), are now ubiquitous on contemporary computers. A judicious arrangement of multi-level nested directories (each containing no more than 100 subfolders or record files) ensures maximally-efficient use of these enhanced capabilities.

This combination of features allows local record storage (populated in advance from the PubMed FTP release files) to be an effective replacement for on-demand network retrieval, while avoiding the need to install and support a legacy database product on your computer.

Random Access Archive

EDirect can now preload over 35 million live PubMed records onto an inexpensive external 500 GB (gigabyte) solid-state drive as individual files for rapid retrieval. For example, PMID 12345678 would be stored at:

  /Archive/12/34/56/12345678.xml.gz

using a hierarchy of folders to organize the data for random access to any record.
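
As a rough illustration (assuming an eight-digit PMID; the actual path computation is performed internally), the directory components are simply the leading digits taken in pairs:

  pmid=12345678
  # split the leading digits into pairs to form the trie path: 12/34/56
  echo "$pmid" | sed 's|\(..\)\(..\)\(..\).*|\1/\2/\3|'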

The local archive is a completely self-contained turnkey system, with no need for the user to download, configure, and maintain complicated third-party database software.

Set an environment variable in your configuration file(s) to reference a section of your external drive:

  export EDIRECT_LOCAL_ARCHIVE=/Volumes/external_drive_name/

or set separate environment variables to keep the intermediate steps on the external SSD but leave the resulting archive in a designated area of the computer's internal storage:

  export EDIRECT_LOCAL_ARCHIVE=$HOME/internal_directory_name/
export EDIRECT_LOCAL_WORKING=/Volumes/external_drive_name/

In the latter case it will store around 180 GB on the internal drive for the local archive, or up to 250 GB if the local search index (see below) is also built.

Then run archive-pubmed to download the PubMed release files and distribute each record on the drive. This process will take several hours to complete, but subsequent updates are incremental, and should finish in minutes.

Retrieving over 125,000 compressed PubMed records from the local archive:

  esearch -db pubmed -query "PNAS [JOUR]" -pub abstract |
efetch -format uid | stream-pubmed | gunzip -c |

takes about 20 seconds. Retrieving those records from NCBI's network service, with efetch ‑format xml, would take around 40 minutes.

Even modest sets of PubMed query results can benefit from using the local cache. A reverse citation lookup on 191 papers:

  esearch -db pubmed -query "Cozzarelli NR [AUTH]" | elink -cited |

requires 13 seconds to match 9620 subsequent articles. Retrieving them from the local archive:

  efetch -format uid | fetch-pubmed |

takes less than one second. Printing the names of all authors in those records:

  xtract -pattern PubmedArticle -block Author \
-sep " " -tab "\n" -element LastName,Initials |

allows creation of a frequency table:

  sort-uniq-count-rank

that lists the authors who most often cited the original papers:

  145    Cozzarelli NR
  108    Maxwell A
   86    Wang JC
   81    Osheroff N
...

Fetching from the network service would extend the 14 second running time to over 2 minutes.

Local Search Index

A similar divide-and-conquer strategy is used to create a local information retrieval system suitable for large data mining queries. Run archive-pubmed ‑index to populate retrieval index files from records stored in the local archive. The initial indexing will also take a few hours. Since PubMed updates are released once per day, it may be convenient to schedule reindexing to start in the late evening and run during the night.

For PubMed titles and primary abstracts, the indexing process deletes hyphens after specific prefixes, removes accents and diacritical marks, splits words at punctuation characters, corrects encoding artifacts, and spells out Greek letters for easier searching on scientific terms. It then prepares inverted indices with term positions, and uses them to build distributed term lists and postings files.

For example, the term list that includes "cancer" in the title or abstract would be located at:

  /Postings/TIAB/c/a/n/c/canc.TIAB.trm

A query on cancer thus only needs to load a very small subset of the total index. The software supports expression evaluation, wildcard truncation, phrase queries, and proximity searches.

The phrase-search script (with an implied ‑db pubmed) provides access to the local search system.

Names of indexed fields, all terms for a given field, and terms plus record counts, are shown by:

  phrase-search -fields

phrase-search -terms TITL

phrase-search -totals PROP

Terms are truncated with trailing asterisks, and can be expanded to show individual postings counts:

  phrase-search -count "catabolite repress*"

phrase-search -counts "catabolite repress*"

Query evaluation includes Boolean operations and parenthetical expressions:

  phrase-search -query "(literacy AND numeracy) NOT (adolescent OR child)"

Adjacent words in the query are treated as a contiguous phrase:

  phrase-search -query "selective serotonin reuptake inhibitor"

Each plus sign will replace a single word inside a phrase, and runs of tildes indicate the maximum distance between sequential phrases:

  phrase-search -query "vitamin c + + common cold"

phrase-search -query "vitamin c ~ ~ common cold"

An exact substring match, without special processing of Boolean operators or indexed field names, can be obtained with -title (on the article title) or -exact (on the title or abstract), while ranked partial term matching in any field is available with -match:

  phrase-search -title "Genetic Control of Biochemical Reactions in Neurospora."

phrase-search -match "tn3 transposition immunity [PAIR]" | just-first-key

MeSH identifier code, MeSH hierarchy key, and year of publication are also indexed, and MESH field queries are supported by internally mapping to the appropriate CODE or TREE entries:

  phrase-search -query "C14.907.617.812* [TREE] AND 2015:2019 [YEAR]"

phrase-search -query "Raynaud Disease [MESH]"

The phrase-search ‑filter command allows PMIDs to be generated by an EDirect search and then incorporated as a component in a local query.
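
As a minimal sketch (the same ‑filter logic is demonstrated in the Integration with Entrez section below):

  esearch -db pubmed -query "Casadaban MJ [AUTH]" |
  efetch -format uid |
  phrase-search -filter "Transposases [MESH]"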

Data Analysis and Visualization

All query commands return a list of PMIDs, which can be piped directly to fetch-pubmed to retrieve the uncompressed records. For example:

  phrase-search -query "selective serotonin ~ ~ ~ reuptake inhibit*" |
fetch-pubmed |
xtract -pattern PubmedArticle -num AuthorList/Author |
sort-uniq-count -n |
reorder-columns 2 1 |
head -n 25 |
align-columns -g 4 -a lr

performs a proximity search with dynamic wildcard expansion (matching phrases like "selective serotonin and norepinephrine reuptake inhibitors") and fetches 12,966 PubMed records from the local archive. It then counts the number of authors for each paper (a consortium is treated as a single author), printing a frequency table of the number of papers per number of authors:

  0      51
  1    1382
  2    1897
  3    1906
...

The phrase-search and fetch-pubmed scripts are front-ends to the rchive program, which is used to build and search the inverted retrieval system. Rchive is multi-threaded for speed, retrieving records from the local archive in parallel, and fetching the positional indices for all terms in parallel before evaluating the title words as a contiguous phrase.

The cumulative size of PubMed can be calculated with a running sum of the annual record counts. Exponential growth over time will appear as a roughly linear curve on a semi-logarithmic graph:

  phrase-search -totals YEAR |
print-columns '$2, $1, total += $1' |
print-columns '$1, log($2)/log(10), log($3)/log(10)' |
filter-columns '$1 >= 1800' | sed '$d' |
xy-plot annual-and-cumulative.png

Natural Language Processing

NCBI's Biomedical Text Mining Group performs computational analysis to extract chemical, disease, and gene references from article contents. NLM indexing of PubMed records assigns Gene Reference into Function (GeneRIF) mappings.

Running archive-ncbinlp ‑index periodically (monthly) will automatically refresh any out-of-date support files and then index the connections in CHEM, DISZ, GENE, and several gene subfields (GRIF, GSYN, and PREF):

  phrase-search -terms DISZ | grep -i Raynaud

phrase-search -counts "Raynaud* [DISZ]"

phrase-search -query "Raynaud Disease [DISZ]"

Rapidly Scanning PubMed

If the expand-current script is run, an ad hoc scan can be performed on the nonredundant set of live PubMed records:

  cat $EDIRECT_LOCAL_WORKING/pubmed/Scratch/Current/*.xml |
xtract -timer -turbo -pattern PubmedArticle -PMID MedlineCitation/PMID \
-group AuthorList -if "#LastName" -eq 7 -element "&PMID" LastName

finding 1,700,652 articles with seven authors. (This query excludes consortia and additional named investigators. Author count is now indexed in the ANUM field.)
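
For example, the indexed author-count terms, with the number of records for each, can be listed with:

  phrase-search -totals ANUM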

Xtract uses the Boyer-Moore-Horspool algorithm to partition an XML stream into individual records, distributing them among multiple instances of the data exploration and extraction function for concurrent execution. A multi-core computer with a solid-state drive can process all of PubMed in under 4 minutes.

The expand-current script now calls xtract -index to place an XML size object immediately before each PubMed record:

  ...
</PubmedArticle>
<NEXT_RECORD_SIZE>6374</NEXT_RECORD_SIZE>
<PubmedArticle>
...

The xtract ‑turbo flag reads this precomputed information to approximately double the speed of record partitioning, which is the rate-limiting step when many CPU cores are available. With proper cooling, it should allow up to a dozen cores to contribute to batch data extraction throughput.

User-Specified Term Index

Running custom-index with a PubMed indexer script and the names of the fields it populates:

  custom-index $( which idx-grant ) GRNT

integrates user-specified indices into the local search system. The idx-grant script:

  xtract -set IdxDocumentSet -rec IdxDocument -pattern PubmedArticle \
-wrp IdxUid -element MedlineCitation/PMID -clr -rst -tab "" \
-group PubmedArticle -pkg IdxSearchFields \
-block PubmedArticle -wrp GRNT -element Grant/GrantID

has reusable boilerplate in its first three lines, and indexes PubMed records by Grant Identifier:

  ...
  <IdxDocument>
    <IdxUid>2539356</IdxUid>
    <IdxSearchFields>
      <GRNT>AI 00468</GRNT>
      <GRNT>GM 07197</GRNT>
      <GRNT>GM 29067</GRNT>
    </IdxSearchFields>
  </IdxDocument>
  ...

Once the final inversion:

  ...
  <InvDocument>
    <InvKey>ai 00468</InvKey>
    <InvIDs>
      <GRNT>2539356</GRNT>
    </InvIDs>
  </InvDocument>
  <InvDocument>
    <InvKey>gm 07197</InvKey>
    <InvIDs>
      <GRNT>2539356</GRNT>
    </InvIDs>
  </InvDocument>
  <InvDocument>
    <InvKey>gm 29067</InvKey>
    <InvIDs>
      <GRNT>2539356</GRNT>
    </InvIDs>
  </InvDocument>
  ...

and posting steps are completed, the new fields are ready to be searched.
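
For example, records indexed by grant identifier can then be retrieved with:

  phrase-search -query "GM 29067 [GRNT]"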

Processing by XML Subset

A query on articles with abstracts published in a chosen journal, retrieved from the local cache, and followed by a multi-step transformation:

  esearch -db pubmed -query "PNAS [JOUR]" -pub abstract |
efetch -format uid | fetch-pubmed |
xtract -stops -rec Rec -pattern PubmedArticle \
-wrp Year -year "PubDate/*" -wrp Abst -words Abstract/AbstractText |
xtract -rec Pub -pattern Rec \
-wrp Year -element Year -wrp Num -num Abst > countsByYear.xml

returns structured data with the year of publication and number of words in the abstract for each record:

  <Pub><Year>2018</Year><Num>198</Num></Pub>
<Pub><Year>2018</Year><Num>167</Num></Pub>
<Pub><Year>2018</Year><Num>242</Num></Pub>

(The ">" redirect saves the results to a file.)

The following "for" loop limits the processed query results to one year at a time with xtract ‑select, passing the relevant subset to a second xtract command:

  for yr in {1960..2021}
  do
    cat countsByYear.xml |
    xtract -set Raw -pattern Pub -select Year -eq "$yr" |
    xtract -pattern Raw -lbl "$yr" -avg Num
  done |

that applies ‑avg to the word counts in order to compute the average number of abstract words per article for the current year:

  1969    122
  1970    120
  1971    127
  ...
  2018    207
  2019    207
  2020    208

This result can be saved by redirecting to a file, or it can be piped to:

  tee /dev/tty |
xy-plot pnas.png

to print the data to the terminal and then display the results in graphical format. The last step should be:

  rm countsByYear.xml

to remove the intermediate file.

Identifier Conversion

The archive-pubmed script also downloads MeSH descriptor information from the NLM FTP server and generates a conversion file:

  ...
  <Rec>
    <Code>D064007</Code>
    <Name>Ataxia Telangiectasia Mutated Proteins</Name>
    ...
    <Tree>D12.776.157.687.125</Tree>
    <Tree>D12.776.660.720.125</Tree>
  </Rec>
  ...

that can be used for mapping MeSH codes to and from chemical or disease names. For example:

  cat $EDIRECT_LOCAL_ARCHIVE/pubmed/Data/meshconv.xml |
xtract -pattern Rec \
-if Name -starts-with "ataxia telangiectasia" \
-element Code

will return:

  C565779
C576887
D001260
D064007

More information on a MeSH term could be obtained by running:

  efetch -db mesh -id D064007 -format docsum

Integration with Entrez

Use phrase-search -filter to combine the UID results of a search (here followed by a link step) with a local query:

  phrase-search -query "Berg CM [AUTH]" |
phrase-search -link CITED |
phrase-search -filter "Transposases [MESH]"

Intermediate lists of PMIDs can be saved to a file and piped (with "cat") into a subsequent phrase-search ‑filter query. They can also be uploaded to the Entrez history server by piping to epost:

  epost -db pubmed

or piped directly to efetch.
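
For example, a minimal sketch (with a hypothetical saved_pmids.txt file):

  cat saved_pmids.txt |
  epost -db pubmed |
  efetch -format abstract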

Solid-State Drive Preparation

To initialize a solid-state drive for hosting the local archive on a Mac, log into an admin account, run Disk Utility, choose View -> Show All Devices, select the top-level external drive, and press the Erase icon. Set the Scheme popup to GUID Partition Map, and APFS will appear as a format choice. Set the Format popup to APFS, enter the desired name for the volume, and click the Erase button.

To finish the drive configuration, disable Spotlight indexing on the drive with:

  sudo mdutil -i off "${EDIRECT_LOCAL_ARCHIVE}"
sudo mdutil -E "${EDIRECT_LOCAL_ARCHIVE}"

and disable FSEvents logging with:

  sudo touch "${EDIRECT_LOCAL_ARCHIVE}/.fseventsd/no_log"

Also exclude the disk from being backed up by Time Machine or scanned by a virus checker.

Automation

Unix Shell Scripting

A shell script can be used to repeat the same sequence of operations on a number of input values. The Unix shell is a command interpreter that supports user-defined variables, conditional statements, and repetitive execution loops. Scripts are usually saved in a file, and referenced by file name.

Comments start with a pound sign ("#") and are ignored. Quotation marks within quoted strings are entered by "escaping" with a backslash ("\"). Subroutines (functions) can be used to collect common code or simplify the organization of the script.

Combining Data from Adjacent Lines

Given a tab-delimited file of feature keys and values, where each gene is followed by its coding regions:

  gene    matK
  CDS     maturase K
  gene    ATP2B1
  CDS     ATPase 1 isoform 2
  CDS     ATPase 1 isoform 7
  gene    ps2
  CDS     peptide synthetase

the cat command can pipe the file contents to a shell script that reads the data one line at a time:

  #!/bin/bash

  gene=""
  while IFS=$'\t' read feature product
  do
    if [ "$feature" = "gene" ]
    then
      gene="$product"
    else
      echo -e "$gene\t$product"
    fi
  done

The resulting output lines, printed by the echo command, have the gene name and subsequent CDS product names in separate columns on individual rows:

  matK      maturase K
  ATP2B1    ATPase 1 isoform 2
  ATP2B1    ATPase 1 isoform 7
  ps2       peptide synthetase

Dissecting the script, the first line selects the Bash shell on the user's machine:

  #!/bin/bash

The latest gene name is stored in the "gene" variable, which is first initialized to an empty string:

  gene=""

The while command sequentially reads each line of the input file, IFS indicates tab-delimited fields, and read saves the first field in the "feature" variable and the remaining text in the "product" variable:

  while IFS=$'\t' read feature product

The statements between the do and done commands are executed once for each input line. The if statement retrieves the current value stored in the feature variable (indicated by placing a dollar sign ($) in front of the variable name) and compares it to the word "gene":

  if [ "$feature" = "gene" ]

If the feature key was "gene", it runs the then section, which copies the contents of the current line's "product" value into the persistent "gene" variable:

  then
    gene="$product"

Otherwise the else section prints the saved gene name and the current coding region product name:

  else
    echo -e "$gene\t$product"

separated by a tab character (the echo ‑e option enables interpretation of the \t escape). The conditional block is terminated with a fi instruction ("if" in reverse):

  fi

In addition to else, the elif command can allow a series of mutually-exclusive conditional tests:

  if [ "$feature" = "gene" ]
then
...
elif [ "$feature" = "mRNA" ]
then
...
elif [ "$feature" = "CDS" ]
then
...
else
...
fi

A variable can be set to the result of commands that are enclosed between "$(" and ")" symbols:

  mrna=$( echo "$product" | grep 'transcript variant' |
    sed 's/^.*transcript \(variant .*\).*$/\1/' )

Entrez Direct Commands Within Scripts

EDirect commands can also be run inside scripts. Saving the following text:

  #!/bin/bash

  printf "Years"
  for disease in "$@"
  do
    frst=$( echo "${disease:0:1}" | tr '[a-z]' '[A-Z]' )
    printf "\t${frst}${disease:1:3}"
  done
  printf "\n"

  for (( yr = 2020; yr >= 1900; yr -= 10 ))
  do
    printf "${yr}s"
    for disease in "$@"
    do
      val=$(
        esearch -db pubmed -query "$disease [TITL]" |
        efilter -mindate "${yr}" -maxdate "$((yr+9))" |
        xtract -pattern ENTREZ_DIRECT -element Count
      )
      printf "\t${val}"
    done
    printf "\n"
  done

to a file named "scan_for_diseases.sh" and executing:

  chmod +x scan_for_diseases.sh

allows the script to be called by name. Passing several disease names in command-line arguments:

  scan_for_diseases.sh diphtheria pertussis tetanus |

returns the counts of papers on each disease, by decade, for over a century:

  Years    Diph    Pert    Teta
  2020s     104     281     154
  2010s     860    2558    1296
  2000s     892    1968    1345
  1990s    1150    2662    1617
  1980s     780    1747    1488
...

A graph of papers per decade for each disease is generated by piping the table to:

  xy-plot diseases.png

Passing the data instead to:

  align-columns -h 2 -g 4 -a ln

right-justifies numeric data columns for easier reading or for publication:

  Years    Diph    Pert    Teta
  2020s     104     281     154
  2010s     860    2558    1296
  2000s     892    1968    1345
  1990s    1150    2662    1617
  1980s     780    1747    1488
...

while piping to:

  transmute -t2x -set Set -rec Rec -header

produces a custom XML structure for further comparative analysis by xtract.

Time Delay

The shell script command:

  sleep 1

adds a one second delay between steps, and can be used to help prevent overuse of servers by advanced scripts.

Xargs/Sh Loop

Writing a script to loop through data can sometimes be avoided by creative use of the Unix xargs and sh commands. Within the "sh ‑c" command string, the last name and initials arguments (passed in pairs by "xargs ‑n 2") are substituted at the "$0" and "$1" variables. All of the commands in the sh string are run separately on each name:

  echo "Garber ED Casadaban MJ Mortimer RK" |
xargs -n 2 sh -c 'esearch -db pubmed -query "$0 $1 [AUTH]" |
xtract -pattern ENTREZ_DIRECT -lbl "$1 $0" -element Count'

This produces PubMed article counts for each author:

  ED Garber       35
  MJ Casadaban    46
  RK Mortimer     85

While Loop

A "while" loop can also be used to independently process lines of data. Given a file "organisms.txt" containing genus-species names, the Unix "cat" command:

  cat organisms.txt |

writes the contents of the file:

  Arabidopsis thaliana
Caenorhabditis elegans
Danio rerio
Drosophila melanogaster
Escherichia coli
Homo sapiens
Mus musculus
Saccharomyces cerevisiae

This can be piped to a loop that reads one line at a time:

  while read org
  do
    esearch -db taxonomy -query "$org [LNGE] AND family [RANK]" < /dev/null |
    efetch -format docsum |
    xtract -pattern DocumentSummary -lbl "$org" \
      -element ScientificName Division
  done

looking up the taxonomic family name and BLAST division for each organism:

  Arabidopsis thaliana        Brassicaceae          eudicots
  Caenorhabditis elegans      Rhabditidae           nematodes
  Danio rerio                 Cyprinidae            bony fishes
  Drosophila melanogaster     Drosophilidae         flies
  Escherichia coli            Enterobacteriaceae    enterobacteria
  Homo sapiens                Hominidae             primates
  Mus musculus                Muridae               rodents
  Saccharomyces cerevisiae    Saccharomycetaceae    ascomycetes

(The "< /dev/null" input redirection construct prevents esearch from "draining" the remaining lines from stdin.)

For Loop

The same results can be obtained with organism names embedded in a "for" loop:

  for org in \
    "Arabidopsis thaliana" \
    "Caenorhabditis elegans" \
    "Danio rerio" \
    "Drosophila melanogaster" \
    "Escherichia coli" \
    "Homo sapiens" \
    "Mus musculus" \
    "Saccharomyces cerevisiae"
  do
    esearch -db taxonomy -query "$org [LNGE] AND family [RANK]" |
    efetch -format docsum |
    xtract -pattern DocumentSummary -lbl "$org" \
      -element ScientificName Division
  done

File Exploration

A for loop can also be used to explore the computer's file system:

  for i in *
  do
    if [ -f "$i" ]
    then
      echo "$(basename "$i")"
    fi
  done

visiting each file within the current directory. The asterisk ("*") character indicates all files, and can be replaced by any pattern (e.g., "*.txt") to limit the file search. The if statement "‑f" operator can be changed to "‑d" to find directories instead of files, and "‑s" selects files with size greater than zero.
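
For instance, a variant matching the description above, which reports only non-empty ".txt" files:

  for i in *.txt
  do
    if [ -s "$i" ]
    then
      echo "$i"
    fi
  done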

Processing in Groups

EDirect supplies a join-into-groups-of script that combines lines of unique identifiers or sequence accession numbers into comma-separated groups:

  #!/bin/sh
  xargs -n "$@" echo |
  sed 's/ /,/g'

The following example demonstrates processing sequence records in groups of 200 accessions at a time:

  ...
efetch -format acc |
join-into-groups-of 200 |
xargs -n 1 sh -c 'epost -db nuccore -format acc -id "$0" |
elink -target pubmed |
efetch -format abstract'

Programming in Go

A program written in a compiled language is translated into a computer's native machine instruction code, and will run much faster than an interpreted script, at the cost of added complexity during development.

Google's Go language (also known as "golang") is "an open source programming language that makes it easy to build simple, reliable, and efficient software". Go eliminates the need for maintaining complicated "make" files. The build system assumes full responsibility for downloading external library packages. Automated dependency management tracks module release numbers to prevent version skew.

As of 2020, the Go development process has been streamlined to the point that it is now easier to use than some popular scripting languages.

To build Go programs, the latest Go compiler must be installed on your computer. A link to the installation URL is in the Documentation section at the end of this web page.

basecount.go Program

Piping FASTA data to the basecount binary executable (compiled from the basecount.go source code file shown below):

  efetch -db nuccore -id J01749,U54469 -format fasta | basecount

will return rows containing an accession number followed by counts for each base:

  J01749.1    A 983    C 1210    G 1134    T 1034
  U54469.1    A 849    C 699     G 585     T 748

The full (uncommented) source code for basecount.go is shown here, and is discussed below:

  package main

  import (
      "eutils"
      "fmt"
      "os"
      "sort"
  )

  func main() {

      fsta := eutils.FASTAConverter(os.Stdin, false)

      countLetters := func(id, seq string) {

          counts := make(map[rune]int)
          for _, base := range seq {
              counts[base]++
          }

          var keys []rune
          for ky := range counts {
              keys = append(keys, ky)
          }
          sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

          fmt.Fprintf(os.Stdout, "%s", id)
          for _, base := range keys {
              num := counts[base]
              fmt.Fprintf(os.Stdout, "\t%c %d", base, num)
          }
          fmt.Fprintf(os.Stdout, "\n")
      }

      for fsa := range fsta {
          countLetters(fsa.SeqID, fsa.Sequence)
      }
  }

Performance can be measured with the Unix "time" command:

  time basecount < NC_000014.fsa

The program reads and counts the 107,043,718 bases of human chromosome 14, from an existing FASTA file, in under 2.5 seconds:

  NC_000014.9    A 26673415    C 18423758    G 18559033    N 16475569    T 26911943
2.287

basecount.go Code Review

Go programs start with package main and then import additional software libraries (many included with Go, others residing in commercial repositories like github.com):

  package main

  import (
      "eutils"
      "fmt"
      "os"
      "sort"
  )

Each compiled Go binary has a single main function, which is where program execution begins:

  func main() {

The fsta variable is assigned to a data channel that streams individual FASTA records one at a time:

      fsta := eutils.FASTAConverter(os.Stdin, false)

The countLetters subroutine will be called with the identifier and sequence of each FASTA record:

      countLetters := func(id, seq string) {

An empty counts map is created for each sequence, and its memory is freed when the subroutine exits:

          counts := make(map[rune]int)

A for loop on the range of the sequence string visits each sequence letter. The map keeps a running count for each base or residue, with "++" incrementing the current value of the letter's map entry:

          for _, base := range seq {
              counts[base]++
          }

(String iteration by range returns position and letter pairs. Since the code does not use the position, its value is absorbed by an underscore ("_") character.)

Maps are not returned in a defined order, so map keys are loaded to a keys array, which is then sorted:

          var keys []rune
          for ky := range counts {
              keys = append(keys, ky)
          }
          sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

(The second argument passed to sort.Slice is an anonymous function literal used to control the sort order. It is also a closure, implicitly inheriting the keys array from the enclosing function.)

The sequence identifier is printed in the first column:

         fmt.Fprintf(os.Stdout, "%s", id)

Iterating over the array prints letters and base counts in alphabetical order, with tabs between columns:

          for _, base := range keys {
              num := counts[base]
              fmt.Fprintf(os.Stdout, "\t%c %d", base, num)
          }

A newline is printed at the end of the row, and then the subroutine exits, clearing the map and array:

          fmt.Fprintf(os.Stdout, "\n")
      }

The remainder of the main function uses a loop to drain the fsta channel, passing the identifier and sequence string of each successive FASTA record to the countLetters function. The main function then ends with a final closing brace:

      for fsa := range fsta {
          countLetters(fsa.SeqID, fsa.Sequence)
      }
  }

Note that the sequence of human chromosome 14, processed above, is stored in its entirety as a single contiguous Go string. No special coding considerations are needed for input, access, or memory management, even though it is over 107 million characters long.

Go Dependency Management

EDirect includes source code for the eutils helper library, which consolidates common functions used by xtract, transmute, and rchive, including the FASTA parser/streamer used by the basecount program shown above.

In addition to around two dozen eutils "*.go" files, the distribution contains "go.mod" and "go.sum" module files for the eutils package. They were created by running "./build.sh" in the eutils directory prior to release on the FTP site.

Modules provide a mechanism for automatically managing external dependencies. They record version numbers and checksums for the packages imported by eutils source files during development. Go will then retrieve the same versions of those packages, along with all of their internal support packages, if the eutils library is later incorporated into other software development projects.

Use of modules allows external Go packages to evolve independently, publishing newer versions with incompatible function argument signatures on their own schedules, while ensuring that this natural software development cycle does not break a working library or application build at some inopportune time in the future.

Compiling a Go Project

Each project typically resides in its own directory. The source code can be split into multiple files, and the build process will normally compile all of the "*.go" files together.

Create a new directory named "basecount" with:

  cd ~
mkdir basecount

and copy the basecount.go source code file into that directory.

The program can then be compiled by running:

  cd basecount
go mod init basecount
echo "replace eutils => $HOME/edirect/eutils" >> go.mod
go get eutils
go mod tidy
go build

but for convenience these commands are usually incorporated into a build script. To do this, save the following script to a file named build.sh in the same directory:

  #!/bin/bash

  if [ ! -f "go.mod" ]
  then
    go mod init "$( basename $PWD )"
    echo "replace eutils => $HOME/edirect/eutils" >> go.mod
    go get eutils
  fi

  if [ ! -f "go.sum" ]
  then
    go mod tidy
  fi

  go build

To compile the executable, enter the basecount directory, set the Unix execution permission bit, and run the build script:

  cd basecount
chmod +x build.sh
./build.sh

The build script runs "go mod init" to generate "go.mod", and "go mod tidy" to generate "go.sum", if either module file is not already present.

(The "$( basename $PWD )" construct sets the executable's default name to match the parent directory, without needing to manually customize the "go mod init" line for each project.)

(The "replace eutils => $HOME/edirect/eutils" construct computes the path for finding the local eutils source code directory in the standard EDirect installation location.)

The "go build" instruction compiles the source file(s) for the application and all dependent libraries (caching the compiled object files for faster use later). It will then link these into a binary executable file that can run on the development machine.

You can select specific input files, change the executable program's name, and cross-compile for a different platform, with additional arguments to "go build":

  env GOOS=darwin GOARCH=arm64 go build -o basecount.Silicon basecount.go

Separate projects in a single directory could be built by changing the "go build" line to:

  for fl in *.go
  do
    go build -o "${fl%.go}" "$fl"
  done

Python Integration

Controlling EDirect from Python scripts is easily done with assistance from the edirect.py library file, which is included in the EDirect archive:

  import subprocess
  import shlex

  def execute(cmmd, data=""):
      if isinstance(cmmd, str):
          cmmd = shlex.split(cmmd)
      res = subprocess.run(cmmd, input=data,
                           capture_output=True,
                           encoding='UTF-8')
      return res.stdout.strip()

  def pipeline(cmmds, data=""):
      def flatten(cmmd):
          if isinstance(cmmd, str):
              return cmmd
          else:
              return shlex.join(cmmd)
      if not isinstance(cmmds, str):
          cmmds = ' | '.join(map(flatten, cmmds))
      res = subprocess.run(cmmds, input=data, shell=True,
                           capture_output=True,
                           encoding='UTF-8')
      return res.stdout.strip()

  def efetch(*, db, id, format, mode=""):
      cmmd = ('efetch', '-db', db, '-id', str(id), '-format', format)
      if mode:
          cmmd = cmmd + ('-mode', mode)
      return execute(cmmd)

At the beginning of your program, import the edirect module with the following commands:

  #!/usr/bin/env python3

import sys
import os
import shutil

sys.path.insert(1, os.path.dirname(shutil.which('xtract')))
import edirect

(Note that the import command uses "edirect", without the ".py" extension.)

The first argument to edirect.execute is the Unix command you wish to run. It can be provided either as a string:

  edirect.execute("efetch -db nuccore -id NM_000518.5 -format fasta")

or as a sequence of strings:

  edirect.execute(('efetch', '-db', 'nuccore', '-id', 'NM_000518.5', '-format', 'fasta'))

An optional second argument accepts data to be passed to the Unix command through stdin. Multiple steps are chained together by using the result of the previous command as the data argument in the next command:

  seq = edirect.execute("efetch -db nuccore -id NM_000518.5 -format fasta")
sub = edirect.execute("transmute -extract -1-based -loc 51..494", seq)
prt = edirect.execute(('transmute', '-cds2prot', '-every', '-trim'), sub)

Alternatively, the edirect.pipeline function can accept a sequence of individual command strings to be piped together for execution:

  edirect.pipeline(('efetch -db protein -id NP_000509.1 -format gp',
'xtract -insd Protein mol_wt sub_sequence'))

or execute a string containing several piped commands:

  edirect.pipeline('''efetch -db nuccore -id J01749 -format fasta |
transmute -replace -offset 1907 -delete GG -insert TC |
transmute -search -circular GGATCC:BamHI GAATTC:EcoRI CTGCAG:PstI |
align-columns -g 4 -a rl''')

Data piped to the script itself is relayed by using "sys.stdin.read()" as the second argument.

Hiding details (e.g., isinstance, shlex.join, shlex.split, and subprocess.run) inside a common module means that biologists who are new to coding could control an entire analysis pipeline from their first Python program.

An edirect.efetch shortcut that uses named arguments is also available:

  edirect.efetch(db="nuccore", id="NM_000518.5", format="fasta")

To run a custom shell script, make sure the execute permission bit is set, supply the full execution path, and follow it with any command-line arguments:

  db = "pubmed"
res = edirect.execute(("./datefields.sh", db), "")

NCBI C++ Toolkit Access

EDirect scripts can be called from the NCBI C++ toolkit using the ncbi::edirect::Execute function:

  #include <misc/eutils_client/eutils_client.hpp>

The function signature has separate parameters for the script name and its command-line arguments, followed by an optional string to be passed via stdin:

  string Execute (
const string& cmmd,
const vector<string>& args,
const string& data = kEmptyStr
);

Multiple steps are chained together by using the previous result as the data argument in the next command:

  string seq = ncbi::edirect::Execute
( "efetch", { "-db", "nuccore", "-id", "NM_000518.5", "-format", "fasta" } );
string sub = ncbi::edirect::Execute
( "transmute", { "-extract", "-1-based", "-loc", "51..494" }, seq );
string prt = ncbi::edirect::Execute
( "transmute", { "-cds2prot", "-every", "-trim" }, sub );

The argument vector can also be generated dynamically, under program control:

  vector<string> args;

  args.push_back("-db");
  args.push_back("pubmed");
  args.push_back("-format");
  args.push_back("abstract");
  args.push_back("-id");
  args.push_back(uid);

  string txt = ncbi::edirect::Execute ( "efetch", args );

Citation matching can be performed on a CPub object reference with the -asn argument:

  string uid = ncbi::edirect::Execute
    ( "cit2pmid", { "-asn", FORMAT(MSerial_FlatAsnText << pub) } );

or you can use -title, -author, -journal, -volume, -issue, -pages, and -year arguments.

If a matching PMID is found, it can be retrieved as PubmedArticle XML and transformed into Pubmed-entry ASN.1:

  string xml = ncbi::edirect::Execute
    ( "efetch", { "-db", "pubmed", "-format", "xml" }, uid );
  string asn = ncbi::edirect::Execute
    ( "pma2pme", { "-std" }, xml );

The ASN.1 string can then be read into memory with an object loader for further processing:

  #include <objects/pubmed/Pubmed_entry.hpp>

  unique_ptr<CObjectIStream> stm;
  stm.reset ( CObjectIStream::CreateFromBuffer
    ( eSerial_AsnText, asn.data(), asn.length() ) );

  CRef<CPubmed_entry> pme ( new CPubmed_entry );
  stm->Read ( ObjectInfo ( *pme ) );

Additional Examples

EDirect examples demonstrate how to answer ad hoc questions in several Entrez databases. The detailed examples have been moved to a separate document, which can be viewed by clicking on the ADDITIONAL EXAMPLES link.

Appendices

Command-Line Arguments

Each EDirect program accepts a ‑help command-line argument that prints detailed information about its available arguments. These include ‑sort values for esearch, ‑format and ‑mode choices for efetch, and ‑cmd options for elink.
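
For example, to list the available arguments for efetch:

  efetch -help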

Einfo Data

Einfo field data contains status flags for several term list index properties:

  <Field>
    <Name>ALL</Name>
    <FullName>All Fields</FullName>
    <Description>All terms from all searchable fields</Description>
    <TermCount>280005319</TermCount>
    <IsDate>N</IsDate>
    <IsNumerical>N</IsNumerical>
    <SingleToken>N</SingleToken>
    <Hierarchy>N</Hierarchy>
    <IsHidden>N</IsHidden>
    <IsTruncatable>Y</IsTruncatable>
    <IsRangable>N</IsRangable>
  </Field>
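
These flags can also be examined programmatically. The sketch below (assuming the pubmed database) prints the name of every field that is marked as numeric:

  einfo -db pubmed |
  xtract -pattern Field -if IsNumerical -equals Y -element Name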

Unix Utilities

Several useful classes of Unix text processing filters, with selected arguments, are presented below:

Process by Contents:

  sort    Sorts lines of text

-f Ignore case
-n Numeric comparison
-r Reverse result order

-k Field key (start,stop or first)
-u Unique lines with identical keys

-b Ignore leading blanks
-s Stable sort
-t Specify field separator

uniq Removes adjacent repeated lines

-c Count occurrences
-i Ignore case

-f Ignore first n fields
-s Ignore first n characters

-d Only output repeated lines
-u Only output non-repeated lines

grep Matches patterns using regular expressions

-i Ignore case
-v Invert search
-w Search expression as a word
-x Search expression as whole line

-e Specify individual pattern

-c Only count number of matches
-n Print line numbers
-A Number of lines after match
-B Number of lines before match
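
These filters combine naturally in pipelines. In the sketch below, journals.txt is a hypothetical file with one journal name per line; the pipeline prints the ten most frequent entries with their counts:

  sort -f journals.txt |
  uniq -i -c |
  sort -rn |
  head -n 10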

Regular Expressions:

  Characters

. Any single character (except newline)
\w Alphabetic [A-Za-z], numeric [0-9], or underscore (_)
\s Whitespace (space or tab)
\ Escapes special characters
[] Matches any enclosed characters

Positions

^ Beginning of line
$ End of line
\b Word boundary

Repeat Matches

? 0 or 1
* 0 or more
+ 1 or more
{n} Exactly n

Escape Sequences

\n Line break
\t Tab character
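
Several of these constructs can be combined in a single pattern. The example below uses grep -E (extended regular expressions) to keep only lines that consist entirely of a versioned RefSeq mRNA accession, with accessions.txt as a hypothetical input file:

  grep -E '^NM_[0-9]+\.[0-9]+$' accessions.txt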

Modify Contents:

  sed         Replaces text strings

-e Specify individual expression
s/// Substitute
/g Global
/I Case-insensitive
/p Print

tr Translates characters

-d Delete character
-s Squeeze runs of characters

rev Reverses characters on line
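
A short example, with notes.txt as a hypothetical input file, substitutes one string globally with sed and then deletes stray carriage returns with tr:

  sed -e 's/colour/color/g' notes.txt | tr -d '\r'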

Format Contents:

  column  Aligns columns by content width

-s Specify field separator
-t Create table

expand Aligns columns to specified positions

-t Tab positions

fold Wraps lines at a specific width

-w Line width
-s Fold at spaces
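
For instance, to display a hypothetical comma-separated file as an aligned table, or to wrap long lines of text at spaces:

  column -s ',' -t hits.csv

  fold -w 70 -s abstract.txt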

Filter by Position:

  cut     Removes parts of lines

-c Characters to keep
-f Fields to keep
-d Specify field separator

-s Suppress lines with no delimiters

head Prints first lines

-n Number of lines

tail Prints last lines

-n Number of lines
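
For example, to print the first column of the first five data lines of a hypothetical tab-delimited file, skipping a one-line header:

  tail -n +2 table.txt | head -n 5 | cut -f 1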

Miscellaneous:

  wc        Counts words, lines, or characters

-c Characters
-l Lines
-w Words

xargs Constructs arguments

-n Number of words per batch

mktemp Makes a temporary file

join Joins columns in files by a common field

paste Merges columns in files by line number
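
For example, to count the UIDs in a hypothetical file, and to run wc on a set of XML files with no more than 20 file names per invocation:

  wc -l < uids.txt

  ls *.xml | xargs -n 20 wc -l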

File Compression:

  tar     Archives files

-c Create archive
-f Name of output file
-z Compress archive with gzip

gzip Compresses file

-k Keep original file
-9 Best compression

unzip Decompresses .zip archive

-p Pipe to stdout

gzcat Decompresses .gz archive and pipes to stdout
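
For example (results is a hypothetical folder, and large.xml a hypothetical file):

  tar -czf results.tar.gz results

  gzip -k -9 large.xml

  gzcat large.xml.gz | head -n 3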

Directory and File Navigation:

  cd      Changes directory

/ Root
~ Home
. Current
.. Parent
- Previous

ls Lists file names

-1 One entry per line
-a Show files beginning with dot (.)
-l List in long format
-R Recursively explore subdirectories
-S Sort files by size
-t Sort by most recently modified
.* Current and parent directory

pwd Prints working directory path

File Redirection:

  <         Read stdin from file
> Redirect stdout to file
>> Append to file
2> Redirect stderr
2>&1 Merge stderr into stdout
| Pipe between programs
<(cmd) Execute command, read results as file
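
For example, running a hypothetical script first with stdout and stderr separated, and then with both streams merged into a single file:

  ./analyze.sh > results.txt 2> errors.log

  ./analyze.sh > combined.log 2>&1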

Shell Script Variables:

  $0      Name of script
$n Nth argument
$# Number of arguments
"$*" Argument list as one argument
"$@" Argument list as separate arguments
$? Exit status of previous command
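
A minimal sketch using several of these variables inside a script:

  #!/bin/sh

  echo "$0 was called with $# arguments"
  for arg in "$@"
  do
    echo "  argument: $arg"
  done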

Shell Script Tests:

  -d      Directory exists
-f File exists
-s File is not empty
-n Length of string is non-zero
-x File is executable
-z Variable is empty or not set
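
Tests are typically combined with if statements. The sketch below validates a required input file, exiting if the file is missing or empty:

  if [ ! -f "$1" ] || [ ! -s "$1" ]
  then
    echo "Usage: $0 input-file" >&2
    exit 1
  fi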

Shell Script Options:

  set        Set optional behaviors

-e Exit immediately upon error
-u Treat unset variables as error
-x Trace commands and arguments
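
These options are normally placed near the top of a script, immediately after the interpreter line:

  #!/bin/sh

  set -e -u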

File and Directory Extraction:

          BAS=$(printf pubmed%03d $n)
          DIR=$(dirname "$0")
          FIL=$(basename "$0")

Remove Prefix:

          FILE="example.tar.gz"
          #    ${FILE#*.}      ->  tar.gz
          ##   ${FILE##*.}     ->  gz

Remove Suffix:

          FILE="example.tar.gz"
          TYPE="http://identifiers.org/uniprot_enzymes/"
          %    ${FILE%.*}      ->  example.tar
               ${TYPE%/}       ->  http://identifiers.org/uniprot_enzymes
          %%   ${FILE%%.*}     ->  example

Loop Constructs:

          while IFS=$'\t' read ...
          for sym in HBB BRCA2 CFTR RAG1
          for col in "$@"
          for yr in {1960..2020}
          for i in $(seq $first $incr $last)
          for fl in *.xml.gz
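
For example, the gene symbol loop above can drive a complete EDirect query. The sketch below (assuming human gene symbols) prints the name and description of each gene:

  for sym in HBB BRCA2 CFTR RAG1
  do
    esearch -db gene -query "$sym [GENE] AND human [ORGN]" |
    efetch -format docsum |
    xtract -pattern DocumentSummary -element Name Description
  done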

Additional documentation with detailed explanations and examples can be obtained by typing "man" followed by a command name.

Release Notes

EDirect release notes describe the history of incremental development and refactoring, from the original implementation in Perl to the redesign in Go and shell script. The detailed notes have been moved to a separate document, which can be viewed by clicking on the RELEASE NOTES link.

For More Information

Announcement Mailing List

NCBI posts general announcements regarding the E-utilities to the utilities-announce announcement mailing list. This mailing list is an announcement list only; individual subscribers may not send mail to the list. Also, the list of subscribers is private and is not shared or used in any other way except for providing announcements to list members. The list receives about one posting per month. Please subscribe at the above link.

References

The Smithsonian Online Collections Databases are provided by the National Museum of Natural History, Smithsonian Institution, 10th and Constitution Ave. N.W., Washington, DC 20560-0193. https://collections.nmnh.si.edu/.

den Dunnen JT, Dalgleish R, Maglott DR, Hart RK, Greenblatt MS, McGowan-Jordan J, Roux AF, Smith T, Antonarakis SE, Taschner PE. HGVS Recommendations for the Description of Sequence Variants: 2016 Update. Hum Mutat. 2016. https://doi.org/10.1002/humu.22981. (PMID 26931183.)

Holmes JB, Moyer E, Phan L, Maglott D, Kattman B. SPDI: data model for variants and applications at NCBI. Bioinformatics. 2020. https://doi.org/10.1093/bioinformatics/btz856. (PMID 31738401.)

Hutchins BI, Baker KL, Davis MT, Diwersy MA, Haque E, Harriman RM, Hoppe TA, Leicht SA, Meyer P, Santangelo GM. The NIH Open Citation Collection: A public access, broad coverage resource. PLoS Biol. 2019. https://doi.org/10.1371/journal.pbio.3000385. (PMID 31600197.)

Kim S, Thiessen PA, Cheng T, Yu B, Bolton EE. An update on PUG-REST: RESTful interface for programmatic access to PubChem. Nucleic Acids Res. 2018. https://doi.org/10.1093/nar/gky294. (PMID 29718389.)

Mitchell JA, Aronson AR, Mork JG, Folk LC, Humphrey SM, Ward JM. Gene indexing: characterization and analysis of NLM's GeneRIFs. AMIA Annu Symp Proc. 2003:460-4. (PMID 14728215.)

Ostell JM, Wheelan SJ, Kans JA. The NCBI data model. Methods Biochem Anal. 2001. https://doi.org/10.1002/0471223921.ch2. (PMID 11449725.)

Schuler GD, Epstein JA, Ohkawa H, Kans JA. Entrez: molecular biology database and retrieval system. Methods Enzymol. 1996. https://doi.org/10.1016/s0076-6879(96)66012-1. (PMID 8743683.)

Wei C-H, Allot A, Leaman R, Lu Z. PubTator central: automated concept annotation for biomedical full text articles. Nucleic Acids Res. 2019. https://doi.org/10.1093/nar/gkz389. (PMID 31114887.)

Wu C, Macleod I, Su AI. BioGPS and MyGene.info: organizing online, gene-centric information. Nucleic Acids Res. 2013. https://doi.org/10.1093/nar/gks1114. (PMID 23175613.)

Documentation

EDirect navigation functions call the URL-based Entrez Programming Utilities:

  https://www.ncbi.nlm.nih.gov/books/NBK25501

NCBI database resources are described by:

  https://www.ncbi.nlm.nih.gov/pubmed/37994677

Instructions for obtaining an API key are given in this NCBI blog post:

  https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities

An introduction to shell scripting for non-programmers is at:

  https://missing.csail.mit.edu/2020/shell-tools/

An article on the Go programming language, written by its creators, is at:

  https://cacm.acm.org/research/the-go-programming-language-and-environment/

and transcripts of talks on design philosophy and retrospective experience of Go are at:

  https://commandcenter.blogspot.com/2012/06/less-is-exponentially-more.html

https://commandcenter.blogspot.com/2024/01/what-we-got-right-what-we-got-wrong.html

Instructions for downloading and installing the Go compiler are at:

  https://golang.org/doc/install#download

Additional NCBI website and data usage policy and disclaimer information is located at:

  https://www.ncbi.nlm.nih.gov/home/about/policies/

Public Domain Notice

A copy of the NCBI Public Domain Notice, which applies to EDirect, is shown below:

  
PUBLIC DOMAIN NOTICE
National Center for Biotechnology Information

This software/database is a "United States Government Work" under the
terms of the United States Copyright Act. It was written as part of
the author's official duties as a United States Government employee and
thus cannot be copyrighted. This software/database is freely available
to the public for use. The National Library of Medicine and the U.S.
Government have not placed any restriction on its use or reproduction.

Although all reasonable efforts have been taken to ensure the accuracy
and reliability of the software and data, the NLM and the U.S.
Government do not and cannot warrant the performance or results that
may be obtained by using this software or data. The NLM and the U.S.
Government disclaim all warranties, express or implied, including
warranties of performance, merchantability or fitness for any particular
purpose.

Please cite the author in any work or product based on this material.

Getting Help

Please refer to the PubMed and Entrez help documents for more information about search queries, database indexing, field limitations and database content.

Suggestions, comments, and questions specifically relating to the EUtility programs may be sent to eutilities@ncbi.nlm.nih.gov.