A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure

BMC Bioinformatics. 2002 Jul 2;3:18. doi: 10.1186/1471-2105-3-18.

Abstract

Background: Covariance models (CMs) are probabilistic models of RNA secondary structure, analogous to profile hidden Markov models of linear sequence. The dynamic programming algorithm for aligning a CM to an RNA sequence of length N is O(N^3) in memory. This is only practical for small RNAs.

Results: I describe a divide and conquer variant of the alignment algorithm that is analogous to memory-efficient Myers/Miller dynamic programming algorithms for linear sequence alignment. The new algorithm has an O(N^2 log N) memory complexity, at the expense of a small constant factor in time.
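
The linear-sequence analogy mentioned above can be illustrated with a minimal sketch. This is not the covariance model algorithm from the paper; it is the classic Hirschberg-style divide and conquer for global alignment of two plain sequences, which is the idea that Myers/Miller build on. The scoring values (match +1, mismatch -1, linear gap -1) and all function names are assumptions chosen for illustration. The sketch keeps only two rows of the score matrix at a time, finds the optimal split point at the midpoint of the first sequence, and recurses on the two halves.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-1):
    """Return the last row of the global alignment score matrix for a vs. b.

    Only two rows are held in memory, so space is O(len(b)).
    Entry j is the best score of aligning all of a to b[:j].
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev


def hirschberg(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment of a and b by divide and conquer."""
    if len(a) == 0:
        return "-" * len(b), b
    if len(b) == 0:
        return a, "-" * len(a)
    if len(a) == 1:
        # Base case: pair the single residue with the best column of b,
        # or leave it unpaired if gapping everything scores higher.
        best, best_j = (len(b) + 1) * gap, None
        for j, ch in enumerate(b):
            s = (len(b) - 1) * gap + (match if ch == a else mismatch)
            if s > best:
                best, best_j = s, j
        if best_j is None:
            return a + "-" * len(b), "-" + b
        return "-" * best_j + a + "-" * (len(b) - best_j - 1), b

    mid = len(a) // 2
    # Forward scores for the first half, backward scores for the second half.
    left = nw_last_row(a[:mid], b, match, mismatch, gap)
    right = nw_last_row(a[mid:][::-1], b[::-1], match, mismatch, gap)
    # Split b where the sum of the two half-scores is maximal.
    split = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    a1, b1 = hirschberg(a[:mid], b[:split], match, mismatch, gap)
    a2, b2 = hirschberg(a[mid:], b[split:], match, mismatch, gap)
    return a1 + a2, b1 + b2


def alignment_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score a gapped alignment column by column (helper for checking)."""
    total = 0
    for p, q in zip(x, y):
        if p == "-" or q == "-":
            total += gap
        else:
            total += match if p == q else mismatch
    return total
```

Calling `hirschberg("GATTACA", "GCATGCU")` returns two gapped strings of equal length whose score, as computed by `alignment_score`, equals the optimal score in the last cell of `nw_last_row`. The score rows are recomputed at each level of recursion rather than stored, which is the same memory-for-time trade the abstract describes: a modest constant-factor increase in time in exchange for a much smaller memory footprint.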

Conclusions: Optimal ribosomal RNA structural alignments that previously required up to 150 GB of memory now require less than 270 MB.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms*
  • Computational Biology / statistics & numerical data
  • Consensus Sequence
  • Empirical Research
  • Markov Chains
  • Models, Statistical
  • Nucleic Acid Conformation*
  • Programming, Linear / statistics & numerical data*
  • RNA / chemistry*
  • RNA, Ribosomal / chemistry
  • RNA, Small Cytoplasmic / chemistry
  • RNA, Transfer / chemistry
  • Sequence Alignment / methods*
  • Sequence Alignment / statistics & numerical data*
  • Signal Recognition Particle / chemistry
  • Software Design
  • Software*
  • Time Factors

Substances

  • 7SL RNA
  • RNA, Ribosomal
  • RNA, Small Cytoplasmic
  • Signal Recognition Particle
  • RNA
  • RNA, Transfer