Validation of skeletal muscle cis-regulatory module predictions reveals nucleotide composition bias in functional enhancers

PLoS Comput Biol. 2011 Dec;7(12):e1002256. doi: 10.1371/journal.pcbi.1002256. Epub 2011 Dec 1.

Abstract

We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for their capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs was validated as functional in the assay. This rate of predictive success (19 of 339, roughly 5.6%) reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences indicates that nucleotide sequence composition can be an important characteristic to incorporate in future methods to improve predictive specificity. Muscle-related transcription factor binding sites (TFBSs) predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions.
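The composition analysis mentioned above compares functional and inactive sequences by their nucleotide content. The Python sketch below shows one simple way to summarize composition, assuming it is reduced to per-base fractions and GC fraction; the study's actual features and statistics may differ. The helper names (base_composition, gc_fraction, mean_gc) and the toy sequence are hypothetical illustrations, not data or code from the paper. The only figures taken from the abstract are the 19 functional CRMs out of 339 tested.

```python
from collections import Counter


def base_composition(seq):
    """Fraction of A, C, G and T in a DNA sequence; ambiguous bases such as N are ignored."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {b: 0.0 for b in "ACGT"}
    return {b: counts[b] / total for b in "ACGT"}


def gc_fraction(seq):
    """GC fraction computed from the per-base composition."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


def mean_gc(seqs):
    """Average GC fraction over a set of sequences (e.g. all functional or all inactive CRMs)."""
    return sum(gc_fraction(s) for s in seqs) / len(seqs)


# Validation rate from the abstract: 19 functional CRMs out of 339 tested.
validation_rate = 19 / 339

if __name__ == "__main__":
    print(f"validation rate: {validation_rate:.1%}")
    # Hypothetical toy sequence, NOT from the study.
    toy = "ACGTGCGC"
    print(base_composition(toy))
    print(f"GC fraction: {gc_fraction(toy):.2f}")
```

To compare the two groups, mean_gc would be applied separately to the functional and the inactive sequence sets; a real analysis would also test whether any difference is statistically significant.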

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Composition
  • Chromatin Immunoprecipitation
  • Computational Biology / methods*
  • Computer Simulation
  • Conserved Sequence
  • Genome
  • Histones / genetics
  • Humans
  • Mice
  • Models, Genetic*
  • Models, Statistical
  • Muscle Fibers, Skeletal / physiology
  • Muscle, Skeletal / physiology*
  • MyoD Protein / genetics
  • NIH 3T3 Cells
  • Phylogeny
  • Regulatory Sequences, Nucleic Acid*
  • Reproducibility of Results
  • Sequence Analysis, DNA

Substances

  • Histones
  • MyoD Protein