Expression profiling by high throughput sequencing
Summary
The model organism Encyclopedia of DNA Elements project (modENCODE) has produced a comprehensive annotation of D. melanogaster transcript models based on an enormous amount of high-throughput experimental data. However, some transcribed elements may not be functional, and technical artifacts may lead to erroneous inference of transcription. Inter-species comparison lends confidence to predicted annotations, since transcriptional activity that has been evolutionarily conserved is likely to have an advantageous function. We performed RNA-Seq and CAGE-Seq experiments on more than 80 samples from multiple tissues and stages of 15 Drosophila species, including 8 with previously unsequenced genomes. We found strikingly conserved sequence, expression, and splicing for the vast majority of transcript models in the modENCODE annotation (e.g. 99% of coding sequence (CDS) exons, 88% of untranslated region (UTR) exons, and 87% of splicing events), indicating that the transcriptome annotation is of very high quality. We also describe dynamic transcriptome evolution within the Drosophila genus, including conserved promoter structure, labile positions of transcription start sites, and rapidly evolving RNA-editing events. We demonstrate how this phylogenetic approach to DNA element validation will prove useful in annotating other high-priority genomes, especially genomes that are less compact than that of Drosophila (e.g. the vast majority of vertebrate genomes).