Background: Prokaryotes have relatively small genomes that are densely packed and apparently dominated by protein-encoding sequences. However, data now generated by high-throughput RNA sequencing (RNA-seq) reveal unexpectedly complex transcriptomes with many previously unrecognized and unanticipated non-coding small and antisense transcripts. To date, such studies have investigated primarily Bacteria. Here, we report the transcripts present in Thermococcus kodakarensis, a model hyperthermophilic Archaeon, synthesized under different growth and metabolic conditions.
Results: cDNA libraries were generated and deep-sequenced from RNA preparations isolated from cells growing in media with sulfur or pyruvate, from cells grown with sulfur to stationary phase, and from cells growing with pyruvate to which sulfur was added 20 min before RNA isolation. The results identify >2,700 sites of transcription initiation and establish a genome-wide map of transcripts, together with consensus sequences for transcription initiation and post-transcription regulatory elements in T. kodakarensis. Primary transcription start sites (TSS) are identified upstream of 1,254 annotated genes, including ~78% of those predicted by promoter locations, and an additional 644 primary TSS and their promoters have been identified within genes. Most of the mRNAs have a 5'-untranslated region (5'-UTR) between 10 and 50 nt long (median length = 16 nt), ~20% have 5'-UTRs from 50 to 300 nt long, ~14% are leaderless with 5'-UTRs ≤8 nt, and ~50% contain a consensus ribosome binding sequence. The results also identify TSS for 1,018 antisense transcripts, most with sequences complementary to either the 5'- or 3'-region of a sense mRNA. The data confirm the presence of transcripts from all three CRISPR loci, the RNase P and 7S RNAs, all tRNAs and rRNAs, and 69 snoRNAs predicted to be encoded in the T. kodakarensis genome. Two transcripts, putatively identified as riboswitches, were present in RNA preparations isolated from growing cells but not from stationary phase cells. The procedure used is designed to identify TSS but, assuming that the number of cDNA reads correlates with transcript abundance, the data obtained also provide a semi-quantitative overview of global operon expression. They document substantial differences in gene expression under different physiological conditions and are consistent with previous observations of substrate-dependent specific gene expression.
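As an illustration of how the 5'-UTR length classes reported above could be assigned, the following is a minimal Python sketch. It is not the procedure used in this study. The function name `classify_utr`, the 1-based coordinate convention, and the bin labels are assumptions made for the example. The source defines leaderless as ≤8 nt and reports ranges of 10 to 50 nt and 50 to 300 nt; placing 9 nt in the short class and exactly 50 nt in the short class are also assumptions.

```python
def classify_utr(tss, start_codon, strand="+"):
    """Return (utr_length, category) for one mRNA.

    tss and start_codon are 1-based genome coordinates (assumed convention);
    start_codon is the position of the first nucleotide of the start codon.
    """
    # The 5'-UTR spans from the TSS up to, but not including, the start codon.
    utr_len = start_codon - tss if strand == "+" else tss - start_codon
    if utr_len < 0:
        raise ValueError("TSS lies downstream of the start codon")

    # Bin edges follow the ranges quoted in the text; labels are hypothetical.
    if utr_len <= 8:
        category = "leaderless"
    elif utr_len <= 50:
        category = "short"      # text reports 10-50 nt; 9 nt is assigned here by assumption
    elif utr_len <= 300:
        category = "long"
    else:
        category = "other"
    return utr_len, category


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from the T. kodakarensis genome.
    print(classify_utr(100, 116))        # plus-strand gene
    print(classify_utr(500, 495, "-"))   # minus-strand gene
```

Applied to every TSS and gene pair, the category counts would give percentages comparable in kind to those reported above, subject to the binning assumptions stated.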
Many previously unrecognized and unanticipated small RNAs have been identified, some with relatively low GC content (≤50%) and sequences that do not fold readily into base-paired secondary structures, contrary to classical expectations for non-coding RNAs in a hyperthermophile.
Conclusion: We have identified >2,700 TSS that include almost all of the primary sites of transcription initiation upstream of annotated genes, and also many secondary sites, sites within genes and sites resulting in antisense transcripts. The T. kodakarensis genome is small (~2.1 Mbp) and tightly packed with protein-encoding genes, but the results reveal the presence of many non-coding RNAs and predict extensive RNA-based regulation in T. kodakarensis.
Overall design: cDNA libraries were generated and sequenced from RNA isolated from T. kodakarensis cells growing exponentially (Sexp) and grown to stationary phase (Sstat) in ASW-YT medium with sulfur, from cells growing exponentially in ASW-YT with pyruvate (Pexp), and from cells growing exponentially with pyruvate that were harvested 20 min after sulfur addition (PS). The cDNAs were generated after first incubating the RNA preparations with terminator exonuclease (TEX). TEX does not degrade primary transcripts, which carry a 5'-triphosphate (Sharma et al., 2010), but does digest RNAs generated by transcript processing, which carry a 5'-monophosphate. As a control, and to fully document all transcripts, a cDNA library (C) was also generated and sequenced from an aliquot of an RNA preparation, isolated from cells growing exponentially with sulfur, that was not exposed to TEX digestion.
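Because TEX removes processed RNAs but spares primary transcripts, read 5' ends that mark a genuine TSS are expected to be enriched in a TEX-treated library relative to the untreated control (C). The Python sketch below shows one simple way such enrichment could be scored. It is an assumption-laden illustration, not the analysis pipeline used for this dataset: the function name `candidate_tss`, the input format (dictionaries mapping genome position to the number of reads whose 5' end maps there), and the thresholds `min_reads`, `min_ratio` and `pseudocount` are all hypothetical.

```python
def candidate_tss(tex_counts, control_counts,
                  min_reads=10, min_ratio=2.0, pseudocount=1.0):
    """Return [(position, enrichment_ratio)] for positions enriched after TEX.

    tex_counts / control_counts: dict of position -> count of read 5' ends.
    All thresholds are illustrative defaults, not values from the study.
    """
    tex_total = sum(tex_counts.values())
    control_total = sum(control_counts.values())
    if tex_total == 0 or control_total == 0:
        raise ValueError("both libraries must contain reads")

    # Rescale control counts to the TEX library size so that sequencing
    # depth differences do not masquerade as enrichment.
    scale = tex_total / control_total

    hits = []
    for pos in sorted(tex_counts):
        n_tex = tex_counts[pos]
        if n_tex < min_reads:
            continue  # too few reads to support a call
        n_control = control_counts.get(pos, 0) * scale
        # The pseudocount avoids division by zero at positions absent from the control.
        ratio = (n_tex + pseudocount) / (n_control + pseudocount)
        if ratio >= min_ratio:
            hits.append((pos, ratio))
    return hits


if __name__ == "__main__":
    # Toy counts: position 100 behaves like a primary 5' end,
    # position 250 like a processing site depleted by TEX.
    tex = {100: 90, 250: 10}
    control = {100: 30, 250: 70}
    for pos, ratio in candidate_tss(tex, control):
        print(pos, round(ratio, 2))
```

In this design the untreated control comes only from cells growing exponentially with sulfur, so a comparison of this kind applies most directly to the Sexp library. Using it for the other conditions would be a further assumption.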