NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Madame Curie Bioscience Database [Internet]. Austin (TX): Landes Bioscience; 2000-2013.
Introduction
With over 800 genome sequences available and thousands more in the pipeline (see the Genomes OnLine Database for current numbers,1 www.genomeonline.org), the genetic information used by most biologists and biochemists is now derived mainly from genomic sequences that have been annotated in silico. Functional inferences based on comparative sequence analysis are the established foundation of genomic annotation. For well-studied gene families in which the initial annotation has been experimentally verified, these homology-based methods are quite accurate in predicting function.2 However, factors such as low sequence similarity,2 multi-domain proteins,3 gene duplications2,4 and non-orthologous displacements5 have all contributed to incorrect or absent annotations. This has been a major problem in the field of RNA modification enzymes because many are members of large paralogous families, and transferring functional annotations using BLAST scores alone can be very misleading, particularly between kingdoms. Cases where the closest homologs in two genomes do not catalyze the same reaction are numerous in the RNA modification field, with the added complication of having tRNA, rRNA and/or snRNA as potential substrates (see 6-10 for specific examples).
The complexity in the annotation of RNA modification genes is such that identifying the complete set of modification genes in a given genome requires thorough and extensive analysis, a process that to date has been limited to only a few organisms.11-14 Even these analyses are not complete, and many RNA modification genes are still missing in the most extensively studied models. Databases such as Modomics (see Appendix 1 by Rother et al.) provide a reference for compiling all the modification pathways and corresponding enzymes but are not designed to give easy access to the corresponding genetic information. In order to make the link between the modifications and the corresponding genes, we and other members of the SEED community have started to encode the RNA modification and processing genes in the SEED database.
Content of the SEED Database
The SEED database (http://theseed.uchicago.edu/FIG/index.cgi) has developed a series of tools to improve annotations based on the subsystem (SS) approach.15 A subsystem is a set of functional roles that together specify a biological process. It can be thought of as a generalization of the term pathway but is more flexible in that it can be completely defined by the database user. To build a SS, an expert in a given field, in this case RNA modification, first defines a list of functional roles relevant to the system, then links each functional role to a specific gene for an experimentally validated case; this populates the subsystem in a few genomes. To expand to more genomes, the expert annotator has to propagate the annotations to the putative orthologous sequences. This propagation can be done using BlastP scores if there are no ambiguities. However, in most cases additional types of information such as physical clustering, biological relevance, phylogenetic analysis, or motif searches are required before transferring an annotation to a homologous gene.15 The SEED database has also proven to be a powerful platform to identify many of the missing RNA modification genes by comparative genomics methods.16
Table 1 lists the SSs related to RNA modification that have been implemented in SEED to date. This effort is still a work in progress. Some SSs, like the “Queuosine and Archaeosine biosynthesis” subsystem, are complete with extensive notes and diagrams, and their functional roles have been propagated to all the available genomes (>500); this subsystem is used here to illustrate the format of a SS (Fig. 1). Others are less advanced, such as the “tRNA modification yeast cytoplasmic” SS, which has been populated with all the known genes in S. cerevisiae but has not yet been propagated to any other genomes. These subsystems are working documents that will have evolved by the time the reader accesses them, and they should be viewed as a platform that can be used to help correctly annotate the RNA modification genes in more sequenced genomes.
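The subsystem idea described above can be pictured as a small data structure. The Python below is only an illustrative sketch, not the SEED schema or API: the `Subsystem` and `Hit` classes, their fields, the gene identifiers, and the E-value cutoff are all hypothetical. It shows roles being assigned in a validated genome and then propagated to another genome only when the similarity evidence is unambiguous or is backed by clustering or motif evidence.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Hit:
    """A candidate homolog in a target genome (hypothetical record)."""
    gene_id: str
    blast_evalue: float
    clustered_with_pathway: bool = False  # physical clustering evidence
    motif_found: bool = False             # conserved motif evidence


@dataclass
class Subsystem:
    """A named set of functional roles plus per-genome gene assignments."""
    name: str
    roles: List[str]
    # assignments[genome][role] -> gene identifier
    assignments: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def assign(self, genome: str, role: str, gene_id: str) -> None:
        """Link a functional role to a gene in one genome."""
        if role not in self.roles:
            raise ValueError(f"{role!r} is not a functional role of {self.name!r}")
        self.assignments.setdefault(genome, {})[role] = gene_id

    def propagate(self, genome: str, role: str, hits: List[Hit],
                  max_evalue: float = 1e-20) -> Optional[str]:
        """Transfer a role to a target genome, or return None if ambiguous.

        The cutoff is an arbitrary illustrative value, not a recommendation.
        """
        strong = [h for h in hits if h.blast_evalue <= max_evalue]
        if len(strong) > 1:
            # Several paralogs score well: require additional evidence.
            strong = [h for h in strong
                      if h.clustered_with_pathway or h.motif_found]
        if len(strong) != 1:
            return None  # leave the decision to the expert annotator
        self.assign(genome, role, strong[0].gene_id)
        return strong[0].gene_id


# Hypothetical usage: populate one validated genome, then propagate.
ss = Subsystem("Queuosine and Archaeosine biosynthesis", ["TGT", "QueA", "QueF"])
ss.assign("validated_genome", "TGT", "gene_0001")
ss.propagate("target_genome", "TGT", [Hit("gene_0420", 1e-80)])
```

Returning `None` in the ambiguous case mirrors the point made in the text: a similarity score alone is not enough when several paralogs compete, and the annotation is then left for manual review.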
Case Study
The type of bioinformatic analysis that can be done once a SS is in place is illustrated below with the analysis of queuosine biosynthesis in Clostridia. Queuosine is a complex modification derived from GTP that is found in most bacteria. For a detailed description of the pathway, see the chapter by Iwata-Reuyl and de Crécy-Lagard in this volume. In short, the protein families GTP cyclohydrolase I and QueCDE are involved in the formation of the precursor preQ0, which is then reduced to preQ1 by QueF. preQ1 is introduced into the tRNA by TGT, then further modified by QueA (see Appendix I by Rother et al. for abbreviations). In the queuosine SS, all the known and predicted genes involved in the biosynthesis have been included (Fig. 1A). One can then analyze the distribution of these genes in all genomes or in a specific set, as shown in Figure 1B, where the focus is on the sequenced Clostridiaceae genomes. Several observations and predictions can be made from such an analysis. All Clostridiaceae have homologs of the tgt and queA genes and must therefore have Q in their tRNAs. However, it appears that many have lost, or are in the process of losing, the precursor pathway. Only one member of the Clostridiaceae, Clostridium beijerinckii, has the intact preQ1 precursor pathway, and these genes are all clustered on the chromosome (Fig. 1B). In others, such as Clostridium botulinum, the preQ0 biosynthetic genes are present but queF is not. This suggests that there is a missing queF gene (e.g., a non-orthologous gene displacement) and/or that preQ1 is transported, but we cannot discriminate between the two hypotheses at this stage.
Clearly, in most Clostridiaceae preQ1 must be salvaged, as all the precursor genes are missing and a putative preQ0/preQ1 transporter gene can be identified under the control of a preQ1 riboswitch.17,18 Interestingly, in Alkaliphilus metalliredigens, homologs of queCDE are absent but the queF gene homolog is present and is clustered with the putative transporter, suggesting that in this case preQ0 is salvaged. This type of analysis informs us about the pathway in a specific organism and allows one to make predictions that can be tested experimentally.
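The reasoning in this case study amounts to a decision rule on gene presence and absence. The following Python is a minimal sketch of that rule under stated assumptions: the function name `predict_q_route`, the returned labels, and the example profiles are hypothetical, the profiles are transcribed from the description above rather than taken from the database, GTP cyclohydrolase I and the transporter are left out to keep the sketch small, and a partial queCDE set is treated as absent.

```python
# Genes for preQ0 synthesis considered in this sketch (assumption:
# GTP cyclohydrolase I is omitted for brevity).
PREQ0_GENES = {"queC", "queD", "queE"}


def predict_q_route(genes):
    """Classify a genome's likely source of the queuosine precursor.

    `genes` is an iterable of gene names with detected homologs.
    """
    genes = set(genes)
    if not {"tgt", "queA"} <= genes:
        return "no Q predicted in tRNA"
    has_preq0_pathway = PREQ0_GENES <= genes  # partial sets count as absent
    has_quef = "queF" in genes
    if has_preq0_pathway and has_quef:
        return "de novo synthesis"
    if has_preq0_pathway:
        return "queF missing: gene displacement or preQ1 transport"
    if has_quef:
        return "preQ0 salvage"
    return "preQ1 salvage"


# Illustrative profiles following the text, not database exports.
profiles = {
    "Clostridium beijerinckii": {"tgt", "queA", "queC", "queD", "queE", "queF"},
    "Clostridium botulinum": {"tgt", "queA", "queC", "queD", "queE"},
    "Alkaliphilus metalliredigens": {"tgt", "queA", "queF"},
}

for organism, genes in profiles.items():
    print(f"{organism}: {predict_q_route(genes)}")
```

A rule of this kind only generates hypotheses. As noted above, the presence of a transporter gene and its riboswitch context is needed to support a salvage prediction, and the predictions remain to be tested experimentally.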
Conclusion
In summary, the RNA modification subsystems in the SEED database are both depositories of current knowledge in the field and a workbench that allows us to explore the diversity of the RNA modification machinery along the tree of life.
Acknowledgements
This work was supported by the National Science Foundation (MCB-05169448) and by the National Institutes of Health (R01 GM70641-01). VDC would like to thank Henri Grosjean for guiding her initial steps in the world of RNA modifications and Dirk Iwata-Reuyl for his input on the manuscript.
References
1. Liolios K, Mavromatis K, Tavernarakis N, et al. The genomes on line database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata. Nucl Acids Res. 2008;36:D475–479. [PMC free article: PMC2238992] [PubMed: 17981842]
2. Tian W, Skolnick J. How well is enzyme function conserved as a function of pairwise sequence identity? J Mol Biol. 2003;333:863–882. [PubMed: 14568541]
3. Hegyi H, Gerstein M. Annotation transfer for genomics: measuring functional divergence in multi-domain proteins. Genome Res. 2001;11:1632–1640. [PMC free article: PMC311165] [PubMed: 11591640]
4. Gerlt JA, Babbitt PC. Can sequence determine function? Genome Biol. 2000;1:reviews0005. [PMC free article: PMC138884] [PubMed: 11178260]
5. Galperin MY, Koonin EV. Sources of systematic error in functional annotation of genomes: domain rearrangement, non-orthologous gene displacement and operon disruption. In Silico Biol. 1998;1:55–67. [PubMed: 11471243]
6. Bujnicki JM, Feder M, Ayres CL, et al. Sequence-structure-function studies of tRNA:m5C methyltransferase Trm4p and its relationship to DNA:m5C and RNA:m5U methyltransferases. Nucl Acids Res. 2004;32:2453–2463. [PMC free article: PMC419452] [PubMed: 15121902]
7. Behm-Ansmant I, Urban A, Ma X, et al. The Saccharomyces cerevisiae U2 snRNA:pseudouridine-synthase Pus7p is a novel multisite-multisubstrate RNA:Psi-synthase also acting on tRNAs. RNA. 2003;9:1371–1382. [PMC free article: PMC1287059] [PubMed: 14561887]
8. Xing F, Hiley SL, Hughes TR, et al. The specificities of four yeast dihydrouridine-synthases for cytoplasmic tRNAs. J Biol Chem. 2004;279:17850–17860. [PubMed: 14970222]
9. Jeltsch A, Nellen W, Lyko F. Two substrates are better than one: dual specificities for Dnmt2 methyltransferases. Trends Biochem Sci. 2006;31:306. [PubMed: 16679017]
10. Massenet S, Motorin Y, Lafontaine DLJ, et al. Pseudouridine mapping in the Saccharomyces cerevisiae spliceosomal U small nuclear RNAs (snRNAs) reveals that pseudouridine synthase Pus1p exhibits a dual substrate specificity for U2 snRNA and tRNA. Mol Cell Biol. 1999;19:2142–2154. [PMC free article: PMC84007] [PubMed: 10022901]
11. de Crécy-Lagard V, Marck C, Brochier-Armanet C, et al. Comparative RNomics and modomics in mollicutes: prediction of gene function and evolutionary implications. IUBMB Life. 2007:1–25. [PubMed: 17852564]
12. Johansson MJ, Byström AS. Transfer RNA modifications and modifying enzymes in S. cerevisiae. In: Grosjean H, ed. Fine-tuning of RNA functions by modification and editing, topics in current genetics. Berlin-Heidelberg: Springer-Verlag. 2005;12:87–120.
13. Grosjean H, Marck C, Gaspin C, et al. RNomics and modomics in the halophilic archaea Haloferax volcanii: identification of RNA modification genes. BMC Genomics. 2008;9:470. [PMC free article: PMC2584109] [PubMed: 18844986]
14. Andersen NM, Douthwaite S. YebU is a m5C methyltransferase specific for 16 S rRNA nucleotide 1407. J Mol Biol. 2006;359:777–786. [PubMed: 16678201]
15. Overbeek R, Begley T, Butler RM, et al. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 2005;33:5691–5702. [PMC free article: PMC1251668] [PubMed: 16214803]
16. de Crécy-Lagard V. Identification of genes encoding tRNA modification enzymes by comparative genomics. Methods Enzymol. 2007;425:153–183. [PMC free article: PMC3034448] [PubMed: 17673083]
17. Roth A, Winkler WC, Regulski EE, et al. A riboswitch selective for the queuosine precursor preQ1 contains an unusually small aptamer domain. Nat Struct Mol Biol. 2007;14:308. [PubMed: 17384645]
18. Meyer MM, Roth A, Chervin SM, et al. Confirmation of a second natural preQ1 aptamer class in Streptococcaceae bacteria. RNA. 2008;14:685–695. [PMC free article: PMC2271366] [PubMed: 18305186]