Taxonomy and Sequence Relatedness of Retroviruses

Retroviruses are related to other present-day viruses and transposable elements that use reverse transcription for their propagation (Chapter 8). All retroviruses are grouped under one taxonomic unit, the family Retroviridae (Murphy et al. 1995). The most recent conventions (Coffin 1992) (Chapter 1) distinguish seven major genera in this family: mammalian C-type viruses (prototype MLV), avian C-type viruses (the ASLV, prototype RSV), B-type viruses (prototype MMTV), D-type viruses (prototype M-PMV), lentiviruses (prototype HIV-1), viruses of the HTLV/BLV group (prototype HTLV-1), and spumaviruses (prototype HFV). At least two additional genera are needed to include recently characterized retroviruses of flies (Kim et al. 1994; Song et al. 1994) and fish (Holzschu et al. 1995). An unknown number of additional genera will be required to accommodate the many retroviruses that have not been characterized in molecular terms or that remain to be discovered. Viruses are assigned to these groups mainly by sequence relatedness of the reverse transcriptases (Doolittle et al. 1990; Xiong and Eickbush 1990), but other criteria, including morphology (shape of core), site of assembly of the core (plasma membrane or cytoplasm), and presence or absence of accessory genes, are used as well (Fig. 6).

Retroviruses in different genera show little sequence similarity at the nucleotide level, as measured experimentally by standard biochemical techniques, and only limited amino acid similarity except in reverse transcriptase. In the approximately 175 amino acid residues that constitute the most conserved part of reverse transcriptase, viruses in different genera show identity at about one third to two thirds of the residues. Generally, viruses within a genus show identity at more than two thirds of these residues. However, there are exceptions to these rules, reflecting the imperfections of the classification system, restricted sampling of existing retroviruses, and perhaps unknown evolutionary constraints in some genera. For example, there are some members of the lentiviral genus with less similarity in the reverse transcriptase sequence than suggested by these approximations.
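
The identity ranges described above can be illustrated with a minimal Python sketch. It assumes the two reverse transcriptase segments have already been aligned to equal length by some other tool, with `-` marking gaps. The function names, the gap handling, and the toy sequences are illustrative assumptions, not part of any published classification procedure, and the cutoffs are only the approximations given in the text, which has known exceptions.

```python
def fraction_identical(aligned_a: str, aligned_b: str) -> float:
    """Return the fraction of compared alignment columns with identical residues.

    Assumption: columns where both sequences have a gap are skipped, and a
    column with a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return identical / compared


def rough_relationship(identity: float) -> str:
    """Map an identity fraction onto the approximate ranges given in the text."""
    if identity > 2 / 3:
        return "consistent with the same genus"
    if identity >= 1 / 3:
        return "consistent with different genera"
    return "below the range described for retroviral genera"


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    identity = fraction_identical("ACDEFG", "ACDXXX")
    print(identity, rough_relationship(identity))
```

Because the text notes exceptions (for example, some lentiviruses), a result from such a calculation is a rough indication rather than a taxonomic assignment.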

In some cases, the name of a virus may erroneously suggest a close relationship to another virus. For example, because of their names, one might make the incorrect assumption that HIV-1 and HIV-2 are strains of the same species. In fact, in typical HIV-1 and HIV-2 isolates, the Gag protein may show identity at only one half of the residues, which is less similarity than that between HIV-2 and certain other primate lentiviruses. Since viruses can evolve with enormous rapidity under selective pressure, it is difficult to define by sequence precisely what a virus “species” is. For retroviral isolates to be considered as the same “species,” more than 80–90% of all nucleotide sequences should be identical or more than 90% of amino acid sequences in proteins encoded by genes other than pro and pol should be identical, and the viruses should infect the same host.
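
The species criterion in the preceding paragraph can be written out as a small decision rule. The sketch below is one possible reading of that sentence, in which the host requirement applies to both sequence alternatives. The function name and parameters are hypothetical, and because the text gives the nucleotide cutoff only as a range (80–90%), the cutoff is left as an adjustable parameter with an assumed default of 0.80.

```python
def may_be_same_species(
    nucleotide_identity: float,
    non_pro_pol_protein_identity: float,
    same_host: bool,
    nucleotide_cutoff: float = 0.80,  # assumption: lower end of the 80-90% range
    protein_cutoff: float = 0.90,
) -> bool:
    """Apply the approximate species criterion described in the text.

    Identities are fractions between 0 and 1. The protein identity refers to
    proteins encoded by genes other than pro and pol.
    """
    sequence_ok = (
        nucleotide_identity > nucleotide_cutoff
        or non_pro_pol_protein_identity > protein_cutoff
    )
    return sequence_ok and same_host


if __name__ == "__main__":
    # The Gag identity of about one half is taken from the text; the
    # nucleotide identity of 0.6 is a made-up placeholder for illustration.
    print(may_be_same_species(0.6, 0.5, same_host=True))
```

With Gag identity of about one half, isolates such as HIV-1 and HIV-2 fall well short of the protein cutoff under this rule, in line with the point that they are not strains of one species.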

Despite the lack of nucleotide similarities among viruses in different genera, there are a few conserved amino acid sequence motifs in retroviruses. Examples of such motifs encoded in the gag, pro, and pol genes of prototypic retroviral genomes are shown diagrammatically for Gag in Figure 7 and for Pro and Pol in Figure 8. Many or most Gag proteins contain the four-residue Pro-Pro-Pro-Tyr sequence, a sequence of 20 residues in CA called the major homology region (MHR), and a characteristic spacing of cysteine and histidine residues in the nucleocapsid (see below). These three elements represent the only obvious similarities in Gag proteins of different genera. The protease, encoded by pro, has an invariant stretch of several amino acids at its active site, and at least two other segments of conserved residues as well. In the integrase portion of pol, the spacing of three key acidic residues is a signature for this protein in several genera. Integrase also contains a characteristic His-Cys motif that is different from the motif in nucleocapsid. The reverse transcriptase portion of pol shows the most sequence conservation, as discussed above, and portions of this sequence are recognizably similar not only in retroviruses, but also in other genetic elements that use reverse transcription for their propagation, such as retrotransposons, group II introns, and bacterial retrons (Chapter 8). Among the most conserved reverse transcriptase sequences is the YMDD motif at the active site.
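
Two of the short motifs named above, Pro-Pro-Pro-Tyr in Gag and YMDD in reverse transcriptase, are simple enough to locate by direct string search. The following minimal sketch scans a protein sequence in one-letter code for both. The function name, the motif table, and the toy sequence are illustrative assumptions, and the longer or spacing-based signatures (the MHR and the Cys-His and His-Cys motifs) are deliberately left out because their exact patterns are not given in this section.

```python
import re

# Motifs taken from the text, in one-letter amino acid code.
MOTIFS = {
    "PPPY (Gag)": "PPPY",
    "YMDD (reverse transcriptase active site)": "YMDD",
}


def find_motifs(protein: str) -> dict:
    """Return 1-based start positions of each motif in the protein sequence."""
    seq = protein.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead reports every start position, including overlapping ones.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
    return hits


if __name__ == "__main__":
    # Toy sequence, invented for illustration only.
    for name, positions in find_motifs("MAAPPPYLLYMDDKK").items():
        print(name, positions)
```

An exact match of four residues can easily occur by chance in an unrelated protein, so a hit from such a scan is only a starting point for closer comparison.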

Despite the paucity of sequence similarities among retroviruses and related elements, structural analysis of the few proteins that have been studied, such as the proteases of ASLV and HIV (Wlodawer et al. 1989) and the integrases of ASLV, HIV, and bacteriophage Mu (Dyda et al. 1994; Rice and Mizuuchi 1995) (see Chapters 5 and 7), reveals striking similarity in their overall organization. It would be surprising if this structural similarity in the absence of sequence identity did not extend to all common proteins.