Alternative cleavage and polyadenylation (APA), an RNA processing event, occurs in over 70% of human protein-coding genes. APA produces mRNA transcripts with distinct 3' ends, particularly within 3' UTRs, which harbor regulatory elements that can affect mRNA stability, translation, and localization. APA can be profiled using a number of established computational tools that infer polyadenylation sites from standard short-read RNA-seq datasets. Here, we benchmarked five such tools (TAPAS, QAPA, DaPars2, GETUTR, and APATrap) on their ability to identify polyadenylation sites and quantify polyadenylation site usage, comparing them against 3'-Seq, a specialized RNA-seq protocol that enriches for reads at the 3' ends of genes, and Iso-Seq, a PacBio single-molecule full-length RNA-seq method. We demonstrate that 3'-Seq and Iso-Seq identify polyadenylation sites and quantify their usage more reliably than computational tools that use short-read RNA-seq as input. However, we find that running one such tool, QAPA, with a set of polyadenylation site annotations derived from small quantities of 3'-Seq or Iso-Seq data can reliably quantify variation in APA across samples and genotypes, as demonstrated by the successful mapping of alternative polyadenylation quantitative trait loci (apaQTL). We anticipate that our analyses will clarify both the advantages of studying APA with more specialized sequencing protocols, such as 3'-Seq or Iso-Seq, and the limitations of studying APA with short-read RNA-seq. We also provide a computational pipeline to aid in the identification of APA events using Iso-Seq data.