Big Data Mining and Analytics


alternative splicing, intron retention, gene expression, RNA-Seq


Intron Retention (IR) is an alternative splicing mode in which introns are retained in mature RNAs rather than being spliced out, as they are in most cases. IR has gained increasing attention in recent years because of its recognized association with gene expression regulation and complex diseases, and continuous efforts have been dedicated to developing IR detection methods. These methods differ in the metrics they use to quantify retention propensity, their performance in detecting IR events, the functional enrichment of the IRs they detect, and their computational speed. A systematic experimental comparison would therefore be valuable for selecting and using existing methods. In this work, we conduct an experimental comparison of existing IR detection methods. Because no gold-standard dataset of intron retention is available, we first compare IR detection performance on simulated datasets. We then compare the IR detection results on real RNA-Seq data. We also describe the use of differential analysis methods to identify disease-associated IRs and compare the resulting differential IRs and their Gene Ontology enrichment, illustrated on an Alzheimer’s disease RNA-Seq dataset. Finally, we discuss the key principles and features of existing approaches and outline their differences. This systematic analysis provides helpful guidance for interrogating transcriptomic data from the perspective of IR.
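IR detection methods differ in the metric they use to quantify retention propensity. As a minimal sketch of the general idea, and not the exact metric of any method compared in this work, one style of measure weighs read evidence that an intron is retained against read evidence that it was spliced out. The names `IntronCounts`, `retention_ratio`, and `call_ir_events`, the choice of intronic depth and junction-read counts as inputs, and the thresholds `min_ratio` and `min_depth` are all hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class IntronCounts:
    """Hypothetical per-intron read summary for one sample (assumed inputs)."""
    intronic_depth: float  # e.g. a typical coverage level over the intron body
    spliced_reads: int     # reads spanning the junction that excises this intron


def retention_ratio(c: IntronCounts) -> float:
    """Estimated fraction of transcripts that retain the intron.

    Sketch only: retained signal / (retained signal + spliced signal).
    Returns 0.0 when there is no read evidence either way.
    """
    total = c.intronic_depth + c.spliced_reads
    if total == 0:
        return 0.0
    return c.intronic_depth / total


def call_ir_events(introns, min_ratio=0.1, min_depth=3.0):
    """Return names of introns passing hypothetical ratio and depth thresholds."""
    return [
        name
        for name, c in introns.items()
        if c.intronic_depth >= min_depth and retention_ratio(c) >= min_ratio
    ]


if __name__ == "__main__":
    # Toy counts for two made-up introns.
    sample = {
        "geneA_intron2": IntronCounts(intronic_depth=12.0, spliced_reads=28),
        "geneB_intron1": IntronCounts(intronic_depth=1.0, spliced_reads=50),
    }
    for name, counts in sample.items():
        print(name, round(retention_ratio(counts), 3))
    print(call_ir_events(sample))  # prints ['geneA_intron2']
```

In this toy example the first intron has a ratio of 0.3 and sufficient depth, so it is called as retained, while the second is filtered out by the depth threshold. Real methods differ in how they estimate intronic abundance, handle overlapping features, and set thresholds, which is one source of the differences in detection results discussed above.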