UClncR Pipeline
Long non-coding RNAs (lncRNAs) are a large and relatively recently recognized class of gene transcripts with regulatory functions. They are involved in many biological and pathological processes and are potential diagnostic markers and therapeutic targets. Although thousands of lncRNAs have been characterized in recent years, many more are expected to be discovered because their expression is often tissue-specific or even disease-specific. Discovering novel lncRNAs and accurately quantifying known ones is therefore increasingly important for biomarker discovery and translational research. This is not a trivial task: lncRNAs must be sifted from massive amounts of RNA-seq data through many steps that demand intensive computation. Although tools exist for some of these tasks, no convenient public tool has performed all of them seamlessly, particularly for large studies with many RNA-seq samples.

To address this need, we developed UClncR, an Ultrafast and Comprehensive lncRNA detection pipeline that takes advantage of fast transcript assembly and parallel computing tools, applies multi-step filters for increased specificity, and provides comprehensive lncRNA characterization. The workflow takes a standard RNA-seq alignment file (from HISAT2, TopHat, or STAR), performs transcript assembly, predicts novel lncRNAs, quantifies and annotates both known and novel lncRNAs, and generates a convenient report for downstream analysis. The pipeline accommodates both unstranded and stranded RNA-seq data so that lncRNAs overlapping other genes can be predicted and quantified. UClncR is fully parallelized in a cluster environment, yet it also lets users run samples sequentially without a cluster. The pipeline can process a typical RNA-seq sample in a matter of minutes and complete hundreds of samples in several hours. Analysis of lncRNAs predicted from test data demonstrated UClncR's accuracy and the biological relevance of these lncRNAs.
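The choice between running samples sequentially and running them in parallel, as described above, can be illustrated with a minimal Python sketch. This is not UClncR's own code or command-line interface: `process_sample`, `run_samples`, and the stage names are hypothetical placeholders that only mirror the order of steps listed in the description, and a local thread pool stands in for a cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

# Stage names mirror the workflow description; they are illustrative labels,
# not real UClncR commands or options.
STAGES = [
    "assemble_transcripts",
    "predict_novel_lncRNAs",
    "quantify",
    "annotate",
    "report",
]


def process_sample(bam_path):
    """Hypothetical stand-in for the per-sample workflow.

    A real pipeline would invoke external tools here; this placeholder
    only records which stages would run, in order.
    """
    return {"sample": bam_path, "stages_run": list(STAGES)}


def run_samples(bam_paths, workers=1):
    """Process samples one by one, or concurrently when workers > 1."""
    if workers <= 1:
        # Sequential mode: no cluster or pool needed.
        return [process_sample(path) for path in bam_paths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(process_sample, bam_paths))


if __name__ == "__main__":
    samples = ["sample_A.bam", "sample_B.bam", "sample_C.bam"]
    for result in run_samples(samples, workers=2):
        print(result["sample"], "->", ", ".join(result["stages_run"]))
```

Because each sample is handled independently, the sequential and concurrent modes produce the same results; only the wall-clock time differs.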
- Quick Start Virtual Machine (11.6 GB, OVA image; requires the free VirtualBox software and a computer with more than 3 GB of memory)
- Source package with hg19 reference conservation score (UClncR.v1.1.1.tar.gz, UClncR.v1.0.1.tar.gz)
- Test sample (18 GB, ENCFF782IVX_std.bam)
- User Manual (v1.1.1, v1.0.1)
Questions: Please contact chen.xianfeng@mayo.edu
Page last modified: November 17, 2023