CAP-miRSeq: A comprehensive analysis pipeline for deep microRNA sequencing
miRNAs play a key role in normal physiology and in various diseases such as cancer. Hybridization-based microarray technology has been used for miRNA profiling, but it is hindered by its narrow detection range, greater susceptibility to technical variation, and inability to characterize novel miRNAs and sequence variation. miRNA profiling through next-generation sequencing overcomes these limitations and provides a new avenue for biomarker discovery and clinical applications.
However, analyzing miRNA sequencing data is challenging: it requires significant computational resources and bioinformatics expertise. Several analytical tools have been developed over the past few years; however, most of these tools are web-based and can process only one sample or one pair of samples at a time, which is not suitable for a large-scale study with tens or even hundreds of samples. Limited flexibility and reliability of web services (outdated references, undisclosed parameters, server downtime, and slow performance) are also common issues. Although some tools provide differential miRNA analysis, they are either limited to a pair of samples or use a model that does not suit the study design. Moreover, miRNA SNVs and mutations are becoming increasingly important, but none of the tools provides SNV/mutation detection.
Herein, we present a comprehensive analysis pipeline for deep microRNA sequencing (CAP-miRSeq) that integrates read preprocessing, alignment, mature/precursor/novel miRNA quantification, variant detection in miRNA coding regions, and flexible differential expression analysis between experimental conditions. Depending on the available computational infrastructure, users can run samples sequentially or in parallel for fast processing. In either case, summary and expression reports for all samples are generated for easier quality assessment and downstream analyses. Using well-characterized data, we demonstrated the pipeline's superior performance, flexibility, and practical use in research and biomarker discovery.
The workflow can be configured to run on a single Linux machine as well as in a cluster environment to fully leverage multiple processors.
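The choice between sequential and parallel per-sample processing, followed by a combined report, can be illustrated with a minimal Python sketch. This is not CAP-miRSeq's actual interface: `process_sample`, `run_pipeline`, and the sample names are hypothetical placeholders, and the per-sample step is a stub standing in for preprocessing, alignment, and quantification.

```python
from concurrent.futures import ThreadPoolExecutor


def process_sample(sample_id):
    """Hypothetical stand-in for the per-sample steps (preprocessing,
    alignment, quantification). A real pipeline would launch external
    tools here; this stub only returns a small record."""
    return {"sample": sample_id, "status": "done"}


def run_pipeline(sample_ids, parallel=False, max_workers=4):
    """Process every sample, either one after another or concurrently,
    and return one combined list of per-sample records."""
    if parallel:
        # Threads suit work that mostly waits on external commands.
        # pool.map returns results in the same order as sample_ids.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(process_sample, sample_ids))
    return [process_sample(sample_id) for sample_id in sample_ids]


if __name__ == "__main__":
    samples = ["sample_A", "sample_B", "sample_C"]  # assumed names
    for record in run_pipeline(samples, parallel=True):
        print(record["sample"], record["status"])
```

Because both modes return records in input order, the combined summary is the same whichever mode is used; only the wall-clock time differs.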
- User Manual (.pdf)
- Quick Start Virtual Machine (2.2 GB; requires the free VirtualBox software and a computer with more than 4 GB of memory)
- Amazon EC2 AMI (medium instance recommended)
- Source, includes chromosome 1 reference files and sample data (.tar.gz)
Authors: Dr. Zhifu Sun, Jared Evans, Aditya Bhagwate, Sumit Middha, Matthew Bockol, Dr. Huihuang Yan, Dr. Jean-Pierre A. Kocher.
Citation: Sun Z, Evans J, Bhagwate A, Middha S, Bockol M, Yan H, and Kocher JP: CAP-miRSeq: a comprehensive analysis pipeline for microRNA sequencing data. BMC Genomics 2014 15:423.
Page last modified: February 20, 2015