Software Packages

Department of Quantitative Health Sciences
Mayo Clinic Research
Formerly known as the Department of Health Sciences Research

CAP-miRSeq: A comprehensive analysis pipeline for deep microRNA sequencing

miRNAs play a key role in normal physiology and in various diseases such as cancer. Hybridization-based microarray technology has been used for miRNA profiling, but it is hindered by its narrow detection range, greater susceptibility to technical variation, and inability to characterize novel miRNAs and sequence variation. miRNA profiling through next-generation sequencing overcomes these limitations and provides a new avenue for biomarker discovery and clinical applications.

However, analyzing miRNA sequencing data is challenging, as it requires significant computational resources and bioinformatics expertise. Several analytical tools have been developed over the past few years; however, most of these tools are web-based and can process only one sample or a pair of samples at a time, which is not suitable for a large-scale study with tens or even hundreds of samples. Lack of flexibility and reliability of the web services (such as outdated references, undisclosed parameters, server downtime, and slow performance) is also a common issue. Although some tools provide differential miRNA analysis, they are either limited to a pair of samples or use a model that does not suit the study design. Moreover, miRNA SNVs and mutations are becoming increasingly important, but none of the tools provides SNV/mutation detection.

Herein, we present a comprehensive analysis pipeline for deep microRNA sequencing (CAP-miRSeq) that integrates read preprocessing, alignment, mature/precursor/novel miRNA quantification, variant detection in miRNA coding regions, and flexible differential expression analysis between experimental conditions. Depending on the available computational infrastructure, users can run samples sequentially or in parallel for fast processing. In either case, summary and expression reports for all samples are generated for easier quality assessment and downstream analyses. Using well-characterized data, we demonstrated the pipeline’s superior performance, flexibility, and practical use in research and biomarker discovery.

The workflow can be configured to run on a single Linux machine as well as in a cluster environment to fully leverage multiple processors.
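
To illustrate the general idea of running samples sequentially or in parallel while still producing one combined report, here is a minimal Python sketch. It is not CAP-miRSeq's actual interface: the names process_sample and run_samples, the workers parameter, and the placeholder result dictionary are all assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def process_sample(sample_name):
    """Hypothetical stand-in for the per-sample steps
    (preprocessing, alignment, quantification)."""
    # A real pipeline would launch its tools here; this placeholder
    # only returns a small summary record for the sample.
    return {"sample": sample_name, "status": "done"}


def run_samples(sample_names, workers=1):
    """Process samples one by one (workers <= 1) or concurrently (workers > 1).

    Results come back in input order in both modes, so a combined
    report can be assembled the same way regardless of how it ran.
    """
    if workers <= 1:
        return [process_sample(name) for name in sample_names]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of sample_names in its results.
        return list(pool.map(process_sample, sample_names))
```

Calling run_samples(names) processes the samples in sequence, while run_samples(names, workers=4) processes up to four at once. Threads are used here on the assumption that each sample's work would mostly consist of waiting on external programs; on a cluster, the same pattern would typically be replaced by job submission to a scheduler.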

Authors: Dr. Zhifu Sun, Jared Evans, Aditya Bhagwate, Sumit Middha, Matthew Bockol, Dr. Huihuang Yan, Dr. Jean-Pierre A. Kocher.

Citation: Sun Z, Evans J, Bhagwate A, Middha S, Bockol M, Yan H, and Kocher JP: CAP-miRSeq: a comprehensive analysis pipeline for microRNA sequencing data. BMC Genomics 2014 15:423.

For assistance, please contact Jared Evans or Matthew Bockol.

Page last modified: August 10, 2018