MAP-RSeq: The Mayo Analysis Pipeline for RNA Seq: A comprehensive system for RNA-Sequencing data analysis
RNA-Sequencing (RNA-Seq) technology is information-rich; the breadth of information gained spans from large structural changes to single nucleotide variants (SNVs). By efficiently analyzing RNA-Seq data, we can query and obtain a variety of genomic features, such as gene expression, novel and fusion transcripts, alternative splice sites, long non-coding and circular RNAs, SNVs, etc. Most RNA-Seq bioinformatics tools output one or two genomic features for downstream analysis, but so far there have been no comprehensive workflows that can be used to obtain a number of features from RNA-Seq data. To address this shortfall, Mayo Clinic has developed MAP-RSeq – a computational workflow that leverages data from an RNA-Seq experiment to provide comprehensive reports on genomic features for secondary data analysis.
The MAP-RSeq workflow integrates a suite of open source bioinformatics tools with methods developed in-house to analyze paired-end RNA-Seq data. Read alignment is performed with TopHat, which uses Bowtie, a fast, memory-efficient short-sequence aligner. TopHat aligns reads to the transcriptome and then to the genome to report both existing and novel junctions. Along with the alignment (BAM) and junction (BED) files, TopHat also provides a list of expressed fusion transcripts using the TopHat-Fusion algorithm. The BAM file is processed using HTSeq to summarize expression at the gene level. Exon quantification is obtained with in-house methods that leverage BEDTools. In addition to raw gene and exon expression counts, MAP-RSeq also provides normalized values (RPKM, reads per kilobase per million mapped reads). For accurate variant detection, GATK is used to call SNVs, which are further annotated with quality score, coverage and additional criteria using VQSR.
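The RPKM normalization mentioned above can be illustrated with a minimal sketch. It uses the standard RPKM definition (reads mapped to a feature, scaled by feature length in kilobases and by total mapped reads in millions); it is not MAP-RSeq's own code. The function name, gene names, counts and lengths are hypothetical, and how MAP-RSeq defines feature length or total mapped reads is not stated here.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of feature per million mapped reads.

    Hypothetical helper for illustration only; not taken from MAP-RSeq.
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Made-up raw counts (e.g. as a gene-level counting step might report them)
raw_counts = {"GENE_A": 500, "GENE_B": 1200}
# Made-up feature lengths in base pairs
lengths_bp = {"GENE_A": 2000, "GENE_B": 6000}
# Made-up library size: total mapped reads in the sample
total_mapped = 20_000_000

normalized = {
    gene: rpkm(count, lengths_bp[gene], total_mapped)
    for gene, count in raw_counts.items()
}

for gene, value in normalized.items():
    print(f"{gene}\t{value:.2f}")
```

In this made-up example GENE_B has more raw reads than GENE_A but a lower RPKM, because it is three times longer. Correcting for that length effect is the reason for reporting normalized values alongside raw counts.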
MAP-RSeq generates several reports for each sample, including alignment statistics, in-depth quality control metrics, gene and exon expression levels, fusion transcripts and SNVs. Circos plots are also provided to visualize fusion transcripts. MAP-RSeq incorporates the Integrative Genomics Viewer (IGV) to visualize alignment and coverage along the transcriptome as well as exon-exon junctions. It also incorporates UCSC tracks on transcription and regulation in IGV to facilitate user interpretation.
The workflow can be configured to run on a single Linux machine as well as in a cluster environment to fully leverage multiple processors.
- Source (.tgz, includes reference files, 1GB)
- User Manual (.docx)
- Quick Start Virtual Machine (2.2GB, requires the free VirtualBox software and a computer with more than 4GB of memory)
- Example Result Set (.zip, please download and unpack prior to reviewing the data)
If you have any issues with the software please contact Matthew Bockol at firstname.lastname@example.org .
Authors: Kalari, Krishna R.; Nair, Asha A.; Bhavsar, Jaysheel D.; Middha, Sumit; Bockol, Matthew; Kalmbach, Michael T; Tang, Xiaojia; Davila, Jaime; Nie, Jinfu; O’Brien, Daniel R.; Kocher, Jean-Pierre
Page last modified: October 24, 2014