The Bioinformatics Program at Mayo Clinic has created several software packages to analyze, visualize and interpret genomic data. We welcome feedback and comments.
These applications are freely available to academics. Scientific acknowledgment and authorship information can be found in the section describing each application.
A toolkit and set of catalogs to retrieve genomic annotation for variants, genes, diseases, conditions, genetic tests, and drugs.
miRNAs play a key role in normal physiology and various diseases such as cancer. However, analyzing miRNA sequencing data is challenging because it requires significant computational resources and bioinformatics expertise. To address this, we present a comprehensive analysis pipeline for deep microRNA sequencing (CAP-miRSeq) that integrates read preprocessing, alignment, mature/precursor/novel miRNA quantification, variant detection in miRNA coding regions, and flexible differential expression analysis between experimental conditions. Using well-characterized data, we demonstrated the pipeline’s superior performance, flexibility, and practical use in research and biomarker discovery.
ChIP-RNA-seqPRO is a resource that provides a strategy for profiling regulatory associations between epigenomic modifications and co/post-transcriptional processes.
Structural Variations (SVs) and Copy Number Variations (CNVs) are major sources of genomic variation. CNVnator is a tool for CNV discovery and genotyping from the depth of coverage of mapped reads. It accepts .bam files as input and generates CNV calls in less than 10 hours of computation. The source code and extended descriptions […]
Ezimputer is an impute2-based genotype imputation workflow that greatly simplifies the process of imputation and achieves a significant speedup of imputation using multiple CPUs on a computer cluster.
A tool used to calculate the estimated sensitivity of fusion finding for an RNA-seq experiment. It plots the estimated sensitivity as a function of the distance to the 3’ end and also calculates the decay rate for the sample.
GeneSetScan is a pre-compiled binary for 64-bit Linux systems. It offers a general approach to scan genome-wide SNP data for gene-set association analyses.
GenomeSmasher is a set of tools used to create diploid FASTA files containing SNPs, indels, duplications, deletions and translocations.
HGT-ID v1.0: An efficient and sensitive program for detecting viral insertion sequences in the genome of human cancers
HiChIP: A high-throughput pipeline for integrative analysis of ChIP-Seq data. The HiChIP pipeline is designed to perform comprehensive analysis of chromatin immunoprecipitation and sequencing (ChIP-Seq) data. It can be used to analyze profiles from transcription factor binding, histone modifications, histone variants, and chromatin regulators. Paired-end and single-end NexGen sequencing data from ChIP experiment with different antibodies, […]
Microbiota pipeline that utilizes and integrates information from a mix of both paired-end and single-end reads.
ICQ-lincRNA (Identification, Characterization, and Quantification of Long Intergenic Non-Coding RNAs) offers an end-to-end solution to identify and annotate expressed lincRNAs in next-generation RNA sequencing data. Specifically, ICQ-lincRNA: conducts ab initio genome-wide transcript assembly with both Cufflinks and Scripture using Binary Alignment/Map (BAM) files; conducts downstream quantitative analyses including gene count, exon count, overlap with known […]
The MAP-RSeq workflow integrates a suite of open source bioinformatics tools along with in-house developed methods to analyze paired-end RNA-Seq data.
The PANDA (Pathway AND Annotation) Explorer is a data visualization tool capable of annotating genes with any data type and graphically displaying the result within the context of pathways.
PANOPLY (Precision Cancer Genomic Report: Single Sample Inventory). Overview of the PANOPLY system: With the advent of high-throughput technologies, the quantity of ‘omics’ data has rapidly increased, creating the need for methodologies that can analyze complex datasets and provide interpretations that assist in decision making. We have developed PANOPLY, a novel computational approach to […]
A versatile tool for detecting copy number changes from exome sequencing data.
P2T2 is a web-based platform for the annotation of proteins using population variants; experimentally determined functional and phenotype-associated variants; literature-mined variant-phenotype relationships; and structural bioinformatics features such as linear motifs, domains and experimental structures.
RVboost v0.1: RNA-seq variant prioritization approach for Illumina next-generation sequencing data.
The Streamlined Analysis and Annotation Pipeline for RRBS data (SAAP-RRBS) integrates read quality assessment/clean-up, alignment, methylation data extraction, annotation, reporting, and visualization. With this package, bioinformaticians or investigators can submit sequencing reads and quickly receive a fully annotated CpG methylation report.
SnowShoes-FTD is a bioinformatics tool to identify fusion transcripts from paired-end transcriptome sequencing data.
A post-processor to optimize the selection of tag SNPs from common bin-tagging programs. SNPPicker uses a multi-step search strategy in combination with a statistical model to produce optimal genotyping panels.
SoftSearch is a sensitive structural variant (SV) detection tool for Illumina paired-end next-generation sequencing data.
trace-rrbs v0.1: Targeted Alignment and Artificial Cytosine Elimination for RRBS for Illumina next-generation sequencing data.
TREAT is a Targeted RE-sequencing Annotation Tool that offers a comprehensive, open framework, end-to-end solution for analyzing and interpreting targeted re-sequencing data.
The Ultrafast and Comprehensive lncRNA detection (UClncRNA) pipeline leverages fast transcript assembly, parallel computing tools, and multi-step filters for increased specificity to provide comprehensive lncRNA characterization.
The Variant Call Format (VCF) is the de facto standard for storing variant information from next-generation DNA sequencing experiments.
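For readers unfamiliar with the format, the following is a minimal Python sketch of how one VCF data line breaks down into its eight fixed, tab-delimited columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function name `parse_vcf_line` and the example record are illustrative assumptions and are not part of any tool listed on this page; real files also carry header lines and optional per-sample columns that this sketch ignores.

```python
def parse_vcf_line(line):
    """Split one tab-delimited VCF data line into its eight fixed fields.

    Hypothetical helper for illustration only; not taken from any listed tool.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_dict[key] = value if value else True  # bare flag -> True

    return {
        "chrom": chrom,
        "pos": int(pos),                      # 1-based position
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),                # multiple ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,                    # values are kept as strings
    }


# Made-up example record, used only to exercise the function.
example = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
record = parse_vcf_line(example)
print(record["chrom"], record["pos"], record["ref"], record["alt"], record["info"]["DP"])
```

For production work, an established VCF library is usually a better choice than hand parsing, since it also handles headers, genotype columns, and type conversion of INFO values.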
Wandy: A program for CNV/aneuploidy detection from WGS sequencing data. Wandy is designed for Copy Number Variation (CNV) and aneuploidy detection in large genomes such as human. It takes a sorted BAM file as input, reports predicted chromosome regions that have amplifications or deletions using the log2 ratio, and generates graphical reports. There are two download […]
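The general idea of flagging amplifications and deletions from a log2 ratio can be sketched in Python as below. This is not Wandy's actual algorithm: the per-bin read counts, the reference profile, the pseudocount, and the gain/loss thresholds are all assumptions chosen for illustration.

```python
import math


def log2_ratios(sample_counts, reference_counts, pseudocount=0.5):
    """Return per-bin log2(sample / reference) after library-size normalization.

    Hypothetical sketch; the pseudocount only avoids division by zero.
    """
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    ratios = []
    for s, r in zip(sample_counts, reference_counts):
        s_norm = (s + pseudocount) / sample_total
        r_norm = (r + pseudocount) / reference_total
        ratios.append(math.log2(s_norm / r_norm))
    return ratios


def call_bins(ratios, gain=0.4, loss=-0.6):
    """Label each bin using assumed thresholds (not values from any listed tool)."""
    calls = []
    for value in ratios:
        if value >= gain:
            calls.append("gain")
        elif value <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Made-up read counts for four genomic bins.
sample = [100, 100, 150, 50]
reference = [100, 100, 100, 100]
ratios = log2_ratios(sample, reference)
calls = call_bins(ratios)
print(calls)
```

Real CNV callers typically add GC-content correction, segmentation of adjacent bins, and statistical testing on top of this basic ratio.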