Software Packages

Department of Quantitative Health Sciences
Mayo Clinic Research
Formerly known as the Department of Health Sciences Research



Armitage trend test for trait and SNP dosage Authors: Jason Sinnwell (primary contact) Dan Schaid Link: armitage_0.2.1.tar.gz Language/Platform: R
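
As a rough illustration of what a trend test on trait and SNP dosage computes, the sketch below uses the standard identity that the Cochran-Armitage trend chi-square (1 degree of freedom) equals N times the squared Pearson correlation between a binary trait and the dosage. The function name `armitage_trend` and its arguments are hypothetical, chosen for this example; this is not the code of the armitage R package.

```python
from math import erfc, sqrt

def armitage_trend(trait, dosage):
    """Cochran-Armitage trend chi-square (1 df) for a 0/1 trait and SNP dosage.

    Illustrative only: uses chi2 = N * r^2, where r is the Pearson
    correlation between trait and dosage.
    """
    n = len(trait)
    if n == 0 or n != len(dosage):
        raise ValueError("trait and dosage must be non-empty and equal length")
    mean_y = sum(trait) / n
    mean_x = sum(dosage) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, trait))
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    syy = sum((y - mean_y) ** 2 for y in trait)
    if sxx == 0 or syy == 0:
        raise ValueError("trait and dosage must both vary")
    chi2 = n * sxy ** 2 / (sxx * syy)
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(chi2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

if __name__ == "__main__":
    print(armitage_trend([0, 0, 1, 1, 0, 1], [0, 1, 2, 1, 0, 2]))
```

Because dosage enters as a plain numeric vector, the same calculation accepts imputed (fractional) dosages as well as 0/1/2 genotype counts.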


Generalized Estimating Equations for Affected Relative Pairs Authors: Dan Schaid Jason Sinnwell Link: arp.gee_0.1.1.tar.gz Language/Platform: R


An Arsenal of ‘R’ Functions for Large-Scale Statistical Summaries An Arsenal of ‘R’ functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in ‘R’ and ‘RStudio’ and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple […]


Population Attributable Risk Estimates population (etiological) attributable risk for unmatched, pair-matched or set-matched case-control designs and returns a list containing the estimated attributable risk, estimates of coefficients, and their standard errors, from the (conditional, if necessary) logistic regression used for estimating the relative risk. Authors: Beth Atkinson (primary contact) Louis Schenck Cindy Crowson Terry Therneau […]


Routines for Block Diagonal Symmetric matrices This is a special case of sparse matrices, used by coxme Authors: Terry Therneau Available at: Language/Platform: R

bilinear regression to 16O/18O isotope label experiments Authors: Doug Mahoney (primary contact) Jeanette Eckel-Passow Available at: Language/Platform: R


A mapping/alignment tool customized for mate-pair library next-generation sequencing.

BioR Toolkit – Old Versions

Warning! These versions contain a critical tabix-related bug that causes a small percentage of regions to “miss” when using bior_overlap and bior_same_variant against some catalogs. Please use one of the fixed versions HERE. These old versions are maintained here for archive and re-creation purposes only, and should NOT be used […]

BioR: Rapid, Flexible System for Genomic Annotation

A toolkit and set of catalogs to retrieve genomic annotation for variants, genes, diseases, conditions, genetic tests, and drugs.


Exact confidence intervals for a proportion.
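
For readers unfamiliar with exact intervals, here is a minimal sketch of the Clopper-Pearson construction, which inverts the binomial tail probabilities. Whether the listed tool uses this particular method is an assumption; the function names are hypothetical and the limits are found by simple bisection, not by a beta quantile routine.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_decreasing(f, target, tol=1e-12):
    """Find p in [0, 1] with f(p) = target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Two-sided exact (Clopper-Pearson) interval for x successes in n trials."""
    if not 0 <= x <= n or n <= 0:
        raise ValueError("need 0 <= x <= n and n > 0")
    # Lower limit solves P(X >= x | p) = alpha/2, i.e. cdf(x-1; p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

if __name__ == "__main__":
    print(clopper_pearson(7, 20))
```

Bisection keeps the sketch free of third-party dependencies; a production routine would more likely call a beta distribution quantile function.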


Select bootstrap samples.


miRNAs play a key role in normal physiology and various diseases such as cancer. However, analyzing miRNA sequencing data is challenging because it requires significant computational resources and bioinformatics expertise. To address this, we present a comprehensive analysis pipeline for deep microRNA sequencing (CAP-miRSeq) that integrates read preprocessing, alignment, mature/precursor/novel miRNA quantification, variant detection in miRNA coding regions, and flexible differential expression analysis between experimental conditions. Using well-characterized data, we demonstrated the pipeline’s superior performance, flexibility, and practical use in research and biomarker discovery.


ChIP-RNA-seqPRO is a resource motivated by this current need and provides a strategy that enables the user to profile regulatory associations between epigenomic modifications and co/post-transcriptional processes.


Circ-Seq: A comprehensive bioinformatics workflow for detecting circular RNAs Circular RNAs (circRNAs) are recently discovered members of the noncoding RNA family that range in length from a few hundred to thousands of nucleotides. In contrast to linear RNA transcripts, which are normally spliced tail-to-head, circRNAs are formed by the covalent bonding of their 3´ and […]


Structural Variations (SVs) and Copy Number Variations (CNVs) are a major source of genomic variation. CNVnator is a tool for CNV discovery and genotyping from the depth of coverage of mapped reads. It accepts .bam files as input and generates CNV calls in less than 10 hours of computation. The source code and extended descriptions […]


Cumulative incidence in the presence of competing risks.


Mixed Effects Cox Models Cox proportional hazards models containing Gaussian random effects, also known as frailty models. Authors: Terry Therneau Available at: Language/Platform: R


Competing risk survival analysis with covariates.


Deming, Theil-Sen and Passing-Bablok Regression Generalized Deming regression, Theil-Sen regression and Passing-Bablok regression functions. Authors: Terry Therneau Available at: Language/Platform: R
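
To show the idea behind one of these estimators, the sketch below implements the textbook Theil-Sen fit: the slope is the median of all pairwise slopes, and the intercept is the median of the residual offsets. The function name `theil_sen` is hypothetical and this is not the R package's implementation, which may differ in details such as tie handling and confidence intervals.

```python
from itertools import combinations
from statistics import median

def theil_sen(x, y):
    """Return (intercept, slope) from the basic Theil-Sen estimator."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    # Slopes over all pairs of points with distinct x values.
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    if not slopes:
        raise ValueError("all x values are identical")
    slope = median(slopes)
    # Median offset gives an intercept that is also robust to outliers.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope

if __name__ == "__main__":
    # One gross outlier in the last point barely affects the fit.
    print(theil_sen([0, 1, 2, 3, 4], [1, 3, 5, 7, 100]))
```

The example data follow y = 2x + 1 except for the final point, which illustrates why a median-based slope is preferred when a few measurements are unreliable.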


Estimates the distance matrix between two groups (e.g. cases and potential controls) on the basis of a set of X’s.


eSNV-Detect v1.0: Reliable Identification of Variants Using RNA-seq Data


A workflow for detecting viral integrations and viral presence from sequencing data.


Ezimputer is an impute2-based genotype imputation workflow that greatly simplifies the process of imputation and achieves a significant speedup of imputation using multiple CPUs on a computer cluster.


Fast Loess Authors: Doug Mahoney Jeanette Eckel-Passow Ann Oberb Link: fastlo_1.3.tar.gz Language/Platform: R


Uses the method of Contal and O’Quigley (1999) to find the best cutpoint in a continuous variable with regard to a survival outcome.


A tool used to calculate the estimated sensitivity of fusion finding for an RNA-seq experiment. It plots the estimated sensitivity as a function of the distance to the 3’ end and also calculates the decay rate for the sample.


GeneSetScan is a pre-compiled binary for 64-bit Linux systems. It offers a general approach to scan genome-wide SNP data for gene-set association analyses.


GenomeSmasher is a set of tools used to create diploid FASTA files containing SNPs, indels, duplications, deletions and translocations.


Computerized matching of cases to controls using the greedy matching algorithm
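
The sketch below illustrates one common form of greedy matching: cases are processed in order, and each takes the nearest control that is still unused. It assumes a single numeric matching variable and an optional caliper; the function name, arguments, and tie-breaking rule are hypothetical and do not describe the listed macro's actual options.

```python
def greedy_match(cases, controls, caliper=None):
    """Greedy 1:1 nearest-neighbor matching on one numeric covariate.

    Returns a list of (case_index, control_index) pairs. A case with no
    available control within the caliper is left unmatched.
    """
    available = set(range(len(controls)))
    pairs = []
    for i, case_value in enumerate(cases):
        if not available:
            break  # no controls left to assign
        # Nearest unused control; ties go to the lower control index.
        j = min(available, key=lambda k: (abs(controls[k] - case_value), k))
        if caliper is not None and abs(controls[j] - case_value) > caliper:
            continue
        available.remove(j)
        pairs.append((i, j))
    return pairs

if __name__ == "__main__":
    case_ages = [30, 50]
    control_ages = [31, 29, 52, 70]
    print(greedy_match(case_ages, control_ages))
```

Greedy matching depends on the order of the cases: an early case can take a control that would have been a better match for a later case, which is the limitation that optimal matching addresses.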


Required elements: name of project/tool; short description of project/tool (1-3 sentences); authors, primary contact; suggested tags; link to source code & data (tarred/gzipped if large). Other elements you may want to include: date last updated (?); longer description; user manual; links to publications for the software; system requirements; licensing information. If you want to deploy an […]


Statistical Analysis of Haplotypes with Traits and Covariates when Linkage Phase is Ambiguous This software offers a suite of R routines for the analysis of indirectly measured haplotypes. The statistical methods assume that all subjects are unrelated and that haplotypes are ambiguous (because of unknown linkage phase of the genetic markers). The genetic markers are […]


HGT-ID v1.0: An efficient and sensitive program for detecting viral insertion sequences in the genome of human cancers

HiChIP Pipeline

HiChIP: A high-throughput pipeline for integrative analysis of ChIP-Seq data HiChIP pipeline is designed for performing comprehensive analysis of chromatin immunoprecipitation and sequencing (ChIP-Seq) data. It can be used to analyze profiles from transcription factor binding, histone modifications, histone variants, and chromatin regulators. Paired-end and single-end NexGen sequencing data from ChIP experiment with different antibodies, […]


Hardy-Weinberg Equilibrium Tests Test the fit of genotype frequencies to Hardy-Weinberg Equilibrium proportions for autosomes and the X chromosome. Different statistical tests are provided, as well as an option to evaluate statistical significance by either exact methods or simulations. Authors: Jason Sinnwell (primary contact) Dan Schaid Dan Folie Link: hwe_0.3.1.tar.gz Language/Platform: R
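
As a small illustration of the simplest such test, the sketch below computes the Pearson chi-square goodness-of-fit statistic for a biallelic autosomal marker. It covers neither the X chromosome nor the exact and simulation-based options mentioned above, and the function name `hwe_chisq` is hypothetical, not part of the hwe package.

```python
from math import erfc, sqrt

def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions, autosomal marker.

    Arguments are observed genotype counts for AA, AB and BB.
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("marker is monomorphic; test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

if __name__ == "__main__":
    print(hwe_chisq(298, 489, 213))
```

The chi-square approximation is unreliable when expected counts are small, for example with rare alleles, which is one reason exact and simulation-based alternatives are useful.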


Microbiota pipeline that utilizes and integrates information from a mix of both paired-end and single-end reads.


Regression Methods for IBD Linkage With Covariates A method to test genetic linkage with covariates by regression methods with response IBD sharing for relative pairs. Accounts for correlations of IBD statistics and covariates for relative pairs within the same pedigree. Authors: Jason Sinnwell (primary contact) Dan Schaid Available at: Language/Platform: R


ICQ-lincRNA (Identification, Characterization, and Quantification of Long Intergenic Non-Coding RNAs) offers an end-to-end solution to identify and annotate expressed lincRNAs in next-generation RNA sequencing data. Specifically, ICQ-lincRNA: (1) conducts ab-initio genome-wide transcript assembly by both Cufflinks and Scripture using Binary Alignment/Map (BAM) files; (2) conducts downstream quantitative analyses including gene count, exon count, overlap with known […]


Produces a gplot of a continuous variable (y-axis) vs. a group variable (x-axis) in such a way that no points are hidden.


Pedigree Functions Authors: Jason Sinnwell (primary contact) Terry Therneau Beth Atkinson Dan Schaid Available at: Language/Platform: R


LD calculations on multi-allele and SNP variants, including the composite-LD measure.


Univariate logistic regression model summaries with multiple dependent variables and predictors.


The MAP-RSeq workflow integrates a suite of open source bioinformatics tools along with in-house developed methods to analyze paired-end RNA-Seq data.


Computes Lin’s concordance correlation coefficient (CCC) for any number of raters.
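
For orientation, the sketch below computes Lin's CCC in its original two-rater form, using 1/n moments as in Lin's definition. The listed macro handles any number of raters, which this example does not attempt; the function name `lin_ccc` is hypothetical.

```python
def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for two raters.

    CCC = 2 * s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2),
    with variances and covariance computed using divisor n.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x) / n
    s_yy = sum((b - mean_y) ** 2 for b in y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = s_xx + s_yy + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both raters are constant and equal")
    return 2 * s_xy / denom

if __name__ == "__main__":
    # Rater 2 is perfectly correlated with rater 1 but shifted by +1,
    # so the CCC is penalized for the systematic bias.
    print(lin_ccc([1, 2, 3], [2, 3, 4]))
```

Unlike the Pearson correlation, the CCC drops below 1 when one rater is systematically higher than the other, even if the two sets of ratings are perfectly linearly related.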


Check Pedigrees for Mendelian Errors Checks pedigrees for Mendelian errors and, when errors are found, systematically jackknifes every typed pedigree member to determine whether eliminating that member removes all Mendelian errors from the pedigree. Authors: Jason Sinnwell (primary contact) Dan Schaid Dan Folie Link: mend.err_1.3.tar.gz Language/Platform: R


Quantitative Linkage Analysis Tools using the Variance Components Approach Calculate the polygenic and major gene models for quantitative trait linkage analysis using the variance components approach. Authors: Pat Votruba (primary contact) Beth Atkinson Mariza de Andrade Available at: Language/Platform: R


Conducts likelihood ratio tests for nested logistic and Cox proportional hazards models.


Uses Graph Template Language to create a highly customizable Kaplan-Meier curve.


This macro creates a macro variable containing the number of observations in a SAS dataset.


Creates a single RTF file containing multiple tables created by %SUMMARY.


The PANDA (Pathway AND Annotation) Explorer is a data visualization tool capable of annotating genes with any data type and graphically displaying the result within the context of pathways.


PANOPLY is a novel computational approach that integrates both germline and somatic data obtained from multi-omics platforms for an individual of interest and analyzes that data in the context of matched-control samples.


A versatile tool for detecting copy number changes from exome sequencing data.


Gene-Level Statistics for Pedigree Data Gene-level association tests with disease status for pedigree data: kernel and burden association statistics. Authors: Jason Sinnwell (primary contact) Dan Schaid Available at: Language/Platform: R


Pleiotropy Test for Multiple Traits on a Genetic Marker Perform tests for pleiotropy of multiple traits of various variable types on genotypes for a genetic marker. Authors: Jason Sinnwell (primary contact) Dan Schaid Available at: Language/Platform: R


Proc Plot with correlation/regression statistics appended.


Create a scatterplot matrix graphically displaying the bivariate relationships between a number of variables.

Protein Panoramic annoTation Tool (P2T2)

P2T2 is a web-based platform for the annotation of proteins using population variants; experimentally determined functional and phenotype-associated variants; literature-mined variant-phenotype relationships; and structural bioinformatics features such as linear motifs, domains and experimental structures.


Sample size for RNAseq studies Authors: Terry Therneau (primary contact) Steven Hart JP Kocher Available at: Publication: Language/Platform: R


Estimates the Integrated Discrimination Improvement (IDI) and Net Reclassification Improvement (NRI) for comparison of a new risk model to an old model.


Recursive Partitioning and Regression Trees Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone. Authors: Beth Atkinson Terry Therneau Brian Ripley (primary contact) Available at: Language/Platform: R


RVboost v0.1: RNA-seq variant prioritization approach for Illumina next-generation sequencing data.


The Streamlined Analysis and Annotation Pipeline for RRBS data (SAAP-RRBS) integrates read quality assessment/clean-up, alignment, methylation data extraction, annotation, reporting, and visualization. With this package, bioinformaticians or investigators can submit sequencing reads and quickly receive a fully annotated CpG methylation report.


Schoenfeld residuals for proportional hazards model.


Sequential Multiple Assignment Randomized Trial (SMART) design includes multiple stages of randomization, where participants are randomized to an initial treatment in the first stage and then subsequently re-randomized between treatments in the following stage. Includes methods for mean and variance as a function of specificity/sensitivity, and power calculations.


SnowShoes-FTD is a bioinformatics tool to identify fusion transcripts from paired-end transcriptome sequencing data.


A post-processor to optimize the selection of tag SNPs from common bin-tagging programs. SNPPicker uses a multi-step search strategy in combination with a statistical model to produce optimal genotyping panels.


SoftSearch is a sensitive structural variant (SV) detection tool for Illumina paired-end next-generation sequencing data.


SpatialNorm6 Spatial normalization of Affymetrix SNP 6.0 CEL files, which adjusts for spatial bias based on wavelet decomposition. Authors: Chai High Seng (primary contact) Link: SpatialNorm6_1.1.tar.gz


Stress.dfArray Calculates normalization Stress and dfArray quality for a set of arrays. Authors: Doug Mahoney (primary contact) Jeanette Eckel-Passow Available at: Stress.dfArray_1.1.tar.gz Language/Platform: R


Creates a table of variable summaries plus test statistics for the difference between two or more independent samples.


Super Learner Prediction Implements the super learner prediction method and contains a library of prediction algorithms to be used in the super learner. Authors: Eric Polley (primary contact) LeDell van der Laan Kennedy Lendle Available at: Language/Platform: R


Complete Kaplan-Meier survival analysis with printing options and logrank statistic.
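
As background for the Kaplan-Meier entries in this list, the sketch below computes the product-limit estimate from follow-up times and event indicators. It omits the logrank statistic, confidence limits, and printing options; the function name `kaplan_meier` and its return format are hypothetical and unrelated to the macro's interface.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have equal length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing this time (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Everyone observed at time t leaves the risk set afterwards.
        n_at_risk -= leaving
    return curve

if __name__ == "__main__":
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Subjects censored at a given time are counted as at risk for events at that same time, which matches the usual convention.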


Calculates the c-statistic (concordance, discrimination index) for survival data with time-dependent covariates.


Survival Analysis Contains the core survival analysis routines, including definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models. Authors: Terry Therneau (primary contact) Available at: Language/Platform: R  


Calculates logrank statistics for the surv macro.


General survival statistics p(t), standard error, confidence limits, and median survival time, for the left-truncated survival analyses.


Creates high-quality and easily customized Kaplan-Meier plots.


Checks for symmetry and suggests the best power transformation, if one exists, to make an asymmetric distribution symmetric


ToxT is a combination of methods to analyze TOXicity data over Time. It can be used for adverse event data and for patient reported outcomes. It includes longitudinal mixed models, bug plots, time-to-event analyses, heatmaps, area under the curve analyses, profile analysis, comparisons at each time point, and plots over time.


trace-rrbs v0.1: Targeted Alignment and Artificial Cytosine Elimination for RRBS for Illumina next-generation sequencing data.


TREAT is a Targeted RE-sequencing Annotation Tool that offers a comprehensive, open framework, end-to-end solution for analyzing and interpreting targeted re-sequencing data.


trex Package that calculates a truncated exact test for two-stage case-control studies for rare genetic variants. The first stage is for screening rare variants in only cases. If the number of case-carriers of any rare variants exceeds a user-specified threshold, then additional cases and controls are genotyped for the detected variants and carrier status of […]


Measures agreement, precision, accuracy, total deviation index and coverage probability.

UCLncR Pipeline

The Ultrafast and Comprehensive lncRNA detection (UClncRNA) pipeline leverages fast transcript assembly, parallel computing tools, and multi-step filters for increased specificity to provide comprehensive lncRNA characterization.


The Variant Call Format (VCF) is the de facto standard for storing variant information from next-generation DNA sequencing experiments.


Computerized matching of cases to controls using variable optimal matching.


Wandy: A program for CNV/Aneuploidy detection from WGS sequencing data. Wandy is designed for Copy Number Variation (CNV) and aneuploidy detection in large genomes such as human. It takes a sorted BAM file as input, reports predicted chromosome regions that have amplifications or deletions using the log2 ratio, and generates graphical reports. There are […]