Software Packages

Department of Quantitative Health Sciences
Mayo Clinic Research
Formerly known as the Department of Health Sciences Research

armitage

Armitage trend test for trait and SNP dosage.
Authors: Jason Sinnwell (primary contact), Dan Schaid
Link: armitage_0.2.1.tar.gz
Language/Platform: R
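As a rough illustration of the technique (not the package's code), the sketch below computes a trend chi-square for a binary trait against SNP dosage. It relies on the standard identity that the Cochran-Armitage trend statistic equals N times the squared Pearson correlation between trait and score; the function name `armitage_trend` and its inputs are assumptions made for this example.

```python
import math

def armitage_trend(trait, dosage):
    """Trend chi-square (1 df) for a 0/1 trait versus SNP dosage.

    Hypothetical helper: uses chi2 = N * r^2, where r is the Pearson
    correlation between trait and dosage.
    """
    n = len(trait)
    if n != len(dosage) or n < 2:
        raise ValueError("trait and dosage must have the same length (>= 2)")
    mean_y = sum(trait) / n
    mean_x = sum(dosage) / n
    sxy = sum((y - mean_y) * (x - mean_x) for y, x in zip(trait, dosage))
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    syy = sum((y - mean_y) ** 2 for y in trait)
    if sxx == 0 or syy == 0:
        raise ValueError("trait and dosage must both vary")
    chi2 = n * sxy ** 2 / (sxx * syy)
    # Upper tail of a chi-square with 1 df, written with erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = armitage_trend([0, 0, 1, 1], [0, 1, 1, 2])
print(round(chi2, 3), round(p_value, 4))
```

Dosage values need not be integers, so imputed dosages can be passed directly in this sketch.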

arp.gee

Generalized Estimating Equations for Affected Relative Pairs
Authors: Dan Schaid, Jason Sinnwell
Link: arp.gee_0.1.1.tar.gz
Language/Platform: R

Arsenal

An Arsenal of ‘R’ Functions for Large-Scale Statistical Summaries
An Arsenal of ‘R’ functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in ‘R’ and ‘RStudio’ and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple […]

Attribrisk

Population Attributable Risk
Estimates population (etiological) attributable risk for unmatched, pair-matched or set-matched case-control designs and returns a list containing the estimated attributable risk, estimates of coefficients, and their standard errors, from the (conditional, if necessary) logistic regression used for estimating the relative risk.
Authors: Beth Atkinson (primary contact), Louis Schenck, Cindy Crowson, Terry Therneau […]

bdsmatrix

Routines for Block Diagonal Symmetric Matrices
This is a special case of sparse matrices, used by coxme.
Authors: Terry Therneau
Available at: https://cran.r-project.org/web/packages/bdsmatrix/index.html
Language/Platform: R

bilinear.fit

Fits bilinear regression to 16O/18O isotope label experiments.
Authors: Doug Mahoney (primary contact), Jeanette Eckel-Passow
Available at: bilinear.fit.tar.gz
Language/Platform: R

BIMA

A mapping/alignment customized for mate-pair library next generation sequencing.

BioR Toolkit – Old Versions

Warning! These versions contain a critical tabix-related bug that causes a small percentage of regions to “miss” when using bior_overlap and bior_same_variant against some catalogs. Please use one of the fixed versions HERE. These old versions are maintained here for archive and re-creation purposes only, and should NOT be used […]

BioR: Rapid, Flexible System for Genomic Annotation

A toolkit and set of catalogs to retrieve genomic annotation for variants, genes, diseases, conditions, genetic tests, and drugs.

bnmlci

Exact confidence intervals for a proportion.
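For readers unfamiliar with the term, "exact" intervals for a proportion usually mean the Clopper-Pearson construction, which inverts the binomial tail probabilities. The sketch below assumes that construction (the entry does not say which method the macro uses) and solves for the limits by bisection; `exact_ci` and its parameters are hypothetical names.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_increasing(g, target, iters=100):
    """Find p in [0, 1] with g(p) = target, for g increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes out of n trials."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    # Lower limit: P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: P(X <= x | p) = alpha / 2.
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper

print(exact_ci(3, 20))
```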

boot

Select bootstrap samples.

CAP-miRSEQ

miRNAs play a key role in normal physiology and various diseases such as cancer. However, analyzing miRNA sequencing data is challenging due to the requirement of significant computational resources and bioinformatics expertise. To address this, we present a comprehensive analysis pipeline for deep microRNA sequencing (CAP-miRSeq) that integrates read preprocessing, alignment, mature/precursor/novel miRNA quantification, variant detection in miRNA coding regions, and flexible differential expression between experimental conditions. Using well characterized data, we demonstrated the pipeline’s superior performance, flexibility, and practical use in research and biomarker discovery.

ChIP-RNA-seqPRO

ChIP-RNA-seqPRO is a resource motivated by this current need and provides a strategy that enables the user to profile regulatory associations between epigenomic modifications and co/post-transcriptional processes.

Circ-Seq

Circ-Seq: A comprehensive bioinformatics workflow for detecting circular RNAs
Circular RNAs (circRNAs) are recently discovered members of the noncoding RNA family that range in length from a few hundred to thousands of nucleotides. In contrast to linear RNA transcripts, which are normally spliced tail-to-head, circRNAs are formed by the covalent bonding of their 3´ and […]

CNVNator

Structural Variations (SVs) and Copy Number Variations (CNVs) are the major source of genomic variation. CNVnator is a tool for Copy Number Variation (CNV) discovery and genotyping from depth-of-coverage by mapped reads. It accepts .bam files as input and generates CNV calls in less than 10 hours of computation. The source code and extended descriptions […]

comprisk

Cumulative incidence in the presence of competing risks.

coxme

Mixed Effects Cox Models
Cox proportional hazards models containing Gaussian random effects, also known as frailty models.
Authors: Terry Therneau
Available at: https://cran.r-project.org/web/packages/coxme/index.html
Language/Platform: R

criskcox

Competing risk survival analysis with covariates.

deming

Deming, Theil-Sen and Passing-Bablok Regression
Generalized Deming regression, Theil-Sen regression and Passing-Bablok regression functions.
Authors: Terry Therneau
Available at: https://cran.r-project.org/web/packages/deming/index.html
Language/Platform: R

dist

Estimates the distance matrix between two groups (e.g. cases and potential controls) on the basis of a set of X’s.

eSNV-Detect

eSNV-Detect v1.0: Reliable Identification of Variants Using RNA-seq Data

Exogene

A workflow for detecting viral integrations and viral presence from sequencing data.

Ezimputer

Ezimputer is an impute2-based genotype imputation workflow that greatly simplifies the process of imputation and achieves a significant speedup of imputation using multiple CPUs on a computer cluster.

fastlo

Fast Loess
Authors: Doug Mahoney, Jeanette Eckel-Passow, Ann Oberg
Link: fastlo_1.3.tar.gz
Language/Platform: R

findcut

Uses the method of Contal and O’Quigley (1999) to find the best cutpoint in a continuous variable with regard to a survival outcome.

Fusion-sense

A tool used to calculate the estimated sensitivity of fusion finding for an RNA-seq experiment. It plots the estimated sensitivity as a function of the distance to the 3’ end and also calculates the decay rate for the sample.

GeneSetScan

GeneSetScan is a pre-compiled binary for 64-bit Linux systems. It offers a general approach to scan genome-wide SNP data for gene-set association analyses.

GenomeSmasher

GenomeSmasher is a set of tools used to create diploid FASTA files containing SNPs, indels, duplications, deletions and translocations.

gmatch

Computerized matching of cases to controls using the greedy matching algorithm
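To make the idea of greedy matching concrete, here is a minimal sketch, not the macro's implementation. It assumes a single numeric matching variable, absolute difference as the distance, and an optional `caliper`; all names are hypothetical. Each case in turn takes the closest control that is still available, and earlier choices are never revisited.

```python
def greedy_match(cases, controls, caliper=None):
    """Greedy 1:1 matching on one numeric variable.

    cases, controls: dicts mapping an id to the matching value.
    caliper: optional maximum allowed absolute difference.
    Returns a list of (case_id, control_id) pairs.
    """
    available = dict(controls)  # controls not yet used
    pairs = []
    for case_id, case_value in cases.items():
        if not available:
            break
        # Closest remaining control for this case.
        best_id = min(available, key=lambda c: abs(available[c] - case_value))
        if caliper is not None and abs(available[best_id] - case_value) > caliper:
            continue  # no acceptable control; the case stays unmatched
        pairs.append((case_id, best_id))
        del available[best_id]
    return pairs

print(greedy_match({"a": 50, "b": 52}, {"x": 51, "y": 53, "z": 70}))
```

Because each decision is final, the result depends on the order in which cases are processed, which is what distinguishes greedy matching from optimal matching.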

Guide

Required elements: name of project/tool; short description of project/tool (1-3 sentences); authors, primary contact; suggested tags; link to source code & data (tarred/gzipped if large).
Other elements you may want to include: date last updated (?); longer description; user manual; links to publications for the software; system requirements; licensing information.
If you want to deploy an […]

haplo.stats

Statistical Analysis of Haplotypes with Traits and Covariates when Linkage Phase is Ambiguous
This software offers a suite of R routines for the analysis of indirectly measured haplotypes. The statistical methods assume that all subjects are unrelated and that haplotypes are ambiguous (because of unknown linkage phase of the genetic markers). The genetic markers are […]

HGT-ID

HGT-ID v1.0: An efficient and sensitive program for detecting viral insertion sequences in the genome of human cancers

HiChIP Pipeline

HiChIP: A high-throughput pipeline for integrative analysis of ChIP-Seq data
The HiChIP pipeline is designed for performing comprehensive analysis of chromatin immunoprecipitation and sequencing (ChIP-Seq) data. It can be used to analyze profiles from transcription factor binding, histone modifications, histone variants, and chromatin regulators. Paired-end and single-end NexGen sequencing data from ChIP experiments with different antibodies, […]

hwe

Hardy-Weinberg Equilibrium Tests
Test the fit of genotype frequencies to Hardy-Weinberg Equilibrium proportions for autosomes and the X chromosome. Different statistical tests are provided, as well as an option to evaluate statistical significance by either exact methods or simulations.
Authors: Jason Sinnwell (primary contact), Dan Schaid, Dan Folie
Link: hwe_0.3.1.tar.gz
Language/Platform: R
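The simplest such test is the asymptotic chi-square goodness-of-fit test for a biallelic autosomal marker, sketched below for illustration. This is not the package's code and does not cover its exact or simulation-based options or the X chromosome; `hwe_chisq` and its arguments are hypothetical names.

```python
import math

def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb: observed genotype counts for AA, AB and BB.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("marker is monomorphic")
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df, written with erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

print(hwe_chisq(30, 40, 30))
```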

Hybrid-Denovo

Microbiota pipeline that utilizes and integrates information from a mix of both paired-end and single-end reads.

ibdreg

Regression Methods for IBD Linkage With Covariates
A method to test genetic linkage with covariates by regression methods with response IBD sharing for relative pairs. Accounts for correlations of IBD statistics and covariates for relative pairs within the same pedigree.
Authors: Jason Sinnwell (primary contact), Dan Schaid
Available at: https://cran.r-project.org/web/packages/ibdreg/index.html
Language/Platform: R

ICQ-lincRNA

ICQ-lincRNA (Identification, Characterization, and Quantification of Long Intergenic Non-Coding RNAs) offers an end-to-end solution to identify and annotate expressed lincRNAs in next generation RNA sequencing data. Specifically, ICQ-lincRNA:
Conducts ab-initio genome-wide transcript assembly by both Cufflinks and Scripture using Binary Alignment/Map (BAM) files
Conducts downstream quantitative analyses including gene count, exon count, overlap with known […]

jitplot

Produces a gplot of a continuous variable (y-axis) vs. a group variable (x-axis) in such a way that no points are hidden.

kinship2

Pedigree Functions
Authors: Jason Sinnwell (primary contact), Terry Therneau, Beth Atkinson, Dan Schaid
Available at: https://cran.r-project.org/web/packages/kinship2/index.html
Language/Platform: R

ld.pairs

LD calculations on multi-allele and SNP variants, including the composite-LD measure.

logisuni

Univariate logistic regression model summaries with multiple dependent variables and predictors.

MAP-RSeq

The MAP-RSeq workflow integrates a suite of open source bioinformatics tools along with in-house developed methods to analyze paired-end RNA-Seq data.

mccc

Computes Lin’s concordance correlation coefficient (CCC) for any number of raters.
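As a reference point, the two-rater version of Lin's coefficient can be written in a few lines. The sketch below is illustrative only: it covers two raters rather than the macro's any-number-of-raters case, uses 1/n variance and covariance estimates, and the name `lin_ccc` is an assumption.

```python
def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for two raters.

    CCC = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2),
    with variances and covariance computed using a 1/n divisor.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x) / n
    s_yy = sum((b - mean_y) ** 2 for b in y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = s_xx + s_yy + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("both raters are constant and identical")
    return 2 * s_xy / denom

# A constant shift between raters lowers CCC even though the
# Pearson correlation is exactly 1.
print(lin_ccc([1, 2, 3], [2, 3, 4]))
```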

mend.err

Check Pedigrees for Mendelian Errors
Checks pedigrees for Mendelian errors and, when errors are found, systematically jackknifes every typed pedigree member to determine whether eliminating that member will remove all Mendelian errors from the pedigree.
Authors: Jason Sinnwell (primary contact), Dan Schaid, Dan Folie
Link: mend.err_1.3.tar.gz
Language/Platform: R

multic

Quantitative Linkage Analysis Tools using the Variance Components Approach
Calculate the polygenic and major gene models for quantitative trait linkage analysis using the variance components approach.
Authors: Pat Votruba (primary contact), Beth Atkinson, Mariza de Andrade
Available at: https://cran.r-project.org/web/packages/multic/index.html
Language/Platform: R

nesttest

Conducts likelihood ratio tests for nested logistic and Cox proportional hazards models.

newsurv

Uses Graph Template Language to create a highly customizable Kaplan-Meier curve.

nobs

This macro creates a macro variable containing the number of observations in a SAS dataset.

outsumm

Creates a single RTF file containing multiple tables created by %SUMMARY.

PANDA

The PANDA (Pathway AND Annotation) Explorer is a data visualization tool capable of annotating genes with any data type and graphically displaying the result within the context of pathways.

Panoply

PANOPLY is a novel computational approach that integrates both germline and somatic data obtained from multi-omics platforms for an individual of interest and analyzes that data in the context of matched-control samples.

PatternCNV

A versatile tool for detecting copy number changes from exome sequencing data.

pedgene

Gene-Level Statistics for Pedigree Data
Gene-level association tests with disease status for pedigree data: kernel and burden association statistics.
Authors: Jason Sinnwell (primary contact), Dan Schaid
Available at: https://cran.r-project.org/web/packages/pedgene/index.html
Language/Platform: R

pleio

Pleiotropy Test for Multiple Traits on a Genetic Marker
Perform tests for pleiotropy of multiple traits of various variable types on genotypes for a genetic marker.
Authors: Jason Sinnwell (primary contact), Dan Schaid
Available at: https://cran.r-project.org/web/packages/pleio/index.html
Language/Platform: R

plotcorr

Proc Plot with correlation/regression statistics appended.

plotmat

Create a scatterplot matrix graphically displaying the bivariate relationships between a number of variables.

Protein Panoramic annoTation Tool (P2T2)

P2T2 is a web-based platform for the annotation of proteins using population variants; experimentally determined functional and phenotype-associated variants; literature-mined variant-phenotype relationships; and structural bioinformatics features such as linear motifs, domains and experimental structures.

RNASeqPower

Sample size for RNAseq studies
Authors: Terry Therneau (primary contact), Steven Hart, JP Kocher
Available at: https://bioconductor.org/packages/release/bioc/html/RNASeqPower.html
Publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3842884/
Language/Platform: R

rocplus

Estimates the Integrated Discrimination Improvement (IDI) and Net Reclassification Improvement (NRI) for comparison of a new risk model to an old model.
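For orientation, the integrated discrimination improvement is commonly defined as the gain in mean predicted risk among events minus the gain among non-events. The sketch below implements that definition only (no NRI, no standard errors) and is not the macro's code; `idi` and its argument names are assumptions.

```python
def idi(events, p_old, p_new):
    """Integrated discrimination improvement of a new model over an old one.

    events: 0/1 outcome indicators.
    p_old, p_new: predicted risks from the old and new models.
    """
    if not (len(events) == len(p_old) == len(p_new)):
        raise ValueError("inputs must have the same length")

    def mean(values):
        return sum(values) / len(values)

    gains_events = [n - o for e, o, n in zip(events, p_old, p_new) if e == 1]
    gains_nonevents = [n - o for e, o, n in zip(events, p_old, p_new) if e == 0]
    if not gains_events or not gains_nonevents:
        raise ValueError("need at least one event and one non-event")
    # Risks should rise for events and fall for non-events.
    return mean(gains_events) - mean(gains_nonevents)

print(idi([1, 1, 0, 0], [0.6, 0.6, 0.4, 0.4], [0.8, 0.8, 0.2, 0.2]))
```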

rpart

Recursive Partitioning and Regression Trees
Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
Authors: Beth Atkinson, Terry Therneau, Brian Ripley (primary contact)
Available at: https://cran.r-project.org/web/packages/rpart/index.html
Language/Platform: R

RVboost

RVboost v0.1: RNA-seq variant prioritization approach for Illumina next-generation sequencing data.

SAAP-RRBS

The Streamlined Analysis and Annotation Pipeline for RRBS data (SAAP-RRBS) integrates read quality assessment/clean-up, alignment, methylation data extraction, annotation, reporting, and visualization. With this package, bioinformaticians or investigators can submit sequencing reads and quickly receive a fully annotated CpG methylation report.

schoen

Schoenfeld residuals for proportional hazards model.

SMART

Sequential Multiple Assignment Randomized Trial (SMART) design includes multiple stages of randomization, where participants are randomized to an initial treatment in the first stage and then subsequently re-randomized between treatments in the following stage. Includes methods for mean and variance as a function of specificity/sensitivity, and power calculations.

SnowShoes-FTD

SnowShoes-FTD is a bioinformatics tool to identify fusion transcripts from paired-end transcriptome sequencing data.

SNPPicker

A post-processor to optimize the selection of tag SNPs from common bin-tagging programs. SNPPicker uses a multi-step search strategy in combination with a statistical model to produce optimal genotyping panels.

SoftSearch

SoftSearch is a sensitive structural variant (SV) detection tool for Illumina paired-end next-generation sequencing data.

SpatialNorm6

Spatial normalization of Affymetrix SNP 6.0 CEL files, which adjusts for spatial bias based on wavelet decomposition.
Authors: Chai High Seng (primary contact)
Link: SpatialNorm6_1.1.tar.gz

Stress.dfArray

Calculates normalization Stress and dfArray quality for a set of arrays.
Authors: Doug Mahoney (primary contact), Jeanette Eckel-Passow
Available at: Stress.dfArray_1.1.tar.gz
Language/Platform: R

summary

Creates a table of variable summaries plus test statistics for the difference between two or more independent samples.

SuperLearner

Super Learner Prediction
Implements the super learner prediction method and contains a library of prediction algorithms to be used in the super learner.
Authors: Eric Polley (primary contact), LeDell, van der Laan, Kennedy, Lendle
Available at: https://cran.r-project.org/web/packages/SuperLearner/index.html
https://github.com/ecpolley/SuperLearner
Language/Platform: R

surv

Complete Kaplan-Meier survival analysis with printing options and logrank statistic.
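The product-limit (Kaplan-Meier) estimate at the core of such an analysis can be sketched briefly. The code below is a generic illustration, not the macro's implementation, and omits the logrank statistic and confidence limits; `kaplan_meier` and its arguments are hypothetical names.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time for each subject.
    events: 1 if the event was observed at that time, 0 if censored.
    Returns a list of (time, survival) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored subjects leave the risk set
    return curve

print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Subjects censored at a given time are counted as at risk at that time, which is the usual convention.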

survcstd

Calculates the c-statistic (concordance, discrimination index) for survival data with time-dependent covariates.

survival

Survival Analysis
Contains the core survival analysis routines, including definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models.
Authors: Terry Therneau (primary contact)
Available at: https://cran.r-project.org/web/packages/survival/index.html
https://github.com/therneau/survival
Language/Platform: R

survlrk

Calculates logrank statistics for the surv macro.

survlt

General survival statistics, including p(t), standard error, confidence limits, and median survival time, for left-truncated survival analyses.

survplot

Creates high-quality and easily customized Kaplan-Meier plots.

symmchk

Checks for symmetry and suggests the best power transformation, if one exists, to make an asymmetric distribution symmetric

ToxT

ToxT is a combination of methods to analyze TOXicity data over Time. It can be used for adverse event data and for patient reported outcomes. It includes longitudinal mixed models, bug plots, time-to-event analyses, heatmaps, area under the curve analyses, profile analysis, comparisons at each time point, and plots over time.

Trace-RRBS

trace-rrbs v0.1: Targeted Alignment and Artificial Cytosine Elimination for RRBS for Illumina next-generation sequencing data.

TREAT

TREAT is a Targeted RE-sequencing Annotation Tool that offers a comprehensive, open framework, end-to-end solution for analyzing and interpreting targeted re-sequencing data.

trex

Package that calculates a truncated exact test for two-stage case-control studies for rare genetic variants. The first stage is for screening rare variants in only cases. If the number of case-carriers of any rare variants exceeds a user-specified threshold, then additional cases and controls are genotyped for the detected variants and carrier status of […]

uagreemt

Measures agreement, precision, accuracy, total deviation index and coverage probability.

UCLncR Pipeline

The Ultrafast and Comprehensive lncRNA detection (UClncRNA) pipeline leverages fast transcript assembly, parallel computing tools, and multi-step filters for increased specificity to provide comprehensive lncRNA characterization.

VCF-Miner

The Variant Call Format (VCF) is the de facto standard for storing variant information from next-generation DNA sequencing experiments.

vmatch

Computerized matching of cases to controls using variable optimal matching.

WANDY

Wandy: A program for CNV/Aneuploidy detection from WGS data
Wandy is designed for Copy Number Variation (CNV) and Aneuploidy detection from large genomes such as human. It takes a sorted BAM file as input, reports predicted chromosome regions that have amplifications or deletions using the log2 ratio, and generates graphic reports. There are […]