GeneSetScan is a pre-compiled binary for 64-bit Linux systems. It offers a general approach to scanning genome-wide SNP data for gene-set association analyses, described in Schaid et al. (

The test statistic for a gene set is based on score statistics for generalized linear models and takes advantage of the directed acyclic graph (DAG) structure of the Gene Ontology (GO) to create gene sets. The method can also use other gene-set structures, such as Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, or user-defined sets.
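Under the GO true-path rule, a gene annotated to a term is implicitly annotated to all of that term's ancestors, so the gene set for a GO term can be built from the genes annotated to that term or to any of its descendants. The sketch below is illustrative only and is not GeneSetScan's code: the function name `build_gene_sets` and the input layout (a list of `(parent, child)` term pairs and a term-to-genes dictionary) are assumptions and do not describe the actual format of edges.csv or gene2go.human.

```python
from collections import defaultdict


def build_gene_sets(edges, annotations):
    """Return {GO term: set of genes annotated to the term or any descendant}.

    edges: iterable of (parent, child) GO term pairs (hypothetical layout).
    annotations: dict mapping a GO term to the set of genes annotated directly to it.
    """
    children = defaultdict(set)
    terms = set(annotations)
    for parent, child in edges:
        children[parent].add(child)
        terms.update((parent, child))

    memo = {}

    def collect(term):
        # Memoized depth-first search: in a DAG the same descendant can be
        # reached along several paths, so each term is expanded only once.
        if term not in memo:
            genes = set(annotations.get(term, ()))
            for child in children[term]:
                genes |= collect(child)
            memo[term] = frozenset(genes)
        return memo[term]

    return {term: set(collect(term)) for term in terms}


# Toy DAG: GO:A -> GO:B -> GO:D and GO:A -> GO:C -> GO:D
edges = [("GO:A", "GO:B"), ("GO:A", "GO:C"), ("GO:B", "GO:D"), ("GO:C", "GO:D")]
annotations = {"GO:B": {"g1"}, "GO:C": {"g2"}, "GO:D": {"g3"}}
sets = build_gene_sets(edges, annotations)
print(sorted(sets["GO:A"]))  # ['g1', 'g2', 'g3']
```

Because child gene sets are unions nested inside their parents, sets higher in the DAG overlap heavily with those below them, which is one source of the correlation among gene sets that the multiple-testing adjustment must handle.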

Our approach combines SNPs into genes, and genes into gene sets, while ensuring that positive and negative effects on a trait do not cancel. To control for multiple testing across many gene sets, we use an efficient computational strategy that accounts for linkage disequilibrium and for correlations among genes and gene sets, and that provides accurate step-down adjusted p-values for each gene set.
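The idea behind step-down adjusted p-values can be illustrated with the classic Westfall-Young step-down maxT procedure. This is a generic sketch, not GeneSetScan's internal algorithm: it assumes that null replicates of all gene-set statistics have already been generated jointly (for example by simulation or permutation), so that linkage disequilibrium and gene overlap are reflected in the correlation among the replicated statistics. The function name `step_down_maxt` is hypothetical.

```python
def step_down_maxt(observed, null_reps):
    """Westfall-Young step-down maxT adjusted p-values.

    observed: list of K observed gene-set statistics (larger = more extreme).
    null_reps: list of B null replicates, each a list of K statistics drawn
        jointly so that correlations among gene sets are preserved.
    """
    k = len(observed)
    b = len(null_reps)
    # Positions 0..k-1 run from the most to the least significant gene set.
    order = sorted(range(k), key=lambda j: observed[j], reverse=True)
    counts = [0] * k
    for rep in null_reps:
        running_max = float("-inf")
        # Walk from the least to the most significant set, so running_max is
        # the largest null statistic among sets not yet "stepped down" past.
        for pos in range(k - 1, -1, -1):
            j = order[pos]
            running_max = max(running_max, rep[j])
            if running_max >= observed[j]:
                counts[pos] += 1
    adjusted = [0.0] * k
    prev = 0.0
    for pos, j in enumerate(order):
        # Enforce monotonicity: a less significant set never gets a smaller p.
        prev = max(prev, counts[pos] / b)
        adjusted[j] = prev
    return adjusted


observed = [5.0, 1.0, 3.0]
null_reps = [[1, 2, 0], [4, 0, 6], [0, 0, 0], [2, 3, 1]]
print(step_down_maxt(observed, null_reps))  # [0.25, 0.5, 0.5]
```

The step-down design gives more power than a single-step maxT adjustment because each gene set is compared only against the maximum over itself and the less significant sets, while still controlling the family-wise error rate. The number of replicates B limits the smallest attainable p-value (1/B here).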

Input files:

  • snpscores.txt: user-created SNP scores, i.e. the residuals from testing each SNP against the trait, with or without covariate adjustment. Shell and R scripts are provided to create this file from PLINK-formatted data; they run a separate process for each chromosome and then concatenate the per-chromosome scores into one large file.
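To illustrate how score statistics and the non-cancelling combination fit together, the hypothetical sketch below computes a per-SNP score U_j = Σ_i g_ij r_i from null-model residuals r_i = y_i - μ_i (trait regressed on covariates only). For a generalized linear model with canonical link and an intercept, this is proportional to the usual score statistic. It then sums squared scores within a gene and across the genes of a set. The function names, the genotype coding, and the unstandardized sum of squares are assumptions for illustration; GeneSetScan's actual statistic and the layout of snpscores.txt may differ.

```python
def snp_scores(genotypes, residuals):
    """Score U_j = sum_i g_ij * r_i for each SNP j.

    genotypes: subjects x SNPs list of allele counts (0/1/2).
    residuals: per-subject residuals from the null model (no SNP term).
    """
    n_snps = len(genotypes[0])
    return [sum(g[j] * r for g, r in zip(genotypes, residuals))
            for j in range(n_snps)]


def gene_stat(scores, snp_idx):
    # Squaring each SNP score means opposite-signed effects cannot cancel.
    return sum(scores[j] ** 2 for j in snp_idx)


def gene_set_stat(scores, gene_to_snps, genes):
    # A gene-set statistic as the sum of its genes' statistics.
    return sum(gene_stat(scores, gene_to_snps[g]) for g in genes)


genotypes = [[0, 2], [1, 1], [2, 0]]   # 3 subjects, 2 SNPs
residuals = [-1.0, 0.0, 1.0]
scores = snp_scores(genotypes, residuals)  # [2.0, -2.0]
gene_to_snps = {"GENE1": [0, 1]}           # hypothetical SNP-to-gene map
print(sum(scores), gene_set_stat(scores, gene_to_snps, ["GENE1"]))  # 0.0 8.0
```

In the toy data the two SNPs act in opposite directions, so a plain sum of their scores is 0 while the sum of squares (8.0) still reflects both associations.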

Provided in package:

  • gene_snp.B37.coding.dat: reference file mapping dbSNP132 SNPs (by rsid) to GRCh Build 37 positions, together with the start/stop positions of all coding genes.
  • edges.csv: file defining the GO directed acyclic graph edges (release date: March 2011)
  • gene2go.human: list of genes mapping to GO terms (release date: March 2011)
  • hsa_pathway.list: list of genes mapping to KEGG human pathways (release date: March 2011)


System Requirements:

Memory and disk space are the two main issues to consider before running this program. Here, "memory" means the system memory the program needs while it runs from start to finish, and "disk space" means the hard-disk space needed to store files on your system.

GeneSetScan should only be run on 64-bit machines with sufficient memory available at run time. The following table summarizes approximate disk and memory usage for two sample sizes genotyped on 550K SNP chips.

Subjects   SNPs   Disk    Memory
1,000      550K   8 GB    4 GB
2,500      550K   13 GB   14 GB

See Also:

Dan Schaid’s software page

Page last modified: October 24, 2014