
This file lists the data files provided with GeneSetScan, with a description of each.

All files were created from free public sources current as of March 2011 and were
processed by scripts written by the authors of GeneSetScan. The only source that is no
longer freely available is the KEGG gene list (hsa_pathway.list).

filename [unzipped size]
------------------------------
gene_snp.B37.coding.dat [80MB]  
edges.nspace.csv [2.8MB] 
gene2go.human  [13MB]
hsa_pathway.list [0.4MB]


gene_snp.B37.coding.dat  
----------------------------
SNPs from HapMap release 23a, mapped to genes using NCBI human genome build 37
positions.  To create this file, we started from the HapMap map file available from Plink:
  http://pngu.mgh.harvard.edu/~purcell/plink/res.shtml#hapmap
We updated the SNP positions in that map file to NCBI human genome build 37.1 and mapped
each SNP to a gene if it fell within +/-50kb of the start/stop of any protein-coding gene.
The gene start/stop positions are provided by seq_gene.md from this site:

  ftp://ftp.ncbi.nih.gov/genomes/MapView/Homo_sapiens/sequence/BUILD.37.2/initial_release/seq_gene.md.gz,

and their protein-coding status is provided in Homo_sapiens.gene_info, from this site:

  ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info
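
The +/-50kb mapping rule described above can be sketched as follows. This is a
minimal illustration in Python, not the script the GeneSetScan authors used. The tuple
layouts, the function name map_snps_to_genes, the inclusive window boundaries, and the
toy SNP and gene records are all assumptions made for the example.

```python
from collections import defaultdict

WINDOW = 50_000  # +/-50kb flank around each gene, as stated in the text


def map_snps_to_genes(snps, genes, window=WINDOW):
    """Return {snp_id: [gene_id, ...]} for SNPs within `window` bp of a gene.

    snps  -- iterable of (snp_id, chrom, position)      (hypothetical layout)
    genes -- iterable of (gene_id, chrom, start, stop)  (hypothetical layout)
    Boundaries are treated as inclusive, which is an assumption.
    """
    # Group genes by chromosome so each SNP is only compared to genes
    # on its own chromosome.
    genes_by_chrom = defaultdict(list)
    for gene_id, chrom, start, stop in genes:
        genes_by_chrom[chrom].append((gene_id, start, stop))

    mapping = {}
    for snp_id, chrom, pos in snps:
        hits = [
            gene_id
            for gene_id, start, stop in genes_by_chrom.get(chrom, [])
            if start - window <= pos <= stop + window
        ]
        if hits:  # SNPs that map to no gene are left out
            mapping[snp_id] = hits
    return mapping


# Toy records, invented for illustration only.
genes = [
    ("GENE_A", "1", 100_000, 120_000),
    ("GENE_B", "1", 160_000, 180_000),
]
snps = [
    ("rs_toy1", "1", 55_000),   # inside GENE_A's upstream flank
    ("rs_toy2", "1", 140_000),  # inside the flanks of both genes
    ("rs_toy3", "2", 110_000),  # different chromosome
    ("rs_toy4", "1", 300_000),  # too far from either gene
]
result = map_snps_to_genes(snps, genes)
```

Note that a SNP can map to more than one gene when flanking windows overlap, as
rs_toy2 does in the toy data. The linear scan per SNP is kept for readability; an
interval index would be a better fit for genome-scale input.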



edges.nspace.csv  
--------------------------
Using this file from GO: 
http://www.geneontology.org/ontology/obo_format_1_2/gene_ontology_ext.obo, 
we created a file listing edges for the 3 GO namespaces. We restricted the
edges to these relationships: is_a, part_of, regulates, and negatively_regulates. 
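
As a rough sketch of how such an edge list could be derived, the Python below reads
[Term] stanzas in OBO 1.2 format and keeps only the four relationships named above.
It is not the authors' script. The function name obo_edges, the output tuple layout
(child, parent, relationship, namespace), and the toy stanza are assumptions; the
actual column layout of edges.nspace.csv is not described here.

```python
KEEP = {"is_a", "part_of", "regulates", "negatively_regulates"}


def obo_edges(lines, keep=KEEP):
    """Return (child, parent, relationship, namespace) tuples from [Term] stanzas."""
    edges = []
    stanza_type, term_id, namespace, pending = None, None, None, []

    def flush():
        # Emit the edges collected for the stanza that just ended.
        if stanza_type == "[Term]" and term_id:
            for rel, parent in pending:
                edges.append((term_id, parent, rel, namespace))

    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza header closes the previous stanza.
            flush()
            stanza_type, term_id, namespace, pending = line, None, None, []
        elif stanza_type == "[Term]" and ": " in line:
            tag, value = line.split(": ", 1)
            value = value.split(" ! ", 1)[0]  # drop the trailing "! name" comment
            parts = value.split()
            if tag == "id":
                term_id = parts[0]
            elif tag == "namespace":
                namespace = parts[0]
            elif tag == "is_a" and parts:
                pending.append(("is_a", parts[0]))
            elif tag == "relationship" and len(parts) >= 2:
                pending.append((parts[0], parts[1]))
    flush()  # the last stanza has no following header
    return [edge for edge in edges if edge[2] in keep]


# Toy stanza, invented for illustration; the IDs are placeholders.
toy_obo = """format-version: 1.2

[Term]
id: GO:0000001
name: toy child
namespace: biological_process
is_a: GO:0000002 ! toy parent
relationship: part_of GO:0000003 ! toy whole
relationship: has_part GO:0000004 ! not in the kept set

[Typedef]
id: part_of
name: part of
"""
edges = obo_edges(toy_obo.splitlines())
```

Here each edge carries the namespace of its child term, which is one plausible way to
separate the three namespaces. Obsolete terms are not filtered in this sketch.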


gene2go.human  
--------------------------
Downloaded from this site: ftp://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz
and subset to human (Homo sapiens) genes that map to GO terms.
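
A subset like this could be produced along the following lines. The Python sketch
assumes that gene2go is tab-delimited with the taxonomy ID in the first column and
that human records carry NCBI Taxonomy ID 9606. The function name human_gene2go and
the toy rows are hypothetical, and this is not the script the authors used.

```python
import csv
import io

HUMAN_TAX_ID = "9606"  # NCBI Taxonomy ID for Homo sapiens


def human_gene2go(handle, tax_id=HUMAN_TAX_ID):
    """Yield rows of a gene2go-style tab-delimited file for a single taxon."""
    # QUOTE_NONE keeps any quote characters in term names as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header line
        if row[0] == tax_id:
            yield row


# Toy rows, invented for illustration; the column order is an assumption.
toy_gene2go = (
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "9606\t1\tGO:0000001\tIEA\t-\ttoy term\t-\tProcess\n"
    "10090\t2\tGO:0000002\tIEA\t-\ttoy term\t-\tProcess\n"
)
rows = list(human_gene2go(io.StringIO(toy_gene2go)))
```

For the real download, the same function could be given a handle from
gzip.open(path, "rt") in place of the in-memory toy text.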

hsa_pathway.list
--------------------------
Downloaded from this site:  ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/hsa_pathway.list
This file is no longer freely available, as of June 30, 2011.

