1. Installing fusion-sense
============================================

Prerequisites: a Unix system, Java >= 1.6.0.5, Python >= 2.7.10, and
R >= 3.1.1 with ggplot2 >= 1.0.1 and plyr >= 1.8.3.
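
Before installing, it can help to confirm that the external programs
are reachable from your PATH. The following is a minimal Python 3
sketch, not part of fusion-sense; the helper name missing_executables
is hypothetical, and the default names java and Rscript are taken
from the defaults of the --java and --R options.

```python
import shutil


def missing_executables(names=("java", "Rscript")):
    """Return the program names that cannot be found on the PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    for name in missing_executables():
        print("not found on PATH: " + name)
```

This only checks that the programs can be found; it does not verify
their versions.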

Step 0) Download the files:

bioinformaticstools.mayo.edu/research/fusion-sense/fusion-sense.src.tar.gz

bioinformaticstools.mayo.edu/research/fusion-sense/fusion-sense.ref-test.tar.gz

into a single directory and decompress them using the unix command line:

$ tar xvfz fusion-sense.src.tar.gz
$ tar xvfz fusion-sense.ref-test.tar.gz

This will create a fusion-sense directory that should look like:

config  fusion-sense.py  out  README.txt  ref  test

Step 1) Add the directory where you unpacked the fusion-sense code to
your PATH variable. For example, from the Unix command line type:

$ export PATH=$PATH:~/downloads/fusion-sense

Step 2) Make sure your environment variable R_LIBS points to your R
package library and that ggplot2 is installed. To install ggplot2,
from the R command line type:

> install.packages("ggplot2")

To find where the package is installed, run from the R command line:

> .libPaths()
[1] "/Library/Frameworks/R.framework/Resources/library"

To set the R_LIBS environment variable, from the Unix command line type:

$ export R_LIBS="/Library/Frameworks/R.framework/Resources/library"

Step 3) Test your installation by running the tool on the bundled test
data set. From the Unix command line, in the directory where you
installed the software, type:

$ cd fusion-sense
$ mkdir -p out
$ python fusion-sense.py --bam test/test.bam --outputdir out

When the command finishes, you should find the following PNG files in
the out directory:

out/test.median.decay.png   : Graph with the median decay rate calculation
out/test.sensitivity.cdf.png: Cumulative sensitivity distribution graph
out/test.sensitivity.png    : Fusion sensitivity graph

If these graphs are not generated, try running the plotting script
directly from the Unix command line:

$ Rscript --vanilla out/test.plots.R
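
The check in Step 3 can also be scripted. The following is a minimal
Python 3 sketch, not part of fusion-sense; the helper names
expected_plots and missing_plots are hypothetical, and the naming
pattern (BAM file name without its extension, followed by a fixed
suffix) is an assumption based on the three files listed above.

```python
import os

# Suffixes of the three PNG files listed in Step 3.
PLOT_SUFFIXES = ("median.decay.png", "sensitivity.cdf.png", "sensitivity.png")


def expected_plots(bam_path, outputdir):
    """Return the plot paths expected for bam_path (assumed naming pattern)."""
    # "test/test.bam" -> "test"
    prefix = os.path.splitext(os.path.basename(bam_path))[0]
    return [os.path.join(outputdir, prefix + "." + s) for s in PLOT_SUFFIXES]


def missing_plots(bam_path, outputdir):
    """Return the expected plot paths that do not exist on disk."""
    return [p for p in expected_plots(bam_path, outputdir)
            if not os.path.isfile(p)]


if __name__ == "__main__":
    for path in missing_plots("test/test.bam", "out"):
        print("missing: " + path)
```

If any file is reported as missing, running the Rscript command above
is the next thing to try.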

2. Options available 
=====================================================

--bam       : Path to the input BAM file (mandatory).

--outputdir : Directory where graphs and intermediate files are
              written. Defaults to the current directory.

--coverage  : Minimum coverage threshold for assessing whether a
              fusion can be detected at a given position. Defaults
              to 10.

--gene_list: Configuration file with the coordinates of the genes of
             interest. An example is included in
             config/clinical_genes_coords.bed. Each line has four
             columns: chromosome, start, end, and
             genename-(distance to 3' UTR). For example:

             chr14	53324289	53324290	NM_006832-300

             where chromosome=chr14, start=53324289, end=start+1,
             genename=NM_006832 and distance to 3' UTR = 300.

--uhr_dir : Directory containing the calibration files for the UHR
            sample. Defaults to config.

--gatk_path : Path where GATK is available. Defaults to config. Note
              that the software bundles an old GATK version (1.2-26).

--reference : Path to the human genome reference used by GATK.
              Defaults to ref/allchr.fa.

--java       : Full path to the java executable. Defaults to java.
--R          : Full path to the Rscript executable. Defaults to Rscript.
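
The line format described under --gene_list can be illustrated with a
small parser. The following is a minimal Python 3 sketch, not part of
fusion-sense; the names GeneEntry and parse_gene_line are
hypothetical. It assumes the columns are separated by whitespace, as
in the example line, and that the distance follows the last hyphen of
the fourth column.

```python
from collections import namedtuple

GeneEntry = namedtuple("GeneEntry", "chromosome start end genename distance")


def parse_gene_line(line):
    """Split one gene_list line into its five logical fields."""
    chromosome, start, end, name = line.split()
    # The fourth column is "<genename>-<distance>"; split on the last
    # hyphen so that a gene name containing hyphens stays intact.
    genename, distance = name.rsplit("-", 1)
    return GeneEntry(chromosome, int(start), int(end), genename, int(distance))


if __name__ == "__main__":
    print(parse_gene_line("chr14\t53324289\t53324290\tNM_006832-300"))
```

Applied to the example line, this yields chromosome chr14, start
53324289, end 53324290, genename NM_006832 and distance 300.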

3. Output files
===============

For ease of explanation, we will assume the input file name is
test.bam

The files:

testcvg_length.sample_interval_statistics
testcvg_length.sample_cumulative_coverage_proportions
testcvg_length.sample_interval_summary
testcvg_length.sample_summary
testcvg_length
testcvg_length.sample_statistics
testcvg_length.sample_cumulative_coverage_counts

are generated by GATK DepthOfCoverage and can be deleted once the
script finishes.
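
Removing these files can be scripted. The following is a minimal
Python 3 sketch, not part of fusion-sense; the helper names
gatk_intermediates and remove_gatk_intermediates are hypothetical.
The file names are built exactly as they appear in the list above
(input name without extension, followed by cvg_length), which is an
assumption; compare them with the contents of your output directory
before deleting anything.

```python
import os

# Suffixes copied from the list of GATK DepthOfCoverage files above.
GATK_SUFFIXES = (
    "",
    ".sample_cumulative_coverage_counts",
    ".sample_cumulative_coverage_proportions",
    ".sample_interval_statistics",
    ".sample_interval_summary",
    ".sample_statistics",
    ".sample_summary",
)


def gatk_intermediates(bam_path, outputdir="."):
    """Return the paths of the GATK files expected for bam_path."""
    # "test.bam" -> "testcvg_length", as written in the list above.
    prefix = os.path.splitext(os.path.basename(bam_path))[0] + "cvg_length"
    return [os.path.join(outputdir, prefix + s) for s in GATK_SUFFIXES]


def remove_gatk_intermediates(bam_path, outputdir="."):
    """Delete the GATK files that exist and return the removed paths."""
    removed = []
    for path in gatk_intermediates(bam_path, outputdir):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```

Calling gatk_intermediates first, without deleting, shows which paths
would be affected.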

The following is a description of the files generated by the script
(in order of importance)

test.sensitivity.png : Sensitivity graph plotted as the proportion of
                       genes with coverage >= 10 at a particular
                       distance from the 3' end. Curves for the UHR
                       sample at different degradation levels are
                       overlaid.

test.sensitivity.cdf.png : Cumulative sensitivity graph. Curves for
                           UHR are overlaid.

test.median.decay.png : Graph measuring the median coverage for the
                        genes of interest as a function of distance
                        from 3' end. The decay rate and the fit of the
                        model are also depicted

test_low_expressed_300 : List of genes that did not meet the
                         expression threshold (over 10x coverage at a
                         distance of 300 bp from the 3' end).

test.plots.R : R code that generates the plots

test.avg.UHR.choppy.graph.txt : Intermediate file used to generate
                                the sensitivity graph.

test.avg.UHR.cumulative.txt : Intermediate file used to generate the
                              cumulative sensitivity graph.

test.cvg_length.expr.txt : Intermediate file needed for the median
                           decay graph.

test.cvg_length.table.txt : Intermediate file needed for the median
                            decay graph.

4. License
==========

Copyright (c) 2016, Numrah Fadra (fadra.numrah@mayo.edu), Jaime Davila
(davila.jaime@mayo.edu). All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the
   distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
