1. Installing fusion-sense
============================================

Prerequisites: A Unix system, java >= 1.6.0.5, python >= 2.7.10,
R >= 3.1.1 with ggplot2 >= 1.0.1 and plyr >= 1.8.3

Step 0) Download the files:

    bioinformaticstools.mayo.edu/research/fusion-sense/fusion-sense.src.tar.gz
    bioinformaticstools.mayo.edu/research/fusion-sense/fusion-sense.ref-test.tar.gz

into a single directory and decompress them from the unix command line:

    $ tar xvfz fusion-sense.src.tar.gz
    $ tar xvfz fusion-sense.ref-test.tar.gz

This will create a fusion-sense directory that should look like:

    config  fusion-sense.py  out  README.txt  ref  test

Step 1) Add the directory where you downloaded the fusion-sense code to
your PATH variable, e.g. from the unix command line type:

    $ export PATH=$PATH:~/downloads/fusion-sense

Step 2) Make sure your environment variable R_LIBS points to your R package
library and that you have installed ggplot2. To install ggplot2, from the R
command line type:

    > install.packages("ggplot2")

To find out where the package is installed, run from the R command line:

    > .libPaths()
    [1] "/Library/Frameworks/R.framework/Resources/library"

To set your R_LIBS environment variable, from the unix command line type:

    $ export R_LIBS="/Library/Frameworks/R.framework/Resources/library"

Step 3) Test your installation by running the following test data set from
the unix command line in the directory where you installed the software:

    $ cd fusion-sense
    $ mkdir out
    $ python fusion-sense.py --bam test/test.bam --outputdir out

When the command finishes you should find the following png files in the
directory out:

    out/test.median.decay.png    : Graph with the median decay rate calculation
    out/test.sensitivity.cdf.png : Cumulative sensitivity distribution graph
    out/test.sensitivity.png     : Fusion sensitivity graph

If the graphs are not generated, try from the unix command line:

    $ Rscript --vanilla out/test.plots.R

2. Options available
=====================================================

--bam       : The path to the input bam file (Mandatory)
--outputdir : The path where graphs and intermediate files are generated.
              If not specified it defaults to the current directory.
--coverage  : Minimum coverage threshold for assessing whether a fusion can
              be detected at a particular position. Default is 10.
--gene_list : Configuration file with the coordinates of the genes of
              interest. By default we include an example in
              config/clinical_genes_coords.bed. The format is
              chromosome, start, end, genename-(distance to 3' UTR), e.g.
                  chr14 53324289 53324290 NM_006832-300
              where chromosome=chr14, start=53324289, end=start+1,
              genename=NM_006832 and distance to 3' UTR=300.
--uhr_dir   : Directory where the UHR calibration files are available.
              Defaults to config.
--gatk_path : Path where GATK is available. Defaults to config. Note that
              the software includes an old GATK version (1.2-26).
--reference : Path where the human genome reference is available.
              Defaults to ref/allchr.fa.
--java      : Complete path to java. Defaults to java.
--R         : Complete path to Rscript. Defaults to Rscript.

3. Output files
===============

For ease of explanation, we will assume the input file name is test.bam.

The files:

    testcvg_length.sample_interval_statistics
    testcvg_length.sample_cumulative_coverage_proportions
    testcvg_length.sample_interval_summary
    testcvg_length.sample_summary
    testcvg_length
    testcvg_length.sample_statistics
    testcvg_length.sample_cumulative_coverage_counts

are generated by GATK DepthOfCoverage and can be deleted once the script
finishes.

The following is a description of the files generated by the script (in
order of importance):

test.sensitivity.png          : Sensitivity graph plotted as the proportion
                                of genes with coverage >=10 at a particular
                                distance from the 3' end. Curves for the UHR
                                sample at different degradation levels are
                                overlaid.
test.sensitivity.cdf.png      : Cumulative sensitivity graph. Curves for UHR
                                are overlaid.
test.median.decay.png         : Graph of the median coverage for the genes of
                                interest as a function of distance from the
                                3' end. The decay rate and the fit of the
                                model are also depicted.
test_low_expressed_300        : List of genes that did not meet the
                                expression threshold (over 10x at a distance
                                of 300bp from the 3' end).
test.plots.R                  : R code that generates the plots.
test.avg.UHR.choppy.graph.txt : Intermediate file used for generating the
                                sensitivity graph.
test.avg.UHR.cumulative.txt   : Intermediate file used for generating the
                                cumulative sensitivity graph.
test.cvg_length.expr.txt      : Intermediate file needed for the median decay
                                graph.
test.cvg_length.table.txt     : Intermediate file needed for the median decay
                                graph.

4. License
==========

Copyright (c) 2016, Numrah Fadra (fadra.numrah@mayo.edu),
Jaime Davila (davila.jaime@mayo.edu)
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
   this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
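
Example: checking a --gene_list entry

The --gene_list format described under "Options available" packs the gene
name and the distance to the 3' UTR into a single fourth column. The sketch
below shows one way to split such a line before passing a custom file to
fusion-sense.py. It is not part of fusion-sense: the names GeneEntry and
parse_gene_list_line are hypothetical, and it assumes the columns are
separated by whitespace and that the distance follows the last hyphen of the
fourth column.

```python
from collections import namedtuple

# Hypothetical record type for one gene_list line (not part of fusion-sense).
GeneEntry = namedtuple("GeneEntry", "chrom start end gene distance")


def parse_gene_list_line(line):
    """Split one gene_list line into its five logical fields.

    Assumes whitespace-separated columns:
    chromosome, start, end, genename-distance.
    """
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected 4 columns, got %d" % len(fields))
    chrom, start, end, name = fields
    # Split on the last hyphen so the gene name keeps any earlier hyphens.
    gene, _, distance = name.rpartition("-")
    if not gene:
        raise ValueError("fourth column must look like genename-distance")
    return GeneEntry(chrom, int(start), int(end), gene, int(distance))


entry = parse_gene_list_line("chr14 53324289 53324290 NM_006832-300")
print(entry)
```

Running this on the example line from the README yields chromosome chr14,
gene NM_006832 and a distance of 300, with end equal to start+1.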