
----------------------------------#### Hybrid-denovo ####---------------------------------------

    Hybrid-denovo is a de novo OTU-picking pipeline integrating single- and paired-end 16S
sequence tags. It takes Illumina paired-end sequencing reads as input and outputs an OTU BIOM
table, together with the OTU representative sequences and a phylogenetic tree of the OTUs.

    The most distinctive feature of hybrid-denovo is that it can process a mixture of
paired-end reads and single-end reads. This is useful because Illumina paired-end reads become
a mixture of paired-end reads and single-end reads after quality control. For more details, please
read our online article.


system requirements:
	Linux platform (CentOS 6 preferred)

installation:
	from source code
		1) download hybrid-denovo.tar.gz and unpack.
		2) download USEARCH version 8 for Linux from http://www.drive5.com/usearch/download.html
		3) open config/tool.info and set the paths to USEARCH, java, python (ver 2.7) and
		   QIIME (or QIIME2)
		4) this package also includes some python libraries: biom-format (ver 1.3.1),
		   bitarray (ver 0.8.1), pyqi (ver 0.2.0), numpy (ver 1.8.1) and biopython (ver 1.66).
		   They should work in most environments. If you get an error message about a
		   missing library, please install it yourself and set its path in tool.info.
	from VirtualBox VM
		1) alternatively, download our VirtualBox VM image hybriddenovo.ova, which packages all dependencies
		2) install it on Windows (we installed it on Windows 7)
		3) Ubuntu is installed in the VM and the sudo password is 'mayo' (in case you want to install additional packages)
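
Before running the pipeline from source, it can help to confirm that every path you put in
tool.info points to an executable file. The following is only a minimal sketch in POSIX sh:
the function name check_tool_paths and the example paths are hypothetical and are not part of
the package.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with hybrid-denovo).
# Prints one line per path and returns 0 only if every path is an executable file.
check_tool_paths() {
    rc=0
    for tool in "$@"; do
        if [ -f "$tool" ] && [ -x "$tool" ]; then
            echo "ok: $tool"
        else
            echo "missing or not executable: $tool" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example with placeholder paths; substitute the ones you set in tool.info:
# check_tool_paths /path/to/usearch /path/to/java /path/to/python
```

The check only tests that the files exist and are executable; it does not verify tool versions
(for example python 2.7 or USEARCH version 8).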

files and directories in the package (hybrid-denovo.tar.gz):
	1) hybrid-denovo: the main script file
	2) config: directory that stores configuration files
	   a) run.info         : the input parameters of the pipeline (open config/run.info for details)
	   b) tool.info        : the paths to the external modules and packages used by the pipeline; its location is set in run.info
	3) external  : external modules and packages
	4) README    : this README file
	5) sampleV3V5: a test sample set of V3V5 rDNA amplicon reads. For each sample, there are 8000 high-quality forward reads (R1)
		       and 4000 reverse reads (R2). This mimics a dataset in which 50% of the R2 reads are removed by QC.
	6) scripts   : shell scripts and jar files developed by us
	7) test      : our test run results

usage:
	/path/to/hybrid-denovo /path/to/run.info 

key parameters to set in run.info (open config/run.info to edit):
	R1PAIRED_READ_TYPE: read type (0: single end; 1: paired end with overlap, such as V4 region amplicon; 2: paired end without overlap, such as V3-V5 region amplicon)
	R1PAIRED_READ_LENGTH: input read length
	R1PAIRED_INPUT_FILES: a directory containing all input fastq files (every *.fastq file in it is used as input)
	R1PAIRED_WORK_DIR: your working/output directory
	R1PAIRED_TOOL_INFO: absolute path to tool.info (by default, the pipeline uses /your_source_dir/config/tool.info). Please also open tool.info and set the correct tool paths.
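
Since every *.fastq file in the input directory is used as input, it is worth checking what
that directory contains before a run. A minimal sketch in POSIX sh follows. The function name
count_fastq is hypothetical, and the sketch assumes that only files directly inside the
directory count (not files in subdirectories).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with hybrid-denovo).
# Prints the number of regular files named *.fastq directly inside the given directory.
count_fastq() {
    dir=$1
    n=0
    for f in "$dir"/*.fastq; do
        # When nothing matches, the pattern stays literal and fails the -f test.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example with a placeholder path; use the directory set in R1PAIRED_INPUT_FILES:
# count_fastq /path/to/input_fastq_dir
```

Files with other extensions, such as compressed *.fastq.gz files, are not counted here because
this README mentions only *.fastq as input.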

outputs:
	mapping.txt: a mapping file that associates each sample ID with its fastq file. You can
		     add other metadata to it for further analysis (such as with QIIME).

	workspace/imtornado/QC.log.txt: QC results showing the number of input reads and
					the number of reads that passed QC

	workspace/imtornado/: results generated from R1 reads only
		test_R1.biom (BIOM file)
		test_R1.biom.table (converted by QIIME from BIOM file)
		test_R1.tree (a phylogenetic tree generated by FastTree)
		test_R1.otus.final.result.fasta (OTU representatives)

	workspace/imtornado/: results generated by paired-end reads
		test_paired.biom (BIOM file)
		test_paired.biom.table (converted by QIIME from BIOM file)
		test_paired.tree (a phylogenetic tree generated by FastTree)
		test_paired.otus.final.result.fasta (OTU representatives)

	workspace/R1Paired/: results generated by our hybrid-denovo method
		test_PairedSingle.biom (BIOM file)
		test_PairedSingle.biom.table (converted by QIIME from BIOM file)
		test_PairedSingle.tree (a phylogenetic tree generated by FastTree)
		test_PairedSingle.otus.final.result.fasta (OTU representatives)
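
After a run, the output files listed above can be checked with a short script. This is a
minimal sketch in POSIX sh. The function name list_missing_outputs is hypothetical, and the
sketch assumes that the workspace/ directory sits directly under your working directory and
that the output files carry the test_ prefix shown above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with hybrid-denovo).
# Prints the path of every expected output file that is absent under <workdir>/workspace.
list_missing_outputs() {
    workdir=$1
    qc_log="$workdir/workspace/imtornado/QC.log.txt"
    [ -f "$qc_log" ] || echo "$qc_log"
    for prefix in imtornado/test_R1 imtornado/test_paired R1Paired/test_PairedSingle; do
        for suffix in biom biom.table tree otus.final.result.fasta; do
            f="$workdir/workspace/$prefix.$suffix"
            [ -f "$f" ] || echo "$f"
        done
    done
}

# Example with a placeholder path; use your own working directory:
# list_missing_outputs /path/to/work_dir
```

An empty result means that all thirteen files (the QC log plus four files for each of the three
result sets) were found. It says nothing about whether their contents are correct.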

test run:
	1) go to the unpacked directory
	2) mkdir mytest
	3) cd mytest
	4) run command '../hybrid-denovo ../config/run.info'.
	5) compare your results to our results (in /your_source_dir/test) to confirm that the installation is correct.
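
The comparison in step 5 can be partly automated. The following is a minimal sketch in POSIX
sh. The function name compare_dirs is hypothetical, and the sketch assumes that the reference
directory and your output directory share the same layout, which this README does not state.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with hybrid-denovo).
# For every file under <ref>, prints "MISSING <path>" if it is absent under <new>
# and "DIFFERS <path>" if the two copies are not byte-identical.
compare_dirs() {
    ref=$1
    new=$2
    (cd "$ref" && find . -type f | sort) | while IFS= read -r rel; do
        rel=${rel#./}
        if [ ! -f "$new/$rel" ]; then
            echo "MISSING $rel"
        elif ! cmp -s "$ref/$rel" "$new/$rel"; then
            echo "DIFFERS $rel"
        fi
    done
}

# Example with placeholder paths:
# compare_dirs /your_source_dir/test /your_source_dir/mytest
```

A DIFFERS line is only a prompt for manual inspection: some outputs may legitimately differ
byte for byte between runs or machines, so a difference does not by itself mean the
installation is broken.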

notes:
	1) Installing python libraries individually may cause many dependency issues. We suggest installing
	   QIIME first, as many libraries used in the pipeline are installed automatically with QIIME.
	2) The biom-format (ver 1.3.1) entry in tool.info must be kept because the pipeline requires this version. If you have
	   installed QIIME2, biom-format (ver 2) may be installed automatically and set in your default path, which may cause a path conflict.
	3) If you install on Ubuntu, all python libraries included in this package have to be re-installed, and the compiled C
	   executable files have to be rebuilt from source code.
	4) To deal with large datasets, we have a parallel computing version, available on request. Please contact us.

contacts:
	chen.xianfeng@mayo.edu or chen.jun2@mayo.edu
	
