Hybrid-denovo is a de novo OTU-picking pipeline that integrates single- and paired-end 16S sequence tags. It takes Illumina paired-end sequencing reads as input and outputs an OTU BIOM table, together with the OTU representative sequences and a phylogenetic tree of the OTUs.
The most distinctive feature of hybrid-denovo is that it can process a mixture of paired-end and single-end reads. This is useful because Illumina paired-end reads become such a mixture after quality control. For more details, please read our online article.
Linux platform (we used CentOS 6)
From source code:
- Download hybrid-denovo.tar.gz and unpack.
- Download Linux version 8 of USEARCH from http://www.drive5.com/usearch/download.html
- Open config/tool.info and set the paths to USEARCH, Java, Python (ver 2.7) and QIIME (or QIIME2).
- This package also includes some Python libraries: biom-format (ver 1.3.1), bitarray (ver 0.8.1), pyqi (ver 0.2.0), numpy (ver 1.8.1) and biopython (ver 1.66). These may work in your environment as shipped. If you get an error message about a missing library, please install it yourself and set its path in tool.info.
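Before editing config/tool.info, it can help to see where the required tools currently resolve on your system. The following is a minimal sketch and not part of the package: `report_tool` is a hypothetical helper name, and it only looks up commands on `PATH`. The command names under which USEARCH and QIIME are installed vary between systems, so the arguments you pass are your own assumption to check.

```shell
# Hypothetical helper (not shipped with hybrid-denovo): print where a
# command resolves on PATH so the path can be copied into config/tool.info.
report_tool() {
    tool_name="$1"
    if tool_path=$(command -v "$tool_name"); then
        printf '%s\t%s\n' "$tool_name" "$tool_path"
    else
        printf '%s\tNOT FOUND\n' "$tool_name"
        return 1
    fi
}
```

For example, `report_tool java` prints the tool name and its resolved path, or `NOT FOUND` with a non-zero exit status if the command is not on `PATH`.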
From a VM using VirtualBox:
- Alternatively, you can download our VirtualBox VM image, hybriddenovo.ova, which packages all dependencies.
- Install on Windows (we installed it on Windows 7)
- Install Oracle VirtualBox
- Open the OVA image you downloaded in step 1.
- Ubuntu is installed in the VirtualBox VM, and the sudo password is ‘mayo’ (in case you want to install additional packages).
Files and directories in the package (hybrid-denovo.tar.gz):
- hybrid-denovo: the main script file
- config: directory that stores configuration files
- run.info : the input parameters of the pipeline (open config/run.info for details)
- tool.info : the paths to external modules and packages used by the pipeline; the location of this file is set in run.info
- external : external modules and packages
- README : this README file
- sampleV3V5: a test sample for V3V5 rDNA amplicon reads
- scripts : shell scripts and jar files developed by us
- test : our test run results
Key parameters to set in run.info (open config/run.info to edit):
- R1PAIRED_READ_TYPE: read type (0: single end; 1: paired end with overlap, such as V4 region amplicon; 2: paired end without overlap, such as V3-V5 region amplicon)
- R1PAIRED_READ_LENGTH: input read length
- R1PAIRED_INPUT_FILES: the directory that contains all input fastq files (any *.fastq file in it will be used as input)
- R1PAIRED_WORK_DIR: your working/output directory
- R1PAIRED_TOOL_INFO: absolute path to tool.info (by default, the pipeline will use: /your_source_dir/config/tool.info). Please remember to open tool.info and set correct tool paths
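Because every `*.fastq` file in the R1PAIRED_INPUT_FILES directory is used as input, a quick count before launching the pipeline can catch a wrong path or stray files. The sketch below is only an illustration: `count_fastq` is a hypothetical helper, and it assumes that only files directly inside the directory (not in subdirectories) are relevant.

```shell
# Hypothetical pre-flight check: count the *.fastq files directly inside
# the directory you intend to use as R1PAIRED_INPUT_FILES.
count_fastq() {
    input_dir="$1"
    if [ ! -d "$input_dir" ]; then
        echo "not a directory: $input_dir" >&2
        return 1
    fi
    n=0
    for f in "$input_dir"/*.fastq; do
        # When nothing matches, the pattern stays literal and fails -f.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Calling `count_fastq /path/to/fastq_dir` prints the number of matching files. Names that merely contain `.fastq`, such as `sample.fastq.gz`, are not counted because they do not end in `.fastq`.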
- mapping.txt: a mapping file that associates each sample ID with its fastq file; you can add other metadata to it for further analysis (such as with QIIME).
- workspace/imtornado/QC.log.txt: QC results showing the number of input reads and the number of reads that passed QC
- workspace/imtornado/: results generated by IM-TORNADO using R1 reads only
- test_R1.biom (BIOM file)
- test_R1.biom.table (converted by QIIME from BIOM file)
- test_R1.tree (a phylogenetic tree generated by FastTree)
- test_R1.otus.final.result.fasta (OTU representatives)
- workspace/imtornado/: results generated by IM-TORNADO using paired-end reads
- test_paired.biom (BIOM file)
- test_paired.biom.table (converted by QIIME from BIOM file)
- test_paired.tree (a phylogenetic tree generated by FastTree)
- test_paired.otus.final.result.fasta (OTU representatives)
- workspace/R1Paired/: results generated by our hybrid-denovo method
- test_PairedSingle.biom (BIOM file)
- test_PairedSingle.biom.table (converted by QIIME from BIOM file)
- test_PairedSingle.tree (a phylogenetic tree generated by FastTree)
- test_PairedSingle.otus.final.result.fasta (OTU representatives)
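After a run finishes, the hybrid-denovo results listed above can be checked for presence with a few lines of shell. This is a rough sketch under two assumptions that the text does not state explicitly: that `workspace/` sits directly under your R1PAIRED_WORK_DIR, and that your output files carry the same `test_` prefix as the sample run. `check_hybrid_outputs` is a hypothetical name.

```shell
# Hypothetical check: report which of the four hybrid-denovo result files
# exist and are non-empty under <work_dir>/workspace/R1Paired/.
check_hybrid_outputs() {
    work_dir="$1"
    missing=0
    for f in \
        test_PairedSingle.biom \
        test_PairedSingle.biom.table \
        test_PairedSingle.tree \
        test_PairedSingle.otus.final.result.fasta
    do
        p="$work_dir/workspace/R1Paired/$f"
        if [ -s "$p" ]; then
            echo "ok       $p"
        else
            echo "missing  $p"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every expected file was found.
    [ "$missing" -eq 0 ]
}
```

Run it as `check_hybrid_outputs /path/to/work_dir`; the exit status is non-zero if any of the four files is absent or empty.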
- Go to the unpacked directory
- mkdir mytest
- cd mytest
- Run the command ‘../hybrid-denovo ../config/run.info’.
- Compare your results to our results (in /your_source_dir/test) to confirm that the installation is correct.
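The comparison in the last step can be done file by file. The sketch below is one possible way, not the authors' procedure: `compare_runs` is a hypothetical helper that compares the regular files directly inside a reference directory with same-named files in your own output directory. It assumes the reference directory mirrors the layout of your working directory, which you should verify. A byte-level difference does not necessarily indicate a failed installation, since some output files may embed details such as dates.

```shell
# Hypothetical helper: for each regular file in ref_dir, report whether a
# same-named file in new_dir is missing, byte-identical, or different.
compare_runs() {
    ref_dir="$1"
    new_dir="$2"
    differing=0
    for ref in "$ref_dir"/*; do
        [ -f "$ref" ] || continue
        name=${ref##*/}
        if [ ! -f "$new_dir/$name" ]; then
            echo "missing  $name"
            differing=$((differing + 1))
        elif cmp -s "$ref" "$new_dir/$name"; then
            echo "same     $name"
        else
            echo "differs  $name"
            differing=$((differing + 1))
        fi
    done
    # Succeed only if nothing was missing or different.
    [ "$differing" -eq 0 ]
}
```

Call it with the reference directory first and your own directory second, for example on the two `workspace/R1Paired` directories if both runs have one.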
- Installing Python libraries individually may cause many dependency issues. We suggest installing QIIME first, as many libraries used by the pipeline are installed automatically with QIIME.
- The biom-format (ver 1.3.1) entry in tool.info must be kept because this version is required. If you have installed QIIME2, biom-format (ver 2) may have been installed automatically and placed on your default path, which may cause a path conflict.
- If you install on Ubuntu, all Python libraries included in this package have to be re-installed, and the compiled C executables have to be rebuilt from source code.
- To handle large datasets, we have a parallel-computing version. To request it, please contact us.
Page last modified: March 7, 2017