Hybrid-Denovo
Hybrid-denovo is a de novo OTU-picking pipeline integrating single- and paired-end 16S sequence tags. It takes Illumina paired-end sequencing reads as input and outputs an OTU BIOM table, together with the OTU representative sequences and a phylogenetic tree of the OTUs.
The most distinctive feature of hybrid-denovo is that it can process a mixture of paired-end and single-end reads. This is useful because Illumina paired-end reads become a mixture of paired-end and single-end reads after quality control. For more details, please read our online article.
System requirements:
Linux platform (we used CentOS 6)
Installation:
From source code:
- Install Miniconda (Python 2.7).
- Install hybrid-denovo (conda install hybrid-denovo -c jeffchen2000 -c conda-forge -c bioconda -c biobuilds)
- Download the hybrid-denovo reference database (hybrid-denovo_database.tar.gz) and uncompress it.
- Download USEARCH version 8 for Linux from http://www.drive5.com/usearch/download.html
- Open config/tool.info and set up paths to USEARCH and hybrid-denovo reference databases.
- Make sure the files are installed in /your_miniconda_path/share/
From a VM using VirtualBox:
- Alternatively, download our VirtualBox VM image, hybriddenovo.ova, which packages all dependencies.
- Install on Windows (we installed it on Windows 7)
- Install Oracle VirtualBox
- Open the downloaded OVA image in VirtualBox.
- The VM runs Ubuntu, and the sudo password is ‘mayo’ (in case you want to install additional packages)
Files and directories in the package (hybrid-denovo.tar.gz):
- hybrid-denovo: the main script file
- config: directory that stores configuration files
- run.info : the input parameters of the pipeline (open the config/run.info for detail)
- tool.info : the paths to the external modules and packages used by the pipeline; the location of this file is set in run.info
- external : external modules and packages
- README : this README file
- sampleV3V5: a test sample for V3V5 rDNA amplicon reads
- scripts : shell scripts and JAR files developed by us
- test : our test run results
Usage:
/path/to/hybrid-denovo /path/to/run.info
Key parameters to set in run.info (open config/run.info to edit):
- R1PAIRED_READ_TYPE: read type (0: single end; 1: paired end with overlap, such as V4 region amplicon; 2: paired end without overlap, such as V3-V5 region amplicon)
- R1PAIRED_READ_LENGTH: input read length
- R1PAIRED_INPUT_FILES: a directory that contains all input fastq files (every *.fastq file in it is used as input)
- R1PAIRED_WORK_DIR: your working/output directory
- R1PAIRED_TOOL_INFO: absolute path to tool.info (by default, the pipeline will use: /your_source_dir/config/tool.info). Please remember to open tool.info and set correct tool paths
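Before launching a run, it can help to check that the parameters above are set sensibly. The following is a minimal sketch only, not part of the pipeline. It assumes that run.info stores one unquoted KEY=VALUE pair per line (please check your own config/run.info), and the helper names get_param and validate_run_info are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check run.info before launching hybrid-denovo.
# ASSUMPTION: run.info holds one unquoted KEY=VALUE pair per line.
# get_param and validate_run_info are illustrative names, not pipeline tools.

# Print the value of a key from a KEY=VALUE file (empty if the key is absent).
get_param() {
    local key="$1"
    local file="$2"
    sed -n "s/^${key}=//p" "${file}" | head -n 1
}

# Report problems with the key parameters; return non-zero if any are found.
validate_run_info() {
    local run_info="$1"
    local errors=0
    local key read_type input_dir tool_info

    # Every key parameter should have a non-empty value.
    for key in R1PAIRED_READ_TYPE R1PAIRED_READ_LENGTH R1PAIRED_INPUT_FILES \
               R1PAIRED_WORK_DIR R1PAIRED_TOOL_INFO; do
        if [ -z "$(get_param "${key}" "${run_info}")" ]; then
            echo "error: ${key} is not set"
            errors=$((errors + 1))
        fi
    done

    # The read type must be 0, 1, or 2 (an empty value was reported above).
    read_type="$(get_param R1PAIRED_READ_TYPE "${run_info}")"
    case "${read_type}" in
        0|1|2|"") ;;
        *)
            echo "error: R1PAIRED_READ_TYPE must be 0, 1, or 2"
            errors=$((errors + 1))
            ;;
    esac

    # The input directory must contain at least one *.fastq file.
    input_dir="$(get_param R1PAIRED_INPUT_FILES "${run_info}")"
    if [ -n "${input_dir}" ] && ! ls "${input_dir}"/*.fastq >/dev/null 2>&1; then
        echo "error: no *.fastq files in ${input_dir}"
        errors=$((errors + 1))
    fi

    # tool.info must exist at the configured path.
    tool_info="$(get_param R1PAIRED_TOOL_INFO "${run_info}")"
    if [ -n "${tool_info}" ] && [ ! -f "${tool_info}" ]; then
        echo "error: tool.info not found at ${tool_info}"
        errors=$((errors + 1))
    fi

    [ "${errors}" -eq 0 ]
}
```

For example, after sourcing the functions: validate_run_info /path/to/run.info && /path/to/hybrid-denovo /path/to/run.info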
Output Files:
- mapping.txt: a mapping file that associates each sample ID with its fastq file; you can add other meta information to it for further analysis (such as with QIIME).
- workspace/imtornado/QC.log.txt: QC results showing the number of input reads and the number of reads that passed QC
- workspace/imtornado/: results generated by IM-TORNADO using read1s only
- test_R1.biom (BIOM file)
- test_R1.biom.table (converted by QIIME from BIOM file)
- test_R1.tree (a phylogenetic tree generated by FastTree)
- test_R1.otus.final.result.fasta (OTU representatives)
- workspace/imtornado/: results generated by IM-TORNADO using paired-end reads
- test_paired.biom (BIOM file)
- test_paired.biom.table (converted by QIIME from BIOM file)
- test_paired.tree (a phylogenetic tree generated by FastTree)
- test_paired.otus.final.result.fasta (OTU representatives)
- workspace/R1Paired/: results generated by our hybrid-denovo method
- test_PairedSingle.biom (BIOM file)
- test_PairedSingle.biom.table (converted by QIIME from BIOM file)
- test_PairedSingle.tree (a phylogenetic tree generated by FastTree)
- test_PairedSingle.otus.final.result.fasta (OTU representatives)
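As a quick check after a run, the hybrid-denovo results listed above can be verified with a small script. This is a rough sketch under two assumptions: the listed paths are relative to your working directory (R1PAIRED_WORK_DIR), and the file names carry the "test" prefix shown in this README. The function name check_outputs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which hybrid-denovo result files are missing or empty.
# ASSUMPTIONS: paths are relative to the working directory, and the file
# names use the "test" prefix shown in this README.

check_outputs() {
    local work_dir="$1"
    local missing=0
    local suffix f

    # The four files produced by the hybrid-denovo method itself.
    for suffix in biom biom.table tree otus.final.result.fasta; do
        f="workspace/R1Paired/test_PairedSingle.${suffix}"
        if [ ! -s "${work_dir}/${f}" ]; then
            echo "missing: ${f}"
            missing=$((missing + 1))
        fi
    done

    # The sample-to-fastq mapping file.
    if [ ! -s "${work_dir}/mapping.txt" ]; then
        echo "missing: mapping.txt"
        missing=$((missing + 1))
    fi

    # Succeed only when every expected file exists and is non-empty.
    [ "${missing}" -eq 0 ]
}
```

For example: check_outputs /path/to/your/work_dir && echo "all expected outputs found"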
Test run:
- Go to the unpacked directory
- mkdir mytest
- cd mytest
- Run command ‘../hybrid-denovo ../config/run.info’.
- Compare your results to our results (in /your_source_dir/test) to confirm that the installation works.
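The comparison in the last step can be scripted. The sketch below is one possible approach, not a tool shipped with the package. It assumes you pass the two directories that directly contain the hybrid-denovo result files (the layout of our test directory may differ from yours), and the function name compare_to_reference is hypothetical. It compares only the text OTU table and the representative sequences, byte for byte. A reported difference does not necessarily mean the installation is broken, so inspect the files before drawing conclusions.

```shell
#!/usr/bin/env bash
# Sketch: compare your test-run results against a reference result directory.
# ASSUMPTIONS: both arguments are directories that directly contain the
# test_PairedSingle.* files; byte-identical output is expected but not
# guaranteed by this README.

compare_to_reference() {
    local mine="$1"
    local ref="$2"
    local status=0
    local f

    for f in test_PairedSingle.biom.table \
             test_PairedSingle.otus.final.result.fasta; do
        # cmp -s is silent and returns non-zero if the files differ
        # or if either file cannot be read.
        if cmp -s "${mine}/${f}" "${ref}/${f}"; then
            echo "identical: ${f}"
        else
            echo "differs or missing: ${f}"
            status=1
        fi
    done

    return "${status}"
}
```

For example: compare_to_reference workspace/R1Paired /path/to/reference/results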
Notes:
- To handle large datasets, we have a parallel-computing version. Please contact us to request it.
Prior versions:
- 1.0.0 – Original release
Questions:
Please contact chen.xianfeng@mayo.edu or chen.jun2@mayo.edu
Page last modified: November 17, 2023