Biomedical Statistics and Informatics Software Packages


Hybrid-Denovo

Hybrid-denovo is a de novo OTU-picking pipeline that integrates single- and paired-end 16S sequence tags. It takes Illumina paired-end sequencing reads as input and outputs an OTU table in BIOM format, together with the OTU representative sequences and a phylogenetic tree of the OTUs.

The most distinctive feature of hybrid-denovo is that it can process a mixture of paired-end and single-end reads. This is useful because Illumina paired-end reads become a mixture of paired-end and single-end reads after quality control. For more details, please read our online article.

System requirements:
Linux platform (we used CentOS 6)

Installation:

From source code:

  1. Install Miniconda.
  2. Install hybrid-denovo (conda install hybrid-denovo -c jeffchen2000 -c conda-forge -c bioconda -c biobuilds)
  3. Download the hybrid-denovo reference databases (database_sliva132.97.tar.gz, database_greengenes13_5.97.tar.gz) and uncompress them.
  4. Download USEARCH version 8 for Linux from http://www.drive5.com/usearch/download.html
  5. Open config/tool.info and set up paths to USEARCH and hybrid-denovo reference databases.
  6. Make sure the files are installed in /your_miniconda_path/share/
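
Step 3 can be scripted. The following is a minimal sketch, not part of the package: the function name unpack_database and the destination directory in the example are hypothetical, and it assumes the archive has already been downloaded to the current directory.

```shell
# Unpack one reference database archive into a destination directory.
# $1 = path to a .tar.gz archive, $2 = destination directory (hypothetical layout).
unpack_database() {
    archive=$1
    dest=$2
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    # Create the destination if needed, then extract the archive into it.
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Example (not run here; the destination path is an assumption):
#   unpack_database database_greengenes13_5.97.tar.gz "$HOME/hybrid-denovo-db"
```

Whatever destination you choose is the path to enter in config/tool.info in step 5.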

From a VM using VirtualBox:

  1. Alternatively, download our VirtualBox VM image, hybriddenovo.ova, which packages all dependencies.
  2. Use a Windows host (we installed it on Windows 7).
  3. Install Oracle VirtualBox.
  4. Open the OVA image you downloaded in step 1.
  5. The VM runs Ubuntu, and the sudo password is ‘mayo’ (in case you want to install additional packages).

Files and directories in the package (hybrid-denovo.tar.gz):

  1. hybrid-denovo: the main script file
  2. config: directory that stores configuration files
    1. run.info         : the input parameters of the pipeline (open config/run.info for details)
    2. tool.info        : the paths to the external modules and packages used by the pipeline; the location of this file is set in run.info
  3. external  : external modules and packages
  4. README    : this README file
  5. sampleV3V5: a test sample for V3V5 rDNA amplicon reads
  6. scripts   : shell scripts and jar files developed by us
  7. test      : our test run results

Usage:
/path/to/hybrid-denovo /path/to/run.info

Key parameters to set in run.info (open config/run.info to edit):

  • R1PAIRED_READ_TYPE: read type (0: single end; 1: paired end with overlap, such as V4 region amplicon; 2: paired end without overlap, such as V3-V5 region amplicon)
  • R1PAIRED_READ_LENGTH: input read length
  • R1PAIRED_INPUT_FILES: a directory containing all input fastq files (every *.fastq file in it is used as input)
  • R1PAIRED_WORK_DIR: your working/output directory
  • R1PAIRED_TOOL_INFO: absolute path to tool.info (by default, the pipeline uses /your_source_dir/config/tool.info). Remember to open tool.info and set the correct tool paths.
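
Because every *.fastq file in the R1PAIRED_INPUT_FILES directory is used as input, it can help to count those files before starting a run. The sketch below is only an illustration: the function name count_fastq is hypothetical, and it assumes the pipeline looks only at the top level of the directory, not at subdirectories.

```shell
# Count the *.fastq files directly inside a directory.
# $1 = the directory you plan to use as R1PAIRED_INPUT_FILES.
count_fastq() {
    dir=$1
    n=0
    for f in "$dir"/*.fastq; do
        # When nothing matches, the pattern stays literal and fails the -f test.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Example (not run here; the path is a placeholder):
#   count_fastq /path/to/fastq_dir
```

A count of zero, or an odd count for a paired-end run, usually points at a wrong directory or at files with a different extension such as .fastq.gz.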

Output Files:

  • mapping.txt: a mapping file that associates each sample ID with its fastq file;
    you can add other metadata to it for further analysis (for example with QIIME).
  • workspace/imtornado/QC.log.txt: QC results showing the number of input reads and
    the number of reads that passed QC
  • workspace/imtornado/: results generated by IM-TORNADO using read 1 (R1) only
    • test_R1.biom (BIOM file)
    • test_R1.biom.table (converted by QIIME from BIOM file)
    • test_R1.tree (a phylogenetic tree generated by FastTree)
    • test_R1.otus.final.result.fasta (OTU representatives)
  • workspace/imtornado/: results generated by IM-TORNADO using paired-end reads
    • test_paired.biom (BIOM file)
    • test_paired.biom.table (converted by QIIME from BIOM file)
    • test_paired.tree (a phylogenetic tree generated by FastTree)
    • test_paired.otus.final.result.fasta (OTU representatives)
  • workspace/R1Paired/: results generated by our hybrid-denovo method
    • test_PairedSingle.biom (BIOM file)
    • test_PairedSingle.biom.table (converted by QIIME from BIOM file)
    • test_PairedSingle.tree (a phylogenetic tree generated by FastTree)
    • test_PairedSingle.otus.final.result.fasta (OTU representatives)
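
After a run, a quick check that the four hybrid-denovo result files exist and are non-empty can catch a failed run early. This is a sketch under stated assumptions: the function name check_hybrid_outputs is hypothetical, the workspace directory is assumed to sit under R1PAIRED_WORK_DIR, and the file prefix is assumed to be "test" as in the listing above (pass a second argument if yours differs).

```shell
# Verify that the hybrid-denovo result files exist and are non-empty.
# $1 = working/output directory, $2 = file prefix (defaults to "test").
check_hybrid_outputs() {
    workdir=$1
    prefix=${2:-test}
    missing=0
    for ext in biom biom.table tree otus.final.result.fasta; do
        f="$workdir/workspace/R1Paired/${prefix}_PairedSingle.$ext"
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every expected file was found.
    [ "$missing" -eq 0 ]
}

# Example (not run here; the path is a placeholder):
#   check_hybrid_outputs /path/to/work_dir && echo "all outputs present"
```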

Test run:

  1. Go to the unpacked directory
  2. mkdir mytest
  3. cd mytest
  4. Run command ‘../hybrid-denovo ../config/run.info’.
  5. Compare your results to our results (in /your_source_dir/test) to confirm that the installation is correct.
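
One coarse way to do the comparison in step 5 is to count the OTU representative sequences in your run and in the reference run. The sketch below is illustrative only: the function names are hypothetical, the paths in the example assume that the test directory mirrors the output layout described above, and equal counts do not prove that the two runs are identical.

```shell
# Count the records in a FASTA file (one record per line starting with ">").
count_fasta_records() {
    if [ ! -f "$1" ]; then
        echo "file not found: $1" >&2
        return 1
    fi
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c '^>' "$1" || true
}

# Succeed when two FASTA files hold the same number of records.
# $1 = OTU representatives from your run, $2 = those from the reference run.
same_otu_count() {
    mine=$(count_fasta_records "$1") || return 1
    ref=$(count_fasta_records "$2") || return 1
    echo "your run: $mine OTUs; reference: $ref OTUs"
    [ "$mine" -eq "$ref" ]
}

# Example (not run here; both paths are assumptions about the layout):
#   same_otu_count workspace/R1Paired/test_PairedSingle.otus.final.result.fasta \
#       /your_source_dir/test/workspace/R1Paired/test_PairedSingle.otus.final.result.fasta
```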

Notes:

  1. For large datasets, a parallel-computing version is available on request; please contact us.

Prior versions:

  • 1.0.0 – Original release

Questions:
Please contact chen.xianfeng@mayo.edu or chen.jun2@mayo.edu

Page last modified: August 30, 2018