GenomeSmasher is a set of tools for creating diploid FASTA files containing SNPs, indels, duplications, deletions, and translocations. These FASTA files can then be used with next-generation sequencing simulators to create artificial sequencing experiments. The tools are intended for assessing the performance and reliability of data analysis in next-generation sequencing pipelines.
We also provide two scripts to help prepare input files for GenomeSmasher.
Input parameters to the script
- ‘refgenome_dir’ => Path to the reference genome directory. Each chromosome must be in a separate FASTA file (e.g., Chr1.fa).
- ‘input_vcf_like’ => Path to the VCF-like file describing the structural variants to implement. The first 8 columns are mandatory.
- ‘process_dir’ => Path to the directory where temporary files and results are created. Temporary files are deleted from this directory.
- ‘ins_files_dir’ => Path to the directory containing all input insertion sequences, which must be in FASTA format.
USAGE: perl perl_genome_simulator.pl -refgenome_dir <REF CHR DIR> -input_vcf_like <INPUT VCF-LIKE FILE> -ins_files_dir <INSERT FILES DIR> -process_dir <TEMP DIR>
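For scripted runs, the call above can be assembled and sanity-checked before launching Perl. The following is a minimal Python sketch, not part of GenomeSmasher: the helper names `build_command` and `check_inputs` are hypothetical, and the check for `.fa` files assumes chromosome files follow the Chr1.fa naming shown in the parameter list. Only the four flags from the USAGE line are used.

```python
from pathlib import Path


def build_command(refgenome_dir, input_vcf_like, ins_files_dir, process_dir,
                  script="perl_genome_simulator.pl"):
    """Return the argument list matching the documented USAGE line."""
    return [
        "perl", script,
        "-refgenome_dir", str(refgenome_dir),
        "-input_vcf_like", str(input_vcf_like),
        "-ins_files_dir", str(ins_files_dir),
        "-process_dir", str(process_dir),
    ]


def check_inputs(refgenome_dir, input_vcf_like, ins_files_dir):
    """Return a list of problems with the input paths (empty list means OK)."""
    problems = []
    ref = Path(refgenome_dir)
    if not ref.is_dir():
        problems.append(f"reference directory not found: {ref}")
    elif not any(ref.glob("*.fa")):
        # Assumption: chromosome files use the .fa extension (as in Chr1.fa).
        problems.append(f"no .fa chromosome files in: {ref}")
    if not Path(input_vcf_like).is_file():
        problems.append(f"VCF-like file not found: {input_vcf_like}")
    if not Path(ins_files_dir).is_dir():
        problems.append(f"insertion sequence directory not found: {ins_files_dir}")
    return problems
```

If `check_inputs(...)` returns an empty list, the list from `build_command(...)` can be passed to `subprocess.run(cmd, check=True)` from the directory that holds the Perl script.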
- New pairs of chromosome files are generated containing the structural variants listed in the VCF-like file.
- A new VCF file is generated describing all the structural variants that were implemented.
- A log file is created.
- No two structural variants may be placed on the same line (a line consists of 50 bases), except for SNPs and translocations.
- No event is added where the reference genome base is “N”.
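The two placement rules above can be checked on a variant list before it is handed to the simulator. The sketch below is one possible reading of those rules, not GenomeSmasher's own code. It assumes 1-based positions, looks only at each variant's start position, treats SNPs and translocations as exempt from the same-line rule, and uses hypothetical lowercase type labels; `find_conflicts` and `on_n_base` are illustrative names.

```python
from collections import defaultdict

LINE_WIDTH = 50  # bases per FASTA line, as stated in the rule above

# Assumption: variant types are labeled with these lowercase names.
SHARED_LINE_OK = {"snp", "translocation"}


def line_index(pos, width=LINE_WIDTH):
    """Return the 0-based FASTA line that holds a 1-based position."""
    return (pos - 1) // width


def find_conflicts(variants):
    """Return (chrom, line, variants) groups where 2+ non-exempt variants
    start on the same 50-base line.

    `variants` is an iterable of (chrom, pos, kind) tuples.
    """
    by_line = defaultdict(list)
    for chrom, pos, kind in variants:
        if kind.lower() in SHARED_LINE_OK:
            continue  # SNPs and translocations may share a line
        by_line[(chrom, line_index(pos))].append((chrom, pos, kind))
    return [(c, ln, vs) for (c, ln), vs in sorted(by_line.items()) if len(vs) > 1]


def on_n_base(sequence, pos):
    """Return True if the 1-based position falls on an N in the sequence.

    Assumption: lowercase 'n' is treated the same as 'N'.
    """
    return sequence[pos - 1].upper() == "N"
```

For example, a deletion at position 10 and an insertion at position 45 of the same chromosome both start on line 0 and would be reported as a conflict, while an SNP at position 20 would not.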
For help, see our FAQ page.
Authors: Steven N. Hart, Naresh Prodduturi
Steven N. Hart, Ph.D.
Page last modified: October 24, 2014