SoftSearch was developed as a sensitive structural variant (SV) detection tool for Illumina paired-end next-generation sequencing data. To increase sensitivity, SoftSearch uses soft-clipping and read-pair strategies simultaneously to detect SVs. Soft clips are proxies for split reads: they indicate that part of a read maps to the reference genome while the remainder is not localized at the same place (for example, breakpoint-spanning reads). Discordant read pairs are a read and its mate whose insert size is greater than (or less than) the expected distribution of the dataset, or whose mapping orientation is unexpected (for example, both on the same strand). SoftSearch looks for areas of the genome with soft-clipping that also have discordant read pairs supporting the anomaly. Once areas meeting both conditions are identified, the read and mate information is extracted directly from the BAM file containing the discordant reads, obviating the need for time-consuming and error-prone complex alignment strategies. Only a small number of soft-masked bases and discordant read pairs are necessary to identify an SV, even though neither signal on its own would be sufficient to make an SV call, which highlights SoftSearch's improved sensitivity (see Performance).
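The two signals described above can be illustrated with a minimal Python sketch. This is not SoftSearch's own code: the function names, the `(cigar, flag, tlen)` tuple layout, and every threshold below are assumptions chosen for illustration. Only the SAM flag bits and CIGAR operators come from the SAM format itself. The sketch reads soft-clip lengths from a CIGAR string, flags a pair as discordant by insert size or same-strand orientation, and reports whether a set of reads shows both kinds of evidence.

```python
import re

# SAM flag bits (defined by the SAM format)
FLAG_PAIRED = 0x1
FLAG_REVERSE = 0x10
FLAG_MATE_REVERSE = 0x20

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string."""
    ops = [(int(n), op) for n, op in _CIGAR_OP.findall(cigar)]
    # Hard clips (H) can only sit at the very ends, outside any soft clip,
    # so drop them before inspecting the first and last operations.
    core = [(n, op) for n, op in ops if op != "H"]
    if not core:  # e.g. "*" for an unavailable CIGAR
        return (0, 0)
    leading = core[0][0] if core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return (leading, trailing)


def is_discordant(flag, tlen, min_insert, max_insert):
    """True if a paired read has an unexpected insert size or orientation.

    min_insert and max_insert are hypothetical bounds on the expected
    insert-size distribution of the dataset.
    """
    if not flag & FLAG_PAIRED:
        return False
    same_strand = bool(flag & FLAG_REVERSE) == bool(flag & FLAG_MATE_REVERSE)
    size = abs(tlen)
    return same_strand or size < min_insert or size > max_insert


def has_both_signals(reads, min_clip=10, min_insert=100, max_insert=500,
                     min_clipped_reads=1, min_discordant_reads=1):
    """Check a list of (cigar, flag, tlen) tuples for both evidence types.

    All thresholds are illustrative assumptions, not SoftSearch defaults.
    """
    clipped = sum(1 for cigar, _, _ in reads
                  if max(soft_clip_lengths(cigar)) >= min_clip)
    discordant = sum(1 for _, flag, tlen in reads
                     if is_discordant(flag, tlen, min_insert, max_insert))
    return clipped >= min_clipped_reads and discordant >= min_discordant_reads


# Toy reads overlapping one hypothetical region: (CIGAR, FLAG, TLEN)
region_reads = [
    ("30S70M", 99, 300),   # soft-clipped, concordant pair
    ("100M", 99, 5000),    # insert size far above the expected range
    ("100M", 65, 300),     # read and mate on the same strand
    ("100M", 99, 300),     # ordinary read
]
print(has_both_signals(region_reads))
```

In practice the reads would come from a BAM file rather than a hand-written list, and a real caller would group evidence by genomic position. The sketch only shows how the two independent signals combine into a single candidate decision.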

Authors: Steven N. Hart, Jaysheel Bhavsar, Saurabh Baheti, Vivekananda (Vivek) Sarangi, Jean-Pierre A. Kocher.

Steven N. Hart, Ph.D.

Page last modified: October 24, 2014