PH-110: Quality control of bacterial single-amplified genome sequences
1Dept. of Life Sci. & Med. Biosci., Waseda Univ., 2CREST, JST
Whole genome amplification (WGA) techniques have given us access to unexplored genomic information through the sequencing of single-amplified genomes (SAGs). However, WGA of bacteria remains challenging because of contamination introduced during sample preparation or originating from commercially available kits. Although several approaches have been proposed for contamination removal, it is still difficult to completely exclude these “contaminants” from bacterial SAGs. Thus, to increase confidence in analyses of sequenced SAGs, bioinformatic approaches that identify and exclude “contaminant” sequences from SAGs are required. Previously reported approaches use sequenced genomes available in public databases as references. These approaches are effective when working with bacterial strains similar to previously reported genomes but are limited when new strains are the target of interest. A database-independent tool is therefore needed.
Here, we developed a user-friendly GUI tool for identifying and isolating SAG sequences from “contaminated” samples. In our method, “non-specific amplified” sequences acquired upon WGA are regarded as the “contaminants”. The method calculates the probability that a sequence is a “contaminant” by comparing its k-mer frequencies with those of the “non-specific amplified” sequences. In tests using computationally simulated SAG datasets, our tool predicted “contaminant” sequences more accurately than previously reported techniques. We then applied the tool to real SAG sequences and demonstrated its predictive ability. In conclusion, our tool works independently of genome databases to extract SAGs from “contaminated” sequences. We believe that this method will be most effective when applied to SAG sequences of uncultivable or new strains, and we anticipate that it will contribute new insights into the interpretation of SAGs.
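The k-mer comparison described above can be illustrated with a minimal sketch. The abstract does not specify the statistical model, so everything here is an assumption made for illustration: a k-mer length of 4, a naive Bayes style log-likelihood ratio that treats overlapping k-mers as independent, a second profile built from presumed target sequences, and a prior of 0.5. The function names (`kmer_counts`, `kmer_profile`, `contaminant_probability`) and the toy sequences are hypothetical and are not taken from the tool itself.

```python
import math
from collections import Counter
from itertools import product

K = 4  # assumed k-mer length; the abstract does not state the value used
ALPHABET = "ACGT"
ALL_KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]


def kmer_counts(seq):
    """Count overlapping k-mers, skipping any that contain non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if all(base in ALPHABET for base in kmer):
            counts[kmer] += 1
    return counts


def kmer_profile(seqs, pseudocount=1.0):
    """Pooled k-mer frequencies with additive smoothing, so no frequency is zero."""
    total = Counter()
    for s in seqs:
        total.update(kmer_counts(s))
    denom = sum(total.values()) + pseudocount * len(ALL_KMERS)
    return {km: (total[km] + pseudocount) / denom for km in ALL_KMERS}


def contaminant_probability(seq, contaminant_profile, target_profile, prior=0.5):
    """Posterior probability that seq is a contaminant (illustrative model only).

    Sums per-k-mer log-likelihood ratios, adds the prior log-odds, and maps
    the result to a probability with a numerically stable logistic function.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for kmer, n in kmer_counts(seq).items():
        log_odds += n * (math.log(contaminant_profile[kmer])
                         - math.log(target_profile[kmer]))
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Toy data (hypothetical): AT-rich "non-specific amplified" reads and
# GC-rich reads standing in for the presumed target genome.
nonspecific_reads = ["ATATATATATATATATATAT", "TATATATTATATATAATATA"]
target_reads = ["GCGCGGCCGCGCGGCGCGCC", "CCGGCGCGCCGGCGCGGCGC"]

contaminant_profile = kmer_profile(nonspecific_reads)
target_profile = kmer_profile(target_reads)

p_at_rich = contaminant_probability("ATATTATATATA", contaminant_profile, target_profile)
p_gc_rich = contaminant_probability("GCGCGGCGCC", contaminant_profile, target_profile)
```

In this sketch, a query whose k-mer composition resembles the “non-specific amplified” reads scores above 0.5 and one resembling the presumed target scores below it. Real contigs are much longer and less cleanly separated, so the independence assumption and the fixed prior would need validation, for example against simulated SAG datasets as in the tests described above.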
keywords: Single-cell genomics, Decontamination, Software