Exploring lateral genetic transfer among microbial genomes using TF-IDF

Yingnan Cong, Yao-ban Chan, Mark A Ragan



Many microbes can acquire genetic material from their environment and incorporate it into their genome, a process known as lateral genetic transfer (LGT). Computational approaches have been developed to detect genomic regions of lateral origin, but they typically lack sensitivity, the ability to distinguish donor from recipient, and scalability to very large datasets. To address these issues, we have introduced an alignment-free method based on an idea from document analysis: term frequency-inverse document frequency (TF-IDF). Here we examine the performance of TF-IDF on three empirical datasets: 27 genomes of Escherichia coli and Shigella, 110 genomes of enteric bacteria, and 143 genomes across 12 bact..
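
The document-analysis analogy can be illustrated with a minimal sketch in which each genome is treated as a document and each k-mer as a term. This is the generic textbook TF-IDF weighting, not necessarily the authors' pipeline: the function names `kmers` and `tf_idf`, the choice k = 3, the unsmoothed `log(N / df)` form of the inverse document frequency, and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence (hypothetical helper)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tf_idf(genomes, k):
    """Textbook TF-IDF over k-mers.

    tf  = count of the k-mer in a genome / total k-mers in that genome
    idf = log(N / df), where N is the number of genomes and df is the
          number of genomes containing the k-mer (unsmoothed form, an assumption)
    """
    counts = {name: Counter(kmers(seq, k)) for name, seq in genomes.items()}
    n_genomes = len(genomes)

    # Document frequency: in how many genomes does each k-mer occur?
    df = Counter()
    for c in counts.values():
        df.update(c.keys())

    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            word: (cnt / total) * math.log(n_genomes / df[word])
            for word, cnt in c.items()
        }
    return scores


# Toy sequences, invented for illustration only.
toy = {"A": "ACGTACGT", "B": "ACGTTTTT", "C": "ACGGGGGT"}
scores = tf_idf(toy, 3)
```

In this toy example a k-mer present in every genome (such as `ACG`) receives a weight of zero, whereas a k-mer frequent in one genome and absent from the others (such as `TTT` in genome `B`) receives a high weight. That uneven distribution is the kind of signal a TF-IDF approach can exploit; how the scores are turned into LGT calls is not specified here.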

Funding Acknowledgements

This research was undertaken with the assistance of resources from the National Computational Infrastructure (NCI), which is supported by the Australian Government. YC acknowledges the China Scholarship Council and The University of Queensland for stipend and tuition fee support. The project was funded in part by a grant from the James S. McDonnell Foundation to MAR.