Journal article
Efficient large-scale protein sequence comparison and gene matching to identify orthologs and co-orthologs
K Mahmood, GI Webb, J Song, JC Whisstock, AS Konagurthu
Nucleic Acids Research | OXFORD UNIV PRESS | Published : 2012
DOI: 10.1093/nar/gkr1261
Abstract
Broadly, computational ortholog assignment is a three-step process: (i) identify all putative homologs between the genomes, (ii) identify gene anchors and (iii) link anchors to identify the best gene matches given their order and context. In this article, we engineer two methods to improve two important aspects of this pipeline [specifically steps (ii) and (iii)]. First, computing sequence similarity data [step (i)] is computationally intensive for large sequence sets, creating a bottleneck in the ortholog assignment pipeline. We have designed a fast and highly scalable sort-join method (afree) based on k-mer counts to rapidly compare all pairs of sequences in a large prot..
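The sort-join idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the afree implementation, whose details are not given here; it only assumes that each sequence is broken into k-mers, the (k-mer, sequence) records are sorted, and sequences sharing a k-mer are joined to count how many distinct k-mers each pair has in common. The function name `shared_kmer_counts`, the default `k=3` and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations, groupby


def shared_kmer_counts(seqs, k=3):
    """Count distinct k-mers shared by each pair of sequences (sort, then join).

    Illustrative sketch only; not the published afree method.
    `seqs` maps a sequence id to its residue string.
    """
    # One (kmer, seq_id) record per distinct k-mer in each sequence,
    # sorted so that records with the same k-mer become adjacent.
    records = sorted(
        (kmer, sid)
        for sid, seq in seqs.items()
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}
    )

    counts = Counter()
    # Join step: every pair of sequences within one k-mer group shares that k-mer.
    for _, group in groupby(records, key=lambda rec: rec[0]):
        ids = [sid for _, sid in group]
        for a, b in combinations(ids, 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy input: s1 and s2 share the 3-mers MKV and KVL.
demo = {"s1": "MKVLAA", "s2": "MKVLGG", "s3": "PPPPPP"}
pair_counts = shared_kmer_counts(demo)
```

Pairs with a high shared k-mer count would then be candidates for putative homologs; any threshold used for that decision is a separate choice not shown here.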
Grants
Funding Acknowledgements
The authors acknowledge: Australian Research Council (ARC) Centre of Excellence in Structural and Functional Microbial Genomics for support; Monash e-Research Centre and the Victorian Bioinformatics Consortium for computational resources. K.M. is a PhD student supported by an ARC scholarship. J.S.'s research is funded by a National Health and Medical Research Council (NHMRC) Peter Doherty Fellowship. J.C.W. is an ARC Federation Fellow and Honorary NHMRC Principal Research Fellow. A.S.K.'s research is supported by a Monash Larkins Fellowship. Funding for open access charge: Australian Research Council.