Efficient large-scale protein sequence comparison and gene matching to identify orthologs and co-orthologs

K Mahmood; GI Webb; J Song; JC Whisstock; AS Konagurthu

Journal article

Nucleic Acids Research | Oxford University Press | Published: 2012

DOI: 10.1093/nar/gkr1261

Abstract

Broadly, computational ortholog assignment is a three-step process: (i) identify all putative homologs between the genomes, (ii) identify gene anchors and (iii) link anchors to identify best gene matches given their order and context. In this article, we engineer two methods to improve two important aspects of this pipeline [specifically steps (i) and (iii)]. First, computing sequence similarity data [step (i)] is a computationally intensive task for large sequence sets, creating a bottleneck in the ortholog assignment pipeline. We have designed a fast and highly scalable sort-join method (afree) based on k-mer counts to rapidly compare all pairs of sequences in a large prot..

University of Melbourne Researchers

Khalid Mahmood Author

Jiangning Song Author

Grants

Funding Acknowledgements

The authors acknowledge the Australian Research Council (ARC) Centre of Excellence in Structural and Functional Microbial Genomics for support, and the Monash e-Research Centre and the Victorian Bioinformatics Consortium for computational resources. K.M. is a PhD student supported by an ARC scholarship. J.S.'s research is funded by a National Health and Medical Research Council (NHMRC) Peter Doherty Fellowship. J.C.W. is an ARC Federation Fellow and Honorary NHMRC Principal Research Fellow. A.S.K.'s research is supported by a Monash Larkins Fellowship. Funding for open access charge: Australian Research Council.