Journal article

Combining heterogeneous data sources for accurate functional annotation of proteins

Artem Sokolov, Christopher Funk, Kiley Graim, Karin Verspoor, Asa Ben-Hur

BMC Bioinformatics | BMC | Published: 2013


Combining heterogeneous sources of data is essential for accurate prediction of protein function. The task is complicated by the fact that while sequence-based features can be readily compared across species, most other data are species-specific. In this paper, we present a multi-view extension to GOstruct, a structured-output framework for function annotation of proteins. The extended framework can learn from disparate data sources, with each data source provided to the framework in the form of a kernel. Our empirical results demonstrate that the multi-view framework is able to utilize all available information, yielding better performance than sequence-based models trained across species a..
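The abstract says each data source is supplied to the framework as a kernel. The sketch below is a minimal illustration of that input format. It builds one linear kernel per data source, cosine-normalizes it, and sums the results. The toy feature vectors, the view names `sequence` and `ppi`, and the helper functions are hypothetical. Unweighted kernel summation is a common baseline for combining kernels. It is not the multi-view GOstruct method the paper describes.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def linear_kernel(X: List[Vector]) -> Matrix:
    """Gram matrix K[i][j] = <x_i, x_j> for a single data source."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def normalize(K: Matrix) -> Matrix:
    """Cosine normalization K[i][j] / sqrt(K[i][i] * K[j][j]).

    Every diagonal entry becomes 1, so sources with different feature scales
    contribute comparably. Assumes no all-zero feature vectors (diagonal > 0).
    """
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)]
            for i in range(n)]


def sum_kernels(kernels: List[Matrix]) -> Matrix:
    """Unweighted sum of per-source kernels (baseline, not multi-view GOstruct)."""
    n = len(kernels[0])
    return [[sum(K[i][j] for K in kernels) for j in range(n)] for i in range(n)]


# Hypothetical toy data: three proteins, each described in two views.
# Row i of every view must refer to the same protein.
views: Dict[str, List[Vector]] = {
    "sequence": [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [1.0, 1.0, 0.0]],
    "ppi": [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]],
}

kernels = [normalize(linear_kernel(X)) for X in views.values()]
K = sum_kernels(kernels)
```

Plain summation requires a row for every protein in every kernel, so it cannot directly use a data source that exists for only some species. The abstract names this species-specificity as the complication that the multi-view extension is meant to handle.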



Awarded by NSF

Awarded by NIH

Awarded by Direct For Biological Sciences

Funding Acknowledgements

This work was funded by NSF grants DBI-0965616 and DBI-0965768. Chris Funk is supported by NIH training grant T15 LM00945102. NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.