Combining heterogeneous data sources for accurate functional annotation of proteins
Artem Sokolov, Christopher Funk, Kiley Graim, Karin Verspoor, Asa Ben-Hur
BMC Bioinformatics | BMC | Published: 2013
Combining heterogeneous sources of data is essential for accurate prediction of protein function. The task is complicated by the fact that while sequence-based features can be readily compared across species, most other data are species-specific. In this paper, we present a multi-view extension to GOstruct, a structured-output framework for function annotation of proteins. The extended framework can learn from disparate data sources, with each data source provided to the framework in the form of a kernel. Our empirical results demonstrate that the multi-view framework is able to utilize all available information, yielding better performance than sequence-based models trained across species a…
Awarded by NSF
Awarded by NIH
Awarded by Direct For Biological Sciences
This work was funded by NSF grants DBI-0965616 and DBI-0965768. Chris Funk is supported by NIH training grant T15 LM00945102. NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.