Natural language processing for automated validation of protein databases
Grant number: DP150101550 | Funding period: 2015 - 2018
The project aims to use natural language processing and information retrieval to reconcile and improve sources of biological information. Biological research has produced vast volumes of information about proteins, captured in structured resources (databases) and unstructured documents. However, the accuracy of much of this information is questionable. The project proposes to develop methods to validate data and reduce the dramatic inconsistencies in protein information resources by leveraging observed correlations and complementarity between them, and specifically through targeted fact extraction from the biomedical literature. These methods will be applied at scale across millions of publi..
Related publications (19)
Quality Matters: Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases.
Qingyu Chen, Ramona Britto, Ivan Erill, Constance J Jeffery, Arthur Liberzon, Michele Magrane, Jun-Ichi Onami, Marc Robinson-Rechavi, Jana Sponarova, Justin Zobel, Karin Verspoor
Biological databases represent an extraordinary collective volume of work. Diligently built up over decades and comprising many mi..
Comparative Analysis of Sequence Clustering Methods for Deduplication of Biological Databases
Qingyu Chen, Yu Wan, Xiuzhen Zhang, Yang Lei, Justin Zobel, Karin Verspoor
The massive volumes of data in biological sequence databases provide a remarkable resource for large-scale biological studies. How..
An expanded evaluation of protein function prediction methods shows an improvement in accuracy
Yuxiang Jiang, Tal Ronnen Oron, Wyatt T Clark, Asma R Bankapur, Daniel D'Andrea, Rosalba Lepore, Christopher S Funk, Indika Kahanda, Karin M Verspoor, Asa Ben-Hur, Da Chen Emily Koo, Duncan Penfold-Brown, Dennis Shasha, Noah Youngs, Richard Bonneau, Alexandra Lin, Sayed ME Sahraeian, Pier Luigi Martelli, Giuseppe Profiti, Rita Casadio