Journal article

Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources

Dietrich Rebholz-Schuhmann, Senay Kafkas, Jee-Hyub Kim, Chen Li, Antonio Jimeno Yepes, Robert Hoehndorf, Rolf Backofen, Ian Lewin

Journal of Biomedical Semantics | BMC | Published : 2013


MOTIVATION: The identification of protein and gene names (PGNs) from the scientific literature requires semantic resources: terminological and lexical resources deliver the term candidates into PGN tagging solutions, and the gold standard corpora (GSC) train them to identify term parameters and contextual features. Ideally, all three resources, i.e. corpora, lexica and taggers, cover the same domain knowledge, and thus support identification of the same types of PGNs and cover all of them. Unfortunately, none of the three serves as a predominant standard, and for this reason it is worth exploring how these three resources comply with each other. We systematically compare different PGN taggers ..



Awarded by EU Support Action grant ("CALBC") under the 7th EU Framework Programme within Theme "Intelligent Content and Semantics"

Funding Acknowledgements

This work was funded by the EU Support Action grant 231727 ("CALBC") under the 7th EU Framework Programme within Theme "Intelligent Content and Semantics" (ICT 2007.4.2). National ICT Australia (NICTA) is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.