Conference Proceedings

Establishing a baseline for literature mining human genetic variants and their relationships to disease cohorts

Karin M Verspoor, Go Eun Heo, Keun Young Kang, Min Song

BMC Medical Informatics and Decision Making | BioMed Central Ltd | Published: 2016

Abstract

BACKGROUND: The Variome corpus, a small collection of published articles about inherited colorectal cancer, includes annotations of 11 entity types and 13 relation types related to curating the relationship between genetic variation and disease. Because of the richness of these annotations, the corpus provides a good testbed for evaluating biomedical literature information extraction systems.

METHODS: In this paper, we focus on assessing performance on extracting the relations in the corpus, using gold standard entities as a starting point, to establish a baseline for extracting the relations that are important for mining genetic variant information from the literature. We test the applic..

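Because the entities are taken from the gold standard, relation extraction can be scored by directly comparing predicted relations with gold relations. The sketch below is a hypothetical illustration of one common way to do this, not the authors' evaluation code. It assumes each relation is an exact-match (type, argument 1, argument 2) triple over gold entity IDs, and it computes precision, recall and F1 pooled over all relation types (micro-averaged). The `Relation` and `prf` names, the relation labels and the entity IDs are all invented for illustration.

```python
from typing import NamedTuple, Set, Tuple


class Relation(NamedTuple):
    # Hypothetical representation: a typed, directed relation between two
    # gold-standard entity mentions, identified by (invented) entity IDs.
    rel_type: str
    arg1: str
    arg2: str


def prf(gold: Set[Relation], predicted: Set[Relation]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over exact-match relation triples,
    pooled across all relation types (micro-averaged)."""
    tp = len(gold & predicted)  # relations with matching type and arguments
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative data only: relation labels and entity IDs are made up.
gold = {
    Relation("has", "T1", "T2"),
    Relation("has", "T3", "T4"),
    Relation("relatedTo", "T2", "T5"),
    Relation("relatedTo", "T6", "T7"),
}
predicted = {
    Relation("has", "T1", "T2"),
    Relation("relatedTo", "T2", "T5"),
    Relation("has", "T4", "T3"),  # wrong argument order, so not a match
}
p, r, f = prf(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

In this example, two of the three predictions match gold triples. That gives P = 2/3, R = 1/2 and F1 = 4/7. The sketch treats relations as directed. A symmetric relation type would need its arguments normalised, for example sorted, before the sets are compared.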

Grants

Awarded by Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) - Ministry of Science, ICT and Future Planning

Awarded by Australian Research Council

Funding Acknowledgements

This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT and Future Planning (NRF-2012M3C4A7033342). KMV was supported by the Australian Research Council under project DP150101550.