Journal article

A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools

Karin Verspoor, Kevin Bretonnel Cohen, Arrick Lanfranchi, Colin Warner, Helen L Johnson, Christophe Roeder, Jinho D Choi, Christopher Funk, Yuriy Malenkiy, Miriam Eckert, Nianwen Xue, William A Baumgartner, Michael Bada, Martha Palmer, Lawrence E Hunter

BMC Bioinformatics | BMC | Published: 2012

Abstract

BACKGROUND: We introduce the linguistic annotation of a corpus of 97 full-text biomedical publications, known as the Colorado Richly Annotated Full Text (CRAFT) corpus. We further assess the performance of existing tools for performing sentence splitting, tokenization, syntactic parsing, and named entity recognition on this corpus. RESULTS: Many biomedical natural language processing systems demonstrated large differences between their previously published results and their performance on the CRAFT corpus when tested with the publicly available models or rule sets. Trainable systems differed widely with respect to their ability to build high-performing models based on this data. CONCLUSIONS:..

Grants

Awarded by NIH

Awarded by NIH/NCRR Colorado CTSI

Awarded by National Center for Research Resources

Awarded by National Institute of General Medical Sciences

Awarded by National Library of Medicine

Funding Acknowledgements

This work was supported by NIH grants R01LM009254, R01GM083649, and R01LM008111 to Lawrence E. Hunter and in part by NIH/NCRR Colorado CTSI Grant Number UL1 RR025780. We gratefully acknowledge the important work of our syntactic annotation team, supervised by Martha Palmer: Arrick Lanfranchi, Colin Warner, Amanda Howard, Tim O'Gorman, Kevin Gould, and Michael Regan. We also greatly appreciate the assistance of Bob Leaman, David McClosky, Spence Green, and Christopher Manning with questions that arose while working with their tools, and of David Weitzenkamp, who assisted with the statistical analyses. We also thank the anonymous reviewers for their meaningful feedback on the manuscript.