The structural and content aspects of abstracts versus bodies of full text journal articles are different
K Bretonnel Cohen, Helen L Johnson, Karin Verspoor, Christophe Roeder, Lawrence E Hunter
BMC Bioinformatics | BMC | Published : 2010
BACKGROUND: An increase in work on the full text of journal articles and the growth of PubMedCentral have the opportunity to create a major paradigm shift in how biomedical text mining is done. However, until now there has been no comprehensive characterization of how the bodies of full text journal articles differ from the abstracts that until now have been the subject of most biomedical text mining research. RESULTS: We examined the structural and linguistic aspects of abstracts and bodies of full text articles, the performance of text mining tools on both, and the distribution of a variety of semantic classes of named entities between them. We found marked structural differences, with lon..
Awarded by NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
Awarded by NATIONAL LIBRARY OF MEDICINE
This work was supported by grants R01LM009254, R01GM083649, and R01LM008111 to Lawrence Hunter, and grant 1 R01LM010120-D1 to Karin Verspoor. We thank Irina Temnikova and Gondy Leroy for discussions of sentence complexity and readability. Anis Karimpour-Fard and Katerina Kechris provided invaluable statistical consultation. Kevin Livingston provided helpful discussion, and Tom Christiansen provided helpful discussion and editing. We also thank Graciela Gonzalez and Bob Leaman for the prerelease BANNER disease model and Lawrence Smith for help with MEDPOST.