Feature engineering for MEDLINE citation categorization with MeSH
A Jimeno Yepes, L Plaza, J Carrillo-de-Albornoz, JG Mork, AR Aronson
BMC Bioinformatics | BioMed Central | Published: 2015
BACKGROUND: Research in biomedical text categorization has mostly used the bag-of-words representation. Other more sophisticated representations of text based on syntactic, semantic and argumentative properties have been less studied. In this paper, we evaluate the impact of different text representations of biomedical texts as features for reproducing the MeSH annotations of some of the most frequent MeSH headings. In addition to unigrams and bigrams, these features include noun phrases, citation meta-data, citation structure, and semantic annotation of the citations. RESULTS: Traditional features like unigrams and bigrams exhibit strong performance compared to other feature sets. Little or...
Awarded by Spanish Ministry of Science and Innovation
Awarded by National Library of Medicine
In carrying out this research, we have received funding from the University of Melbourne. This work was supported in part by the Intramural Research Program of the NIH, National Library of Medicine. This research was partially supported by the Spanish Ministry of Science and Innovation (Holopedia Project, TIN2010-21128-C02-01; VoxPopuli Project, TIN2013-47090-C3-1-P).