Journal article
Comparing Medline citations using modified N-grams
RMA Nawab, M Stevenson, P Clough
Journal of the American Medical Informatics Association | OXFORD UNIV PRESS | Published : 2014
Abstract
Objective: We aim to identify duplicate pairs of Medline citations, particularly when the documents are not identical but contain similar information.
Materials and methods: Duplicate pairs of citations are identified by comparing word n-grams in pairs of documents. N-grams are modified using two approaches that take account of the fact that the document may have been altered: (1) deletion, in which an item in the n-gram is removed; and (2) substitution, in which an item in the n-gram is replaced with a similar term obtained from the Unified Medical Language System Metathesaurus. N-grams are also weighted using a score derived from a language model. Evaluation is carried out using a set of 520 Me..
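The deletion and substitution modifications described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `synonyms` dictionary (a hypothetical stand-in for a Unified Medical Language System Metathesaurus lookup), the whitespace tokenisation and the containment-style overlap score are all assumptions made for illustration, and the language-model weighting is omitted.

```python
def word_ngrams(tokens, n):
    """Return the set of word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def deletion_variants(ngram):
    """Deletion: drop one item at a time from the n-gram."""
    return {ngram[:i] + ngram[i + 1:] for i in range(len(ngram))}


def substitution_variants(ngram, synonyms):
    """Substitution: swap one item for a similar term.

    `synonyms` is a hypothetical dict mapping a word to similar terms;
    the paper obtains similar terms from the UMLS Metathesaurus instead.
    """
    variants = set()
    for i, word in enumerate(ngram):
        for alt in synonyms.get(word, ()):
            variants.add(ngram[:i] + (alt,) + ngram[i + 1:])
    return variants


def expand(tokens, n, synonyms):
    """Original n-grams plus their deletion and substitution variants."""
    expanded = set()
    for ngram in word_ngrams(tokens, n):
        expanded.add(ngram)
        expanded |= deletion_variants(ngram)
        expanded |= substitution_variants(ngram, synonyms)
    return expanded


def overlap_score(doc_a, doc_b, n=3, synonyms=None):
    """Assumed similarity measure: shared items over the smaller set."""
    synonyms = synonyms or {}
    a = expand(doc_a.lower().split(), n, synonyms)
    b = expand(doc_b.lower().split(), n, synonyms)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))
```

Under these assumptions, calling `overlap_score("heart attack in patients", "cardiac attack in patients", synonyms={"heart": ["cardiac"]})` gives a higher score than the same call without the `synonyms` argument, because the substituted n-gram matches the altered wording.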
Grants
Funding Acknowledgements
The COMSATS Institute of Information Technology, Islamabad, Pakistan funded this work under the Faculty Development Program.