Journal article
Practical implementation of an existing smoking detection pipeline and reduced support vector machine training corpus requirements
R Khor, WK Yip, M Bresse, W Rose, G Duchesne, F Foroudi
Journal of the American Medical Informatics Association | Published: 2014
Abstract
This study aimed to reduce reliance on large training datasets in support vector machine (SVM)-based clinical text analysis by categorizing keyword features. An enhanced Mayo smoking status detection pipeline was deployed. We used a corpus of 709 annotated patient narratives. The pipeline was optimized for local data entry practice and lexicon. SVM classifier retraining used a grouped keyword approach for better efficiency. Accuracy, precision, and F-measure of the unaltered and optimized pipelines were evaluated using k-fold cross-validation. Initial accuracy of the clinical Text Analysis and Knowledge Extraction System (cTAKES) package was 0.69. Localization and keyword grouping improved sy..
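The grouped keyword idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that individual keywords are collapsed into a small number of categories, so the classifier sees one count per category rather than one feature per word. The group names, the keyword lists, and the helper functions `tokenize` and `grouped_features` are hypothetical and are not taken from the paper or from cTAKES.

```python
import re
from collections import Counter

# Hypothetical keyword groups for illustration only; the categories
# actually used in the study are not given in the abstract.
KEYWORD_GROUPS = {
    "TOBACCO": {"smoker", "smokes", "smoking", "cigarette", "cigarettes", "tobacco"},
    "CESSATION": {"quit", "stopped", "ceased", "ex-smoker", "former"},
    "NEGATION": {"never", "non-smoker", "nonsmoker", "denies", "no"},
}

# Invert the mapping once so each token lookup is a single dict access.
TOKEN_TO_GROUP = {kw: group for group, kws in KEYWORD_GROUPS.items() for kw in kws}

# Fixed feature order: ["CESSATION", "NEGATION", "TOBACCO"].
GROUP_ORDER = sorted(KEYWORD_GROUPS)


def tokenize(text):
    """Lowercase the text and return alphabetic tokens, keeping hyphenated words whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def grouped_features(text):
    """Return one count per keyword group, in GROUP_ORDER."""
    counts = Counter(
        TOKEN_TO_GROUP[tok] for tok in tokenize(text) if tok in TOKEN_TO_GROUP
    )
    return [counts.get(group, 0) for group in GROUP_ORDER]


if __name__ == "__main__":
    for note in ("Patient quit smoking in 2010.", "Never smoked; denies tobacco use."):
        print(note, grouped_features(note))
```

Vectors of this kind could then be passed to an SVM implementation such as scikit-learn's `SVC` and scored with k-fold cross-validation. Because several surface forms share one feature, fewer annotated examples may be needed for each feature to be observed during training, which is the kind of efficiency gain the abstract describes.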