Conference Proceedings

An investigation into the interaction between feature selection and discretization: Learning how and when to read numbers

Sumukh Ghodke, Timothy Baldwin, MA Orgun (ed.), J Thornton (ed.)

AI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS | SPRINGER-VERLAG BERLIN | Published: 2007

Abstract

Pre-processing is an important part of machine learning and has been shown to significantly improve the performance of classifiers. In this paper, we take a selection of pre-processing methods, focusing specifically on discretization and feature selection, and empirically examine their combined effect on classifier performance. In our experiments, we take 11 standard datasets and a selection of standard machine learning algorithms (one-R, ID3, naive Bayes, and IB1) and explore the impact of different forms of pre-processing on each combination of dataset and algorithm. We find that in general the combination of wrapper-based forward selection and naive supervised methods of discretizati..

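The abstract names wrapper-based forward selection as part of the pre-processing combination it examines. The following is a minimal sketch of the general technique, not the authors' implementation. It assumes a user-supplied `score` function (hypothetical) that trains the target classifier (e.g. naive Bayes or IB1) on a given feature subset and returns an accuracy estimate such as cross-validated accuracy. `toy_score` is likewise a hypothetical stand-in used only to show the control flow.

```python
from typing import Callable, Sequence


def forward_selection(
    features: Sequence[str],
    score: Callable[[list[str]], float],
) -> list[str]:
    """Greedy wrapper-based forward selection (illustrative sketch).

    Assumption: `score(subset)` trains the chosen classifier on `subset`
    and returns an accuracy estimate; the empty subset gives a baseline.
    """
    selected: list[str] = []
    best_score = score(selected)  # baseline with no features
    remaining = list(features)
    while remaining:
        # Evaluate adding each remaining feature; keep the best candidate.
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break  # stop once no single addition improves the estimate
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected


def toy_score(subset: list[str]) -> float:
    # Hypothetical scorer: each feature adds a fixed amount to accuracy.
    gains = {"a": 0.3, "b": 0.2, "c": -0.1}
    return 0.5 + sum(gains[f] for f in subset)


print(forward_selection(["a", "b", "c"], toy_score))  # ['a', 'b']
```

Because the classifier itself is used to judge each subset, the selected features can differ per learner, which is what makes the wrapper approach sensitive to the choice of algorithm and of discretization applied beforehand.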