On normalization and algorithm selection for unsupervised outlier detection

S Kandanaarachchi; MA Muñoz; RJ Hyndman; K Smith-Miles

Journal article

Data Mining and Knowledge Discovery | Springer | Published: 2020

DOI: 10.1007/s10618-019-00661-z

Abstract

This paper demonstrates that the performance of various outlier detection methods is sensitive to both the characteristics of the dataset and the data normalization scheme employed. To understand these dependencies, we formally prove that normalization affects the nearest-neighbor structure and density of the dataset, and hence which observations could be considered outliers. We then perform an instance space analysis of combinations of normalization and detection methods. This analysis enables visualization of the strengths and weaknesses of these combinations. Moreover, we gain insights into which method combination might obtain the best performance for a given dataset.
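The claim that normalization changes the nearest-neighbor structure, and with it which points look anomalous, can be illustrated with a small sketch. It assumes a hypothetical two-feature toy dataset, min-max scaling as the normalization scheme, and the distance to the single nearest neighbor as a simple kNN-style outlier score. These choices are for illustration only and are not the paper's experimental setup.

```python
import math

# Hypothetical toy data: feature 1 is on a large scale, feature 2 on a small one.
# Point 4 is unusual only in feature 2; point 5 is unusual only in feature 1.
DATA = [
    (100.0, 0.50),
    (110.0, 0.52),
    (120.0, 0.48),
    (130.0, 0.51),
    (115.0, 0.95),
    (200.0, 0.50),
]


def min_max(rows):
    """Rescale each column to [0, 1]; assumes no column is constant."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(r, lows, spans)) for r in rows]


def nearest_neighbors(rows):
    """Return (index of nearest neighbor, Euclidean distance to it) for every row."""
    out = []
    for i, p in enumerate(rows):
        best = min(
            ((j, math.dist(p, q)) for j, q in enumerate(rows) if j != i),
            key=lambda t: t[1],
        )
        out.append(best)
    return out


def top_outlier(rows):
    """Index of the row with the largest 1-NN distance (kNN score with k = 1)."""
    nn = nearest_neighbors(rows)
    return max(range(len(rows)), key=lambda i: nn[i][1])


raw_nn = nearest_neighbors(DATA)
scaled_nn = nearest_neighbors(min_max(DATA))
print("nearest neighbor of point 1:", raw_nn[1][0], "->", scaled_nn[1][0])  # 4 -> 0
print("top outlier:", top_outlier(DATA), "->", top_outlier(min_max(DATA)))  # 5 -> 4
```

On the raw data the large-scale first feature dominates the Euclidean distance, so point 5 is flagged. After min-max scaling both features span the same range, and point 4, which is extreme only in the second feature, becomes the top outlier. Point 1's nearest neighbor also changes from point 4 to point 0.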

University of Melbourne Researchers

Mario Andres Munoz Author

Kate Smith-Miles Author

Related Projects (2)

STRESS-TESTING ALGORITHMS: GENERATING NEW TEST INSTANCES TO ELICIT INSIGHTS

This project aims to develop a new paradigm in algorithm testing, creating novel test instances and tools to elicit insights into algorithm ..

INTRUDER ALERT! DETECTING AND CLASSIFYING EVENTS IN NOISY TIME SERIES

This project aims to address the mathematical challenges in automated early detection and classification of intrusion events in noisy time s..

Grants

Awarded by Australian Research Council

Funding Acknowledgements

Funding was provided by the Australian Research Council through the Australian Laureate Fellowship FL140100012, and Linkage Project LP160101885. This research was supported in part by the Monash eResearch Centre and eSolutions-Research Support Services through the MonARCH HPC Cluster.