Generating Diverse Clustering Datasets with Targeted Characteristics

LH dos Santos Fernandes; K Smith-Miles; AC Lorena

Book Chapter

Intelligent Systems | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Springer International Publishing AG | Published: 2022

DOI: 10.1007/978-3-031-21686-2_28

Abstract

When evaluating clustering algorithms, it is important to assess how well they recover the clusters in datasets with known structures. Nonetheless, generating and choosing diverse datasets to compose such test benchmarks is non-trivial. The datasets must present a large variety of structures and characteristics so that the algorithms can be challenged and their strengths and weaknesses revealed. Using the generators currently available in the literature relies on trial-and-error procedures that can be costly and inaccurate. Taking advantage of an Instance Space Analysis of popular clustering benchmarks, where datasets are projected into a 2-D embedding with linear trends ..
