Conference Proceedings

Online K-Means Clustering with Lightweight Coresets

JS Low, Z Ghafoori, C Leckie

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Springer | Published: 2019

Abstract

Coresets are representative samples of data that can be used to train machine learning models with provable guarantees of approximating the accuracy of training on the full data set. They have been used for scalable clustering of large data sets and yield better cluster partitions than clustering a random sample. In this paper, we present a novel approach to constructing lightweight coresets on subsets of data that fit in memory while performing a streaming variant of k-means clustering known as online k-means. Experimental results show that this approach generates cluster partitions of comparable accuracy to the regular online k-means algorithm in less time, or superior partit..
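The abstract does not spell out the construction, so the following is only a minimal sketch under stated assumptions, not the authors' method. Each in-memory chunk is summarized with the standard lightweight-coreset sampling distribution q(x) = 1/(2n) + d(x, μ)² / (2 Σ d(x', μ)²), where μ is the chunk mean, and each sampled point gets weight 1/(m·q(x)). The weighted coreset points then drive MacQueen-style online k-means updates. The function names `lightweight_coreset` and `online_kmeans_with_coresets`, the seeding of centers from the first chunk, and the parameters `k` (number of clusters) and `m` (coreset size per chunk) are hypothetical and chosen for illustration.

```python
import numpy as np


def lightweight_coreset(X, m, rng):
    """Sample m weighted points from X (hypothetical helper).

    Sampling distribution: half uniform, half proportional to the squared
    distance from the mean of X. Weights 1 / (m * q) make the weighted sum
    an unbiased estimate of the corresponding sum over all of X.
    """
    n = X.shape[0]
    mu = X.mean(axis=0)
    d2 = ((X - mu) ** 2).sum(axis=1)
    total = d2.sum()
    if total == 0:
        # All points coincide: fall back to uniform sampling.
        q = np.full(n, 1.0 / n)
    else:
        q = 0.5 / n + 0.5 * d2 / total
    idx = rng.choice(n, size=m, replace=True, p=q)
    weights = 1.0 / (m * q[idx])
    return X[idx], weights


def online_kmeans_with_coresets(chunks, k, m, seed=0):
    """Weighted online k-means over per-chunk coresets (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    centers = None
    counts = np.zeros(k)  # accumulated weight assigned to each center
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        if centers is None:
            # Assumption: seed the centers with k distinct points of the first chunk.
            centers = chunk[rng.choice(len(chunk), size=k, replace=False)].copy()
        points, weights = lightweight_coreset(chunk, m, rng)
        for x, w in zip(points, weights):
            # Assign the weighted point to its nearest center.
            j = np.argmin(((centers - x) ** 2).sum(axis=1))
            # Weighted running-mean update; the first point assigned to a
            # center (count 0) simply replaces its seed.
            counts[j] += w
            centers[j] += (w / counts[j]) * (x - centers[j])
    return centers
```

In this sketch each chunk is scanned once to build its coreset, but only `m` weighted updates are applied per chunk instead of one per point, which is where a speed-up over plain online k-means would come from; the weights keep each chunk's influence on the centers roughly proportional to its size.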
