Journal article

Selective sampling for approximate clustering of very large data sets

Liang Wang, James C Bezdek, Christopher Leckie, Ramamohanarao Kotagiri

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS | WILEY | Published : 2008

Abstract

A key challenge in pattern recognition is how to scale the computational efficiency of clustering algorithms on large data sets. The extension of non-Euclidean relational fuzzy c-means (NERF) clustering to very large (VL = unloadable) relational data is called the extended NERF (eNERF) clustering algorithm, which comprises four phases: (i) finding distinguished features that monitor progressive sampling; (ii) progressively sampling from an N × N relational matrix R_N to obtain an n × n sample matrix R_n; (iii) clustering R_n with literal NERF; and (iv) extending the clusters in R_n to the remainder of the relational data. Previously published examples on several fairly small data sets suggest tha..
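The sample, cluster, and extend structure of phases (ii) to (iv) can be illustrated with a small sketch. This is not the authors' eNERF implementation, and several simplifications are assumptions made only for illustration. Phase (i) is omitted, and simple random sampling stands in for progressive, feature-monitored sampling. Plain relational fuzzy c-means replaces literal NERF, which is valid only when R holds squared Euclidean distances, so NERF's handling of non-Euclidean data is not needed. The helper names (`rfcm`, `memberships`, `sample_cluster_extend`, `squared_distances`) are hypothetical.

```python
import numpy as np


def squared_distances(X):
    """Squared Euclidean dissimilarity matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def memberships(D, m):
    """Fuzzy memberships from a c x k matrix of object-to-cluster distances D."""
    D = np.maximum(D, 1e-12)                   # avoid division by zero
    P = D ** (-1.0 / (m - 1.0))
    return P / P.sum(axis=0, keepdims=True)    # each column sums to 1


def prototype_weights(U, m):
    """Rows v_i = u_i^m / sum_k u_ik^m, the relational stand-in for prototypes."""
    W = U ** m
    return W / W.sum(axis=1, keepdims=True)


def rfcm(R, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Relational fuzzy c-means on an n x n dissimilarity matrix R.

    Assumption: R holds squared Euclidean distances, so no NERF-style
    correction for non-Euclidean data is applied.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((c, R.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        V = prototype_weights(U, m)
        VR = V @ R
        # d_ik = (R v_i)_k - (v_i^T R v_i) / 2
        D = VR - 0.5 * np.sum(VR * V, axis=1, keepdims=True)
        U_new = memberships(D, m)
        done = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if done:
            break
    return U, prototype_weights(U, m)


def sample_cluster_extend(R, n, c, m=2.0, seed=0):
    """Phases (ii)-(iv) in miniature: sample R_n, cluster it, extend to the rest.

    Simple random sampling replaces progressive sampling here. R is held in
    memory only for this demo; for VL data the block R[sample, rest] would
    be computed or read in chunks instead.
    """
    N = R.shape[0]
    rng = np.random.default_rng(seed)
    sample = rng.choice(N, size=n, replace=False)
    rest = np.setdiff1d(np.arange(N), sample)
    R_n = R[np.ix_(sample, sample)]
    U_n, V = rfcm(R_n, c, m=m, seed=seed)
    # Extension: each unsampled object needs only its dissimilarities
    # to the n sampled objects to get distances to the c clusters.
    self_term = 0.5 * np.sum((V @ R_n) * V, axis=1, keepdims=True)
    D_rest = V @ R[np.ix_(sample, rest)] - self_term
    U = np.empty((c, N))
    U[:, sample] = U_n
    U[:, rest] = memberships(D_rest, m)
    return U


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(10, 1, (200, 2))])
    U = sample_cluster_extend(squared_distances(X), n=40, c=2)
    print(np.bincount(U.argmax(axis=0)))  # sizes of the two hard clusters
```

The extension step is what makes the approach scale. After the sample is clustered, each remaining object needs only its n dissimilarities to the sampled objects, not a full row of R_N.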
