Journal article
Distributed stream clustering using micro-clusters on Apache Storm
P Karunaratne, S Karunasekera, A Harwood
Journal of Parallel and Distributed Computing | ACADEMIC PRESS INC ELSEVIER SCIENCE | Published: 2017
Abstract
The recent need to extract real-time insights from data has driven demand for machine learning algorithms that can operate on data streams. Given current extreme rates of data generation (around 5000 messages per second), these algorithms must handle data streams of very high velocity. Many current algorithms do not meet this requirement, in some cases processing only tens of messages per second. In this work we address the limited achievable throughput of stream clustering by developing scalable distributed algorithms, based on the micro-clustering paradigm, that run on cloud platforms. We present two distributed architectures to execute the algorithms in paral..
Grants
Funding Acknowledgements
This work was supported by the University of Melbourne. The experiments were run on the NeCTAR (National eResearch Collaboration Tools and Resources) cloud computing platform.