Journal article

Emergent Failures: Rethinking Cloud Reliability at Scale

Peter Garraghan, Renyu Yang, Zhenyu Wen, Alexander Romanovsky, Jie Xu, Rajkumar Buyya, Rajiv Ranjan

IEEE CLOUD COMPUTING | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | Published: 2018

Abstract

Since the conception of cloud computing, ensuring its ability to provide highly reliable service has been of the utmost importance and criticality to the business objectives of providers and their customers. This has held true for every facet of the system, encompassing applications, resource management, the underlying computing infrastructure, and environmental cooling. Thus, the cloud-computing and dependability research communities have exerted considerable effort toward enhancing the reliability of system components against various software and hardware failures. However, as these systems have continued to grow in scale, with heterogeneity and complexity resulting in the manifestation of..

Grants

Awarded by UK Engineering and Physical Sciences Research Council


Awarded by National Key Research and Development Program of China


Funding Acknowledgements

This work is supported by the UK Engineering and Physical Sciences Research Council (EP/P031617/1) and the National Key Research and Development Program of China (2016YFB1000103). Renyu Yang is the corresponding author for this article.