Emergent Failures: Rethinking Cloud Reliability at Scale
Peter Garraghan, Renyu Yang, Zhenyu Wen, Alexander Romanovsky, Jie Xu, Rajkumar Buyya, Rajiv Ranjan
IEEE Cloud Computing | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | Published: 2018
Since the conception of cloud computing, ensuring that it provides highly reliable service has been of the utmost importance to the business objectives of providers and their customers. This holds true for every facet of the system, encompassing applications, resource management, the underlying computing infrastructure, and environmental cooling. Consequently, the cloud-computing and dependability research communities have exerted considerable effort toward making system components more reliable against various software and hardware failures. However, as these systems have continued to grow in scale, with heterogeneity and complexity resulting in the manifestation of…
This work is supported by the UK Engineering and Physical Sciences Research Council (EP/P031617/1) and the National Key Research and Development Program of China (2016YFB1000103). Renyu Yang is the corresponding author for this article.