Straggler Mitigation for Distributed Behavioral Simulation

Eman Bin Khunayn, Shanika Karunasekera, Hairuo Xie, Kotagiri Ramamohanarao, K Lee (ed.), L Liu (ed.)

International Conference on Distributed Computing Systems | IEEE COMPUTER SOC | Published: 2017


Running large-scale behavioral simulations requires high computational power, which can be obtained by distributing the computational workload across multiple computing nodes (i.e., workers) that run in parallel. Implementations of such systems commonly follow the Bulk Synchronous Parallel (BSP) model. However, BSP implementations usually suffer from the straggler problem, where a delay in any single worker slows down the entire simulation. The problem usually arises from communication delays or an imbalanced workload among workers. To mitigate the straggler problem, we propose a novel parallel computational model, called the Priority Synchronous Parallel (PSP) model. PSP exploits data dependencies of..
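
To illustrate the straggler problem described in the abstract, the following minimal sketch models BSP timing only. It assumes that every superstep ends at a global barrier, so the step lasts as long as its slowest worker. The function names and the per-worker timings are hypothetical and are not taken from the paper; the sketch does not implement PSP, whose details are not given in this excerpt.

```python
def bsp_makespan(step_times):
    """Total run time under a BSP-style barrier.

    step_times is a list of supersteps; each superstep is a list of
    per-worker compute times. Every worker waits at the barrier, so a
    superstep costs as much as its slowest worker.
    """
    return sum(max(workers) for workers in step_times)


def balanced_makespan(step_times):
    """Reference time if each step's work were spread evenly across workers."""
    return sum(sum(workers) / len(workers) for workers in step_times)


# Hypothetical timings (seconds) for 4 workers over 3 supersteps.
# Worker 4 straggles in step 1; worker 1 straggles in step 3.
step_times = [
    [1.0, 1.0, 1.0, 4.0],
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 1.0, 1.0, 1.0],
]

if __name__ == "__main__":
    print("BSP makespan:     ", bsp_makespan(step_times))
    print("Balanced makespan:", balanced_makespan(step_times))
```

With these made-up numbers, the barrier makes the run take 7.0 seconds, while the same total work spread evenly would take 4.0 seconds; the gap is the time the other workers spend idle waiting for stragglers.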

