Given the unpredictable nature of the nodes in distributed computing systems, some of the tasks can be significantly delayed. Such delayed tasks are called stragglers. In order to mitigate stragglers, redundancy in computation is often employed by encoding k tasks to n tasks such that any k of them can help ascertain the completion of the tasks. Two important metrics of interest are service completion time of the k tasks, and server utilization cost which is sum of time each server spends working on the tasks. Even though starting all n jobs at the start (t = 0) leads to lower mean service completion time, it leads to higher mean server utilization cost. We consider a proactive straggler mitigation strategy where n0 <= n tasks are started at t = 0 while the remaining n − n0 tasks are launched when l0 <= min(n0, k) tasks finish. The tasks are halted when k tasks finish. This gives a flexible forking strategy with multiple parameters. We analyze the mean of two performance metrics for the proposed forking strategy when the random task completion time at each server is independent and distributed as a shifted exponential. This talk demonstrates an effective algorithm to find the tradeoff between the two performance metrics mean server utilization cost and mean service completion time so as to choose efficient choice of parameters. This work has been accepted at INFOCOM-2020 conference.
Ajay Badita is a PhD student in the department of ECE, IISc-Bengaluru, working under the supervision of Prof. Parimal Parag.