TY - GEN
T1 - A NUMA-Aware Provably-Efficient Task-Parallel Platform Based on the Work-First Principle
AU - Deters, Justin
AU - Wu, Jiaye
AU - Xu, Yifan
AU - Lee, I. Ting Angelina
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/12/11
Y1 - 2018/12/11
N2 - Task parallelism is designed to simplify the task of parallel programming. When executing a task parallel program on modern NUMA architectures, it can fail to scale due to the phenomenon called work inflation, where the overall processing time that multiple cores spend on doing useful work is higher compared to the time required to do the same amount of work on one core, due to effects experienced only during parallel executions such as additional cache misses, remote memory accesses, and memory bandwidth issues. One can mitigate work inflation by co-locating the computation with its data, but this is nontrivial to do with task parallel programs. First, by design, the scheduling for task parallel programs is automated, giving the user little control over where the computation is performed. Second, the platforms tend to employ work stealing, which provides strong theoretical guarantees, but its randomized protocol for load balancing does not discern between work items that are far away versus ones that are closer. In this work, we propose NUMA-WS, a NUMA-aware task parallel platform engineered based on the work-first principle. By abiding by the work-first principle, we are able to obtain a platform that is work efficient, provides the same theoretical guarantees as a classic work stealing scheduler, and mitigates work inflation. We have extended the Cilk Plus runtime system to implement NUMA-WS. Empirical results indicate that NUMA-WS is work efficient and can provide better scalability by mitigating work inflation.
AB - Task parallelism is designed to simplify the task of parallel programming. When executing a task parallel program on modern NUMA architectures, it can fail to scale due to the phenomenon called work inflation, where the overall processing time that multiple cores spend on doing useful work is higher compared to the time required to do the same amount of work on one core, due to effects experienced only during parallel executions such as additional cache misses, remote memory accesses, and memory bandwidth issues. One can mitigate work inflation by co-locating the computation with its data, but this is nontrivial to do with task parallel programs. First, by design, the scheduling for task parallel programs is automated, giving the user little control over where the computation is performed. Second, the platforms tend to employ work stealing, which provides strong theoretical guarantees, but its randomized protocol for load balancing does not discern between work items that are far away versus ones that are closer. In this work, we propose NUMA-WS, a NUMA-aware task parallel platform engineered based on the work-first principle. By abiding by the work-first principle, we are able to obtain a platform that is work efficient, provides the same theoretical guarantees as a classic work stealing scheduler, and mitigates work inflation. We have extended the Cilk Plus runtime system to implement NUMA-WS. Empirical results indicate that NUMA-WS is work efficient and can provide better scalability by mitigating work inflation.
KW - locality
KW - NUMA
KW - work inflation
KW - work stealing
KW - work-first principle
UR - https://www.scopus.com/pages/publications/85060269978
U2 - 10.1109/IISWC.2018.8573486
DO - 10.1109/IISWC.2018.8573486
M3 - Conference contribution
AN - SCOPUS:85060269978
T3 - 2018 IEEE International Symposium on Workload Characterization, IISWC 2018
SP - 59
EP - 70
BT - 2018 IEEE International Symposium on Workload Characterization, IISWC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Symposium on Workload Characterization, IISWC 2018
Y2 - 30 September 2018 through 2 October 2018
ER -