TY - GEN
T1 - Helper locks for fork-join parallel programming
AU - Agrawal, Kunal
AU - Leiserson, Charles E.
AU - Sukha, Jim
PY - 2010/1/9
Y1 - 2010/1/9
N2 - Helper locks allow programs with large parallel critical sections, called parallel regions, to execute more efficiently by enlisting processors that might otherwise be waiting on the helper lock to aid in the execution of the parallel region. Suppose that a processor p is executing a parallel region A after having acquired the lock L protecting A. If another processor p′ tries to acquire L, then instead of blocking and waiting for p to complete A, processor p′ joins p to help it complete A. Additional processors not blocked on L may also help to execute A. The HELPER runtime system can execute fork-join computations augmented with helper locks and parallel regions. HELPER supports the unbounded nesting of parallel regions. We provide theoretical completion-time and space-usage bounds for a design of HELPER based on work stealing. Specifically, let V be the number of parallel regions in a computation, let T1 be its work, and let T̃∞ be its "aggregate span" - the sum of the spans (critical-path lengths) of all its parallel regions. We prove that HELPER completes the computation in expected time O(T1/PP + T̃∞ + PV) on P processors. This bound indicates that programs with a small number of highly parallel critical sections can attain linear speedup. For the space bound, we prove that HELPER completes a program using only O(PS1 stack space, where S1 is the sum, over all regions, of the stack space used by each region in a serial execution. Finally, we describe a prototype of HELPER implemented by modifying the Cilk multithreaded runtime system. We used this prototype to implement a concurrent hash table with a resize operation protected by a helper lock.
AB - Helper locks allow programs with large parallel critical sections, called parallel regions, to execute more efficiently by enlisting processors that might otherwise be waiting on the helper lock to aid in the execution of the parallel region. Suppose that a processor p is executing a parallel region A after having acquired the lock L protecting A. If another processor p′ tries to acquire L, then instead of blocking and waiting for p to complete A, processor p′ joins p to help it complete A. Additional processors not blocked on L may also help to execute A. The HELPER runtime system can execute fork-join computations augmented with helper locks and parallel regions. HELPER supports the unbounded nesting of parallel regions. We provide theoretical completion-time and space-usage bounds for a design of HELPER based on work stealing. Specifically, let V be the number of parallel regions in a computation, let T1 be its work, and let T̃∞ be its "aggregate span" - the sum of the spans (critical-path lengths) of all its parallel regions. We prove that HELPER completes the computation in expected time O(T1/PP + T̃∞ + PV) on P processors. This bound indicates that programs with a small number of highly parallel critical sections can attain linear speedup. For the space bound, we prove that HELPER completes a program using only O(PS1 stack space, where S1 is the sum, over all regions, of the stack space used by each region in a serial execution. Finally, we describe a prototype of HELPER implemented by modifying the Cilk multithreaded runtime system. We used this prototype to implement a concurrent hash table with a resize operation protected by a helper lock.
KW - Cilk
KW - Fork-join multithreading
KW - Helper lock
KW - Nested parallelism
KW - Parallel region
KW - Scheduling
KW - Work stealing
UR - https://www.scopus.com/pages/publications/77749274261
U2 - 10.1145/1693453.1693487
DO - 10.1145/1693453.1693487
M3 - Conference contribution
AN - SCOPUS:77749274261
SN - 9781605587080
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
SP - 245
EP - 256
BT - PPoPP'10 - Proceedings of the 2010 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
PB - Association for Computing Machinery
T2 - 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2010
Y2 - 9 January 2010 through 14 January 2010
ER -