Towards Zero Spawn Overhead: Work Stealing Without Deques

  • Aaron Handleman
  • , Kyle Singer
  • , Tao B. Schardl
  • , I. Ting Angelina Lee

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

In a randomized work-stealing scheduler, parallel speedup depends on the spawn overhead, which workers pay to allow tasks to execute in parallel, and the steal overhead, which thieves pay to start executing new work. The importance of minimizing the spawn overhead in a randomized work-stealing scheduler is first formalized by Frigo et al., coined as the work-first principle [15], which states that one should minimize spawn overhead even at the expense of a larger steal overhead. Since then, many strategies have been proposed to reduce the spawn overhead, which is dominated by maintaining a per-worker double-ended queue, or deque, to keep track of available parallel work. In pursuit of zero spawn overhead, this work considers a strategy that eliminates the use of deques entirely, obviating the need for a worker to perform explicit bookkeeping or set up a deque to enable parallelism. To that end, we propose DLite, a compiler and runtime ABI (Application Binary Interface) that incurs near-zero spawn overhead, empirically measured to be about 6% compared to a regular function invocation. DLite pushes the tradeoffs advocated by the work-first principle to the extreme, which decreases the spawn overhead to almost nil, at the expense of a high steal cost. Specifically, DLite employs a backtracking strategy: When a steal attempt occurs, the victim provides its current stack and base pointers to the thief, and the thief then reconstructs the necessary state to realize the parallel execution. We have implemented Cilk-DLite, which extends the OpenCilk platform [33] to implement DLite. When the application has ample parallelism, Cilk-DLite exhibits similar scalability to OpenCilk with much lower spawn overhead. When the application lacks parallelism, the high steal cost in Cilk-DLite can impede scalability due to slower work distribution. We also implemented variants of Cilk-DLite that make different design choices to evaluate the tradeoffs between spawn overhead and steal cost.

Original languageEnglish
Title of host publicationSPAA 2025 - Proceedings of the 2025 37th ACM Symposium on Parallelism in Algorithms and Architectures
PublisherAssociation for Computing Machinery
Pages75-88
Number of pages14
ISBN (Electronic)9798400712586
DOIs
StatePublished - Jul 16 2025
Event37th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2025 - Portland, United States
Duration: Jul 28 2025Aug 1 2025

Publication series

NameAnnual ACM Symposium on Parallelism in Algorithms and Architectures
ISSN (Print)1548-6109

Conference

Conference37th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2025
Country/TerritoryUnited States
CityPortland
Period07/28/2508/1/25

Fingerprint

Dive into the research topics of 'Towards Zero Spawn Overhead: Work Stealing Without Deques'. Together they form a unique fingerprint.

Cite this