TY - GEN
T1 - Enabling real-time irregular data-flow pipelines on SIMD Devices
AU - Plano, Tom
AU - Buhler, Jeremy
N1 - Funding Information:
This work was supported by NSF award CNS-1763503.
Publisher Copyright:
© 2021 Owner/Author.
PY - 2021/8/9
Y1 - 2021/8/9
N2 - Streaming data-flow applications arise in many contexts where each item in a data stream must be processed within a bounded latency, or deadline, following its arrival. We consider applications whose behavior is irregular, in the sense that the application may reduce or amplify data volumes dynamically at various stages of its computation. Our implementation target for these applications is SIMD-capable processors such as GPUs. For such devices, organizing the computation so that a full-width SIMD vector of inputs can be processed at once makes efficient use of the processor. However, having parts of the computation wait while full vectors of input accumulate may cause the application to miss deadlines. We present a novel approach to scheduling irregular streaming applications with latency constraints on SIMD devices. After describing a model for executing such applications, we formalize the objective of efficient processor utilization and the constraints associated with bounded latency and sufficient throughput to handle a stream of items arriving at a fixed rate. We introduce a strategy, enforced waits, to optimize the objective subject to the constraints. We demonstrate empirically that, for a test application from bioinformatics, our strategy can lower processor utilization relative to a baseline approach that cannot introduce waits inside the application pipeline. Finally, we characterize the region of parameter space in which the new approach is likely to outperform the baseline.
AB - Streaming data-flow applications arise in many contexts where each item in a data stream must be processed within a bounded latency, or deadline, following its arrival. We consider applications whose behavior is irregular, in the sense that the application may reduce or amplify data volumes dynamically at various stages of its computation. Our implementation target for these applications is SIMD-capable processors such as GPUs. For such devices, organizing the computation so that a full-width SIMD vector of inputs can be processed at once makes efficient use of the processor. However, having parts of the computation wait while full vectors of input accumulate may cause the application to miss deadlines. We present a novel approach to scheduling irregular streaming applications with latency constraints on SIMD devices. After describing a model for executing such applications, we formalize the objective of efficient processor utilization and the constraints associated with bounded latency and sufficient throughput to handle a stream of items arriving at a fixed rate. We introduce a strategy, enforced waits, to optimize the objective subject to the constraints. We demonstrate empirically that, for a test application from bioinformatics, our strategy can lower processor utilization relative to a baseline approach that cannot introduce waits inside the application pipeline. Finally, we characterize the region of parameter space in which the new approach is likely to outperform the baseline.
KW - Bounded latency
KW - Gpgpu
KW - Irregular data-flow
KW - Scheduling
UR - http://www.scopus.com/inward/record.url?scp=85115983921&partnerID=8YFLogxK
U2 - 10.1145/3458744.3473367
DO - 10.1145/3458744.3473367
M3 - Conference contribution
AN - SCOPUS:85115983921
T3 - ACM International Conference Proceeding Series
BT - 50th International Conference on Parallel Processing Workshop, ICPP 2021 - Proceedings
PB - Association for Computing Machinery
T2 - 50th International Conference on Parallel Processing Workshop, ICPP 2021
Y2 - 9 August 2021 through 12 August 2021
ER -