Adding data parallelism to streaming pipelines for throughput optimization

Peng Li, Kunal Agrawal, Jeremy Buhler, Roger D. Chamberlain

Research output: Contribution to conferencePaperpeer-review

6 Scopus citations


The streaming model is a popular model for writing high-throughput parallel applications. A streaming application is represented by a graph of computation stages that communicate with each other via FIFO channels. In this paper, we consider the problem of mapping streaming pipelines - streaming applications where the graph is a linear chain - onto a set of computing resources in order to maximize its throughput. In a parallel setting, subsets of stages, called components, can be mapped onto different computing resources. The throughput of an application is determined by the throughput of the slowest component. Therefore, if some stage is much slower than others, then it may be useful to replicate the stage's code and divide its workload among two or more replicas in order to increase throughput. However, pipelines may consist of some replicable and some non-replicable stages. In this paper, we address the problem of mapping these partially replicable streaming pipelines onto both homogeneous and heterogeneous platforms so as to maximize throughput. We consider two types of platforms, homogeneous platforms - where all resources are identical, and heterogeneous platforms - where resources may have different speeds. In both cases, we consider two network topologies - unidirectional chain and clique. We provide polynomial-time algorithms for mapping partially replicable pipelines onto unidirectional chains for both homogeneous and heterogeneous platforms. For homogeneous platforms, the algorithm for unidirectional chains generalizes to clique topologies. However, for heterogeneous platforms, mapping these pipelines onto clique topologies is NP-complete. We provide heuristics to generate solutions for cliques by applying our chain algorithms to a series of chains sampled from the clique. Our empirical results show that these heuristics rapidly converge to near-optimal solutions.

Original languageEnglish
Number of pages10
StatePublished - 2013
Event20th Annual International Conference on High Performance Computing, HiPC 2013 - Bangalore, India
Duration: Dec 18 2013Dec 21 2013


Conference20th Annual International Conference on High Performance Computing, HiPC 2013


Dive into the research topics of 'Adding data parallelism to streaming pipelines for throughput optimization'. Together they form a unique fingerprint.

Cite this