TY - GEN
T1 - SuperCut
T2 - 20th ACM International Conference on Computing Frontiers, CF 2023
AU - Zhao, Chenfeng
AU - Chamberlain, Roger D.
AU - Zhang, Xuan
N1 - Publisher Copyright: © 2023 Owner/Author.
PY - 2023/5/9
Y1 - 2023/5/9
N2 - The parallel execution of many graph algorithms is frequently dominated by data communication overheads between compute nodes. This bottleneck becomes even more pronounced in Near-Memory Processing (NMP) architectures with multiple memory cubes as local memory accesses are less expensive. Existing near-memory architectures typically use graph partitioning methods with a fixed vertex assignment, which limits their potential to improve performance and reduce energy consumption. Here, we argue that an NMP-based graph processing system should also consider the distribution of vertices onto memory cubes. We propose SuperCut, a framework for near-memory architectures to effectively reduce communication overheads while maintaining computational balance. We evaluate SuperCut via architectural simulation with 6 real-world datasets and 4 representative applications. The results show that it provides up to 1.8x total energy reduction and 2.6x speedup relative to current state-of-the-art approaches.
KW - 3D-stacked memory
KW - graph processing
KW - near-data processing
UR - https://www.scopus.com/pages/publications/85169556101
U2 - 10.1145/3587135.3592209
DO - 10.1145/3587135.3592209
M3 - Conference contribution
AN - SCOPUS:85169556101
T3 - Proceedings of the 20th ACM International Conference on Computing Frontiers 2023, CF 2023
SP - 42
EP - 51
BT - Proceedings of the 20th ACM International Conference on Computing Frontiers 2023, CF 2023
PB - Association for Computing Machinery, Inc
Y2 - 9 May 2023 through 11 May 2023
ER -