How to Manage High-Bandwidth Memory Automatically

  • Rathish Das
  • Kunal Agrawal
  • Michael A. Bender
  • Jonathan Berry
  • Benjamin Moseley
  • Cynthia A. Phillips

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Scopus citations

Abstract

This paper develops an algorithmic foundation for automated management of the multilevel-memory systems common to new supercomputers. In particular, the High-Bandwidth Memory (HBM) of these systems has latency similar to that of DRAM and a smaller capacity, but much larger bandwidth. Systems equipped with HBM do not fit classic memory-hierarchy models because of HBM's atypical characteristics. Unlike caches, which are generally managed automatically by the hardware, HBM on some current supercomputers can be managed explicitly by the programmer. This process is problem specific and resource intensive. Vendors offer this option because there is no consensus on how to manage HBM automatically while guaranteeing good performance, or on whether this is even possible. In this paper, we give theoretical support for automatic HBM management by developing simple algorithms that can automatically control HBM and deliver good performance on multicore systems. HBM management is starkly different from traditional caching in terms of both optimization objectives and algorithm development. Since DRAM and HBM have similar latencies, minimizing HBM misses (provably) turns out not to be the right memory-management objective. Instead, we directly focus on minimizing makespan. In addition, while cache-management algorithms must focus on which pages to keep in cache, HBM management requires answering two questions: (1) which pages to keep in HBM and (2) how to use the limited bandwidth from HBM to DRAM. It turns out that the natural approach of using LRU for the first question and FCFS (First-Come-First-Serve) for the second question is provably bad. Instead, we provide a priority-based approach that is simple, efficiently implementable, and $O(1)$-competitive for makespan when all multicore threads are independent.
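The two questions raised in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's model or algorithm: the time-step model, the function name `makespan`, the parameter `channel_width`, and the fixed thread-index priority rule are all assumptions made for illustration. It keeps pages in HBM with LRU eviction and lets the caller choose how pending misses are ordered on the limited HBM-DRAM channel, then reports the makespan as the number of steps until every thread finishes.

```python
from collections import OrderedDict


def makespan(traces, hbm_size, policy, channel_width=1):
    """Toy model (an assumption, not the paper's): each step, every thread
    whose next page is in HBM is served; at most `channel_width` missing
    pages are then fetched from DRAM into HBM, evicting by LRU.
    Requires hbm_size >= channel_width."""
    if policy not in ("fcfs", "priority"):
        raise ValueError("policy must be 'fcfs' or 'priority'")
    if hbm_size < channel_width:
        raise ValueError("hbm_size must be at least channel_width")

    hbm = OrderedDict()        # (thread, page) -> None, oldest use first
    pos = [0] * len(traces)    # next request index per thread
    waiting = {}               # thread -> step at which its miss was issued
    step = 0

    while any(pos[i] < len(t) for i, t in enumerate(traces)):
        step += 1
        # Phase 1: serve hits; record new misses.
        for i, trace in enumerate(traces):
            if pos[i] >= len(trace):
                continue
            page = (i, trace[pos[i]])  # threads are independent: no sharing
            if page in hbm:
                hbm.move_to_end(page)  # mark as most recently used
                pos[i] += 1
            elif i not in waiting:
                waiting[i] = step
        # Phase 2: decide which misses get the channel this step.
        if policy == "fcfs":
            order = sorted(waiting, key=lambda i: (waiting[i], i))
        else:
            # Hypothetical priority rule: lower thread index goes first.
            order = sorted(waiting)
        for i in order[:channel_width]:
            if len(hbm) >= hbm_size:
                hbm.popitem(last=False)  # evict least recently used page
            hbm[(i, traces[i][pos[i]])] = None
            del waiting[i]
    return step


if __name__ == "__main__":
    # Two hypothetical independent threads with short page traces.
    demo = [[1, 2, 1, 2], [7, 8, 9]]
    for name in ("fcfs", "priority"):
        print(name, makespan(demo, hbm_size=2, policy=name))
```

On small inputs like this the two channel policies may give the same makespan; the sketch only shows where the two decisions (eviction and channel arbitration) enter, not the separation that the paper proves.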

Original language: English
Title of host publication: SPAA 2020 - Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures
Publisher: Association for Computing Machinery
Pages: 187-199
Number of pages: 13
ISBN (Electronic): 9781450369350
State: Published - Jul 6 2020
Event: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2020 - Virtual, Online, United States
Duration: Jul 15 2020 → Jul 17 2020

Publication series

Name: Annual ACM Symposium on Parallelism in Algorithms and Architectures

Conference

Conference: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2020
Country/Territory: United States
City: Virtual, Online
Period: 07/15/20 → 07/17/20

Keywords

  • approximation algorithms
  • high-bandwidth memory
  • multicore paging
  • online algorithms
  • paging
  • scheduling
