TY - GEN
T1 - Location-based memory fences
AU - Ladan-Mozes, Edya
AU - Lee, I. Ting Angelina
AU - Vyukov, Dmitry
PY - 2011
Y1 - 2011
N2 - Traditional memory fences are program-counter (PC) based. That is, a memory fence enforces a serialization point in the program instruction stream: it ensures that all memory references before the fence in program order have taken effect before execution continues onto instructions after the fence. Such PC-based memory fences always cause the processor to stall, even when the synchronization is unnecessary during a particular execution. We propose the concept of location-based memory fences, which aim to reduce the cost of synchronization due to the latency of memory fence execution in parallel algorithms. Unlike a PC-based memory fence, a location-based memory fence serializes the instruction stream of the executing thread T1 only when a different thread T2 attempts to read the memory location that is guarded by the location-based memory fence. In this work, we describe a hardware mechanism for location-based memory fences, prove its correctness, and evaluate its potential performance benefit. Our experimental results are based on a software simulation of the proposed location-based memory fence, and are thus expected to incur higher overhead than the proposed hardware mechanism would. Nevertheless, our software experiments show that applications can benefit from using location-based memory fences, although in some cases they do not scale as well because of the software overhead. These results suggest that hardware support for location-based memory fences is worth considering.
AB - Traditional memory fences are program-counter (PC) based. That is, a memory fence enforces a serialization point in the program instruction stream: it ensures that all memory references before the fence in program order have taken effect before execution continues onto instructions after the fence. Such PC-based memory fences always cause the processor to stall, even when the synchronization is unnecessary during a particular execution. We propose the concept of location-based memory fences, which aim to reduce the cost of synchronization due to the latency of memory fence execution in parallel algorithms. Unlike a PC-based memory fence, a location-based memory fence serializes the instruction stream of the executing thread T1 only when a different thread T2 attempts to read the memory location that is guarded by the location-based memory fence. In this work, we describe a hardware mechanism for location-based memory fences, prove its correctness, and evaluate its potential performance benefit. Our experimental results are based on a software simulation of the proposed location-based memory fence, and are thus expected to incur higher overhead than the proposed hardware mechanism would. Nevertheless, our software experiments show that applications can benefit from using location-based memory fences, although in some cases they do not scale as well because of the software overhead. These results suggest that hardware support for location-based memory fences is worth considering.
KW - asymmetric synchronization
KW - biased locks
KW - location-based memory fences
KW - memory fences
KW - the Dekker duality
KW - the Dekker protocol
UR - https://www.scopus.com/pages/publications/79959667047
U2 - 10.1145/1989493.1989503
DO - 10.1145/1989493.1989503
M3 - Conference contribution
AN - SCOPUS:79959667047
SN - 9781450307437
T3 - Annual ACM Symposium on Parallelism in Algorithms and Architectures
SP - 75
EP - 84
BT - SPAA'11 - Proceedings of the 23rd Annual Symposium on Parallelism in Algorithms and Architectures
T2 - 23rd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA'11
Y2 - 4 June 2011 through 6 June 2011
ER -