TY - JOUR
T1 - Implementation and analysis of GPU algorithms for Vecchia Approximation
AU - James, Zachary
AU - Guinness, Joseph
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
PY - 2024/12
Y1 - 2024/12
N2 - Gaussian Processes have become an indispensable part of the spatial statistician’s toolbox but are unsuitable for analyzing large datasets because of the significant time and memory needed to fit the associated model exactly. Vecchia Approximation is widely used to reduce the computational complexity and can be calculated with embarrassingly parallel algorithms. While multi-core software has been developed for Vecchia Approximation, software designed to run on graphics processing units (GPUs) is lacking, despite the tremendous success GPUs have had in statistics and machine learning. We compare three ways to implement Vecchia Approximation on a GPU: two are similar to methods used for other Gaussian Process approximations, and one is new. Our new method exploits the properties of Vecchia Approximation to nearly eliminate thread synchronization and reduce memory access times. We show that our new method outperforms the other two and then compare it to existing multi-core and GPU-accelerated software by fitting Gaussian Process models on various datasets, including a large spatial-temporal dataset of n > 10^6 points collected from an Earth-observing satellite. Our method works on larger datasets and provides higher predictive accuracy than existing GPU methods, and it runs up to 20 times faster than a single-core CPU implementation of Vecchia Approximation.
AB - Gaussian Processes have become an indispensable part of the spatial statistician’s toolbox but are unsuitable for analyzing large datasets because of the significant time and memory needed to fit the associated model exactly. Vecchia Approximation is widely used to reduce the computational complexity and can be calculated with embarrassingly parallel algorithms. While multi-core software has been developed for Vecchia Approximation, software designed to run on graphics processing units (GPUs) is lacking, despite the tremendous success GPUs have had in statistics and machine learning. We compare three ways to implement Vecchia Approximation on a GPU: two are similar to methods used for other Gaussian Process approximations, and one is new. Our new method exploits the properties of Vecchia Approximation to nearly eliminate thread synchronization and reduce memory access times. We show that our new method outperforms the other two and then compare it to existing multi-core and GPU-accelerated software by fitting Gaussian Process models on various datasets, including a large spatial-temporal dataset of n > 10^6 points collected from an Earth-observing satellite. Our method works on larger datasets and provides higher predictive accuracy than existing GPU methods, and it runs up to 20 times faster than a single-core CPU implementation of Vecchia Approximation.
KW - High-performance computing
KW - Parallel computing
KW - Spatial analysis
UR - https://www.scopus.com/pages/publications/85208138767
U2 - 10.1007/s11222-024-10510-9
DO - 10.1007/s11222-024-10510-9
M3 - Article
AN - SCOPUS:85208138767
SN - 0960-3174
VL - 34
JO - Statistics and Computing
JF - Statistics and Computing
IS - 6
M1 - 207
ER -