TY - JOUR
T1 - Precise learning curves and higher-order scaling limits for dot-product kernel regression
AU - Xiao, Lechao
AU - Hu, Hong
AU - Misiakiewicz, Theodor
AU - Lu, Yue M.
AU - Pennington, Jeffrey
N1 - Publisher Copyright:
© 2023 IOP Publishing Ltd and SISSA Medialab srl
PY - 2023/11/1
Y1 - 2023/11/1
AB - As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes. Currently, theoretical understanding of the learning curves (LCs) that characterize how the prediction error depends on the number of samples is restricted to either large-sample asymptotics (m → ∞) or, for certain simple data distributions, to the high-dimensional asymptotics in which the number of samples scales linearly with the dimension (m ∝ d). There is a wide gulf between these two regimes, including all higher-order scaling relations m ∝ d^r, which are the subject of the present paper. We focus on the problem of kernel ridge regression for dot-product kernels and present precise formulas for the mean of the test error, bias, and variance for data drawn uniformly from the sphere with isotropic random labels in the rth-order asymptotic scaling regime m → ∞ with m/d^r held constant. We observe a peak in the LC whenever m ≈ d^r/r! for any integer r, leading to multiple sample-wise descent and non-trivial behavior at multiple scales. We include a Colab notebook (available at: https://tinyurl.com/2nzym7ym) that reproduces the essential results of the paper.
KW - machine learning
UR - https://www.scopus.com/pages/publications/85184020620
DO - 10.1088/1742-5468/ad01b7
M3 - Article
AN - SCOPUS:85184020620
SN - 1742-5468
VL - 2023
JO - Journal of Statistical Mechanics: Theory and Experiment
JF - Journal of Statistical Mechanics: Theory and Experiment
IS - 11
M1 - 114005
ER -