TY - GEN
T1 - Sheared Backpropagation for Fine-Tuning Foundation Models
AU - Yu, Zhiyuan
AU - Shen, Li
AU - Ding, Liang
AU - Tian, Xinmei
AU - Chen, Yixin
AU - Tao, Dacheng
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Fine-tuning extends the training of pre-trained models on specific target tasks, significantly enhancing their performance across various applications. However, fine-tuning often demands large amounts of memory, posing a challenge for low-memory devices; previous memory-efficient fine-tuning methods attempted to mitigate this by pruning activations for gradient computation, albeit at the cost of significant computational overhead from the pruning process during training. To address these challenges, we introduce PreBackRazor, a novel activation pruning scheme offering both computational and memory efficiency through a sparsified backpropagation strategy that judiciously avoids unnecessary activation pruning, storage, and gradient computation. Before activation pruning, our approach samples a probability of selecting a portion of parameters to freeze, updating it with a bandit method to prioritize gradients that impact convergence. During the feed-forward pass, each model layer adjusts adaptively based on parameter activation status, obviating the need to sparsify and store redundant activations for subsequent backpropagation. Benchmarking on fine-tuning foundation models, our approach maintains baseline accuracy across diverse tasks while yielding over 20% speedup and around 10% memory reduction. Moreover, integration with an advanced CUDA kernel achieves up to 60% speedup without extra memory cost or accuracy loss, significantly enhancing the efficiency of fine-tuning foundation models on memory-constrained devices.
AB - Fine-tuning extends the training of pre-trained models on specific target tasks, significantly enhancing their performance across various applications. However, fine-tuning often demands large amounts of memory, posing a challenge for low-memory devices; previous memory-efficient fine-tuning methods attempted to mitigate this by pruning activations for gradient computation, albeit at the cost of significant computational overhead from the pruning process during training. To address these challenges, we introduce PreBackRazor, a novel activation pruning scheme offering both computational and memory efficiency through a sparsified backpropagation strategy that judiciously avoids unnecessary activation pruning, storage, and gradient computation. Before activation pruning, our approach samples a probability of selecting a portion of parameters to freeze, updating it with a bandit method to prioritize gradients that impact convergence. During the feed-forward pass, each model layer adjusts adaptively based on parameter activation status, obviating the need to sparsify and store redundant activations for subsequent backpropagation. Benchmarking on fine-tuning foundation models, our approach maintains baseline accuracy across diverse tasks while yielding over 20% speedup and around 10% memory reduction. Moreover, integration with an advanced CUDA kernel achieves up to 60% speedup without extra memory cost or accuracy loss, significantly enhancing the efficiency of fine-tuning foundation models on memory-constrained devices.
KW - Activation Pruning
KW - Computation Efficiency
KW - Memory Efficiency
KW - Sparsity
UR - https://www.scopus.com/pages/publications/85207715643
U2 - 10.1109/CVPR52733.2024.00562
DO - 10.1109/CVPR52733.2024.00562
M3 - Conference contribution
AN - SCOPUS:85207715643
SN - 9798350353006
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 5883
EP - 5892
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
PB - IEEE Computer Society
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Y2 - 16 June 2024 through 22 June 2024
ER -