TY - GEN
T1 - Reward Delay Attacks on Deep Reinforcement Learning
AU - Sarkar, Anindya
AU - Feng, Jiarui
AU - Vorobeychik, Yevgeniy
AU - Gill, Christopher
AU - Zhang, Ning
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Most reinforcement learning algorithms implicitly assume strong synchrony. We present novel attacks targeting Q-learning that exploit a vulnerability entailed by this assumption by delaying the reward signal for a limited time period. We consider two types of attack goals: targeted attacks, which aim to cause a target policy to be learned, and untargeted attacks, which simply aim to induce a policy with a low reward. We evaluate the efficacy of the proposed attacks through a series of experiments. Our first observation is that reward-delay attacks are extremely effective when the goal is simply to minimize reward. Indeed, we find that even naive baseline reward-delay attacks are highly successful in minimizing the reward. Targeted attacks, on the other hand, are more challenging, although we demonstrate that the proposed approaches remain highly effective at achieving the attacker’s targets. In addition, we introduce a second threat model that captures a minimal mitigation ensuring that rewards cannot be used out of sequence. We find that this mitigation remains insufficient to ensure robustness to attacks that delay rewards while preserving their order.
AB - Most reinforcement learning algorithms implicitly assume strong synchrony. We present novel attacks targeting Q-learning that exploit a vulnerability entailed by this assumption by delaying the reward signal for a limited time period. We consider two types of attack goals: targeted attacks, which aim to cause a target policy to be learned, and untargeted attacks, which simply aim to induce a policy with a low reward. We evaluate the efficacy of the proposed attacks through a series of experiments. Our first observation is that reward-delay attacks are extremely effective when the goal is simply to minimize reward. Indeed, we find that even naive baseline reward-delay attacks are highly successful in minimizing the reward. Targeted attacks, on the other hand, are more challenging, although we demonstrate that the proposed approaches remain highly effective at achieving the attacker’s targets. In addition, we introduce a second threat model that captures a minimal mitigation ensuring that rewards cannot be used out of sequence. We find that this mitigation remains insufficient to ensure robustness to attacks that delay rewards while preserving their order.
KW - Adversarial attack
KW - Deep reinforcement learning
KW - Reward delay attack
UR - https://www.scopus.com/pages/publications/85151121179
U2 - 10.1007/978-3-031-26369-9_11
DO - 10.1007/978-3-031-26369-9_11
M3 - Conference contribution
AN - SCOPUS:85151121179
SN - 9783031263682
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 212
EP - 230
BT - Decision and Game Theory for Security - 13th International Conference, GameSec 2022, Proceedings
A2 - Fang, Fei
A2 - Xu, Haifeng
A2 - Hayel, Yezekael
PB - Springer Science and Business Media Deutschland GmbH
T2 - 13th International Conference on Decision and Game Theory for Security, GameSec 2022
Y2 - 26 October 2022 through 28 October 2022
ER -