TY - GEN
T1 - Risk-Aware Distributed Multi-Agent Reinforcement Learning
AU - Al Maruf, Abdullah
AU - Niu, Luyao
AU - Ramasubramanian, Bhaskar
AU - Clark, Andrew
AU - Poovendran, Radha
N1 - Publisher Copyright:
© 2024 AACC.
PY - 2024
AB - Autonomous cyber and cyber-physical systems need to perform decision-making, learning, and control in unknown environments. Such decision-making can be sensitive to multiple factors, including modeling errors, changes in costs, and impacts of events in the tails of probability distributions. Although multi-agent reinforcement learning (MARL) provides a framework for learning behaviors through repeated interactions with the environment by minimizing an average cost, it is not adequate to overcome these challenges. In this paper, we develop a distributed MARL approach to solve decision-making problems in unknown environments by learning risk-aware actions. We use the conditional value-at-risk (CVaR) to define the cost function to be minimized, and introduce a Bellman operator to characterize the value function associated with a given state-action pair. We prove that this operator satisfies a contraction property and that its repeated application converges to the optimal value function. We then propose a distributed MARL algorithm called the CVaR QD-Learning algorithm, and establish that the value functions of individual agents reach consensus. We identify several challenges that arise in the implementation of the CVaR QD-Learning algorithm, and present solutions to overcome them. We evaluate the CVaR QD-Learning algorithm through simulations, and demonstrate the effect of a risk parameter on the value functions at consensus.
UR - http://www.scopus.com/inward/record.url?scp=85204426427&partnerID=8YFLogxK
DO - 10.23919/ACC60939.2024.10644829
M3 - Conference contribution
AN - SCOPUS:85204426427
T3 - Proceedings of the American Control Conference
SP - 4012
EP - 4019
BT - 2024 American Control Conference, ACC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 American Control Conference, ACC 2024
Y2 - 10 July 2024 through 12 July 2024
ER -