TY - GEN
T1 - Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs
AU - Nguyen, Duc Thien
AU - Yeoh, William
AU - Lau, Hoong Chuin
AU - Zilberstein, Shlomo
AU - Zhang, Chongjie
N1 - Publisher Copyright:
Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2014
Y1 - 2014
N2 - Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multi-arm bandit DCOP algorithm on dynamic DCOPs.
AB - Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multi-arm bandit DCOP algorithm on dynamic DCOPs.
UR - https://www.scopus.com/pages/publications/84908212814
M3 - Conference contribution
AN - SCOPUS:84908212814
T3 - Proceedings of the National Conference on Artificial Intelligence
SP - 1447
EP - 1455
BT - Proceedings of the National Conference on Artificial Intelligence
PB - AI Access Foundation
T2 - 28th AAAI Conference on Artificial Intelligence, AAAI 2014, 26th Innovative Applications of Artificial Intelligence Conference, IAAI 2014 and the 5th Symposium on Educational Advances in Artificial Intelligence, EAAI 2014
Y2 - 27 July 2014 through 31 July 2014
ER -