An optimal online method of selecting source policies for reinforcement learning

  • Siyuan Li
  • , Chongjie Zhang

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

44 Scopus citations

Abstract

Transfer learning significantly accelerates the reinforcement learning process by exploiting relevant knowledge from previous experiences. The problem of optimally selecting source policies during the learning process is of great importance yet challenging. There has been little theoretical analysis of this problem. In this paper, we develop an optimal online method to select source policies for reinforcement learning. This method formulates online source policy selection as a multi-armed bandit problem and augments Q-learning with policy reuse. We provide theoretical guarantees of the optimal selection process and convergence to the optimal policy. In addition, we conduct experiments on a grid-based robot navigation domain to demonstrate its efficiency and robustness by comparing to the state-of-the-art transfer learning method.

Original languageEnglish
Title of host publication32nd AAAI Conference on Artificial Intelligence, AAAI 2018
PublisherAAAI press
Pages3562-3570
Number of pages9
ISBN (Electronic)9781577358008
StatePublished - 2018
Event32nd AAAI Conference on Artificial Intelligence, AAAI 2018 - New Orleans, United States
Duration: Feb 2 2018Feb 7 2018

Publication series

Name32nd AAAI Conference on Artificial Intelligence, AAAI 2018

Conference

Conference32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Country/TerritoryUnited States
CityNew Orleans
Period02/2/1802/7/18

Fingerprint

Dive into the research topics of 'An optimal online method of selecting source policies for reinforcement learning'. Together they form a unique fingerprint.

Cite this