Reinforcement Learning beyond Expectation

  • Bhaskar Ramasubramanian
  • , Luyao Niu
  • , Andrew Clark
  • , Radha Poovendran

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

8 Scopus citations

Abstract

The inputs and preferences of human users are important considerations in situations where these users interact with autonomous cyber or cyber-physical systems. In these scenarios, one is often interested in aligning behaviors of the system with the preferences of one or more human users. Cumulative prospect theory (CPT) is a paradigm that has been empirically shown to model a tendency of humans to view gains and losses differently. In this paper, we consider a setting where an autonomous agent has to learn behaviors in an unknown environment. In traditional reinforcement learning, these behaviors are learned through repeated interactions with the environment by optimizing an expected utility. In order to endow the agent with the ability to closely mimic the behavior of human users, we optimize a CPT-based cost. We introduce the notion of the CPT-value of an action taken in a state, and establish the convergence of an iterative dynamic programming-based approach to estimate this quantity. We develop two algorithms to enable agents to learn policies to optimize the CPT-value, and evaluate these algorithms in environments where a target state has to be reached while avoiding obstacles. We demonstrate that behaviors of the agent learned using these algorithms are better aligned with that of a human user who might be placed in the same environment, and is significantly improved over a baseline that optimizes an expected utility.

Original languageEnglish
Title of host publication60th IEEE Conference on Decision and Control, CDC 2021
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1528-1535
Number of pages8
ISBN (Electronic)9781665436595
DOIs
StatePublished - 2021
Event60th IEEE Conference on Decision and Control, CDC 2021 - Austin, United States
Duration: Dec 13 2021Dec 17 2021

Publication series

NameProceedings of the IEEE Conference on Decision and Control
Volume2021-December
ISSN (Print)0743-1546
ISSN (Electronic)2576-2370

Conference

Conference60th IEEE Conference on Decision and Control, CDC 2021
Country/TerritoryUnited States
CityAustin
Period12/13/2112/17/21

Fingerprint

Dive into the research topics of 'Reinforcement Learning beyond Expectation'. Together they form a unique fingerprint.

Cite this