Model-based Reinforcement Learning with Provable Safety Guarantees via Control Barrier Functions

Hongchao Zhang, Zhouchi Li, Andrew Clark

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

Safety is a critical property in applications including robotics, transportation, and energy. Safety is especially challenging in reinforcement learning (RL) settings, in which uncertainty in the system dynamics may cause safety violations during exploration. Control Barrier Functions (CBFs), which enforce safety by constraining the control actions at each time step, are a promising approach for safety-critical control. This technique has been applied to ensure the safety of model-free RL; however, it has not been integrated into model-based RL. In this paper, we propose Uncertainty-Tolerant Control Barrier Functions (UTCBFs), a new class of CBFs that incorporates model uncertainty and provides provable safety guarantees with a desired probability. Furthermore, we introduce an algorithm for model-based RL that guarantees safety by integrating CBFs with gradient-based policy search. Our approach is verified through numerical studies of a cart-pole system and an inverted pendulum system, with comparisons to state-of-the-art RL algorithms.
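As a rough illustration of what "constraining the control actions at each time step" means, the sketch below implements a generic CBF safety filter, assuming control-affine dynamics ẋ = f(x) + g(x)u, a single barrier function h with safe set {x : h(x) ≥ 0}, and a linear class-K function α·h. The RL action is minimally modified so that L_f h(x) + L_g h(x)·u - c + α·h(x) ≥ 0, which for one constraint is a projection onto a half-space with a closed-form solution. The constant margin c is a hypothetical stand-in for a bound on model error; this is not the UTCBF construction or the gradient-based policy-search integration described in the paper, and the function and parameter names are illustrative only.

```python
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cbf_safety_filter(u_rl: Sequence[float], lgh: Sequence[float], lfh: float,
                      h: float, alpha: float = 1.0, margin: float = 0.0) -> List[float]:
    """Return the action closest (in Euclidean norm) to u_rl satisfying
    lfh + lgh . u - margin + alpha * h >= 0.

    Illustrative sketch: `margin` is an assumed constant bound on unmodeled
    dynamics, not the paper's UTCBF uncertainty term.
    """
    a = list(lgh)
    b = margin - lfh - alpha * h          # constraint rewritten as a . u >= b
    slack = _dot(a, u_rl) - b
    if slack >= 0:                        # RL action already satisfies the CBF condition
        return list(u_rl)
    norm_sq = _dot(a, a)
    if norm_sq == 0:
        raise ValueError("L_g h = 0: the control cannot influence h at this state")
    step = -slack / norm_sq               # closed-form half-space projection
    return [ui + step * ai for ui, ai in zip(u_rl, a)]


def simulate_single_integrator(x0: float = 0.0, x_max: float = 1.0, u_rl: float = 1.0,
                               disturbance: float = 0.05, alpha: float = 2.0,
                               margin: float = 0.05, dt: float = 0.01,
                               steps: int = 500) -> List[float]:
    """Euler-simulate x' = u + d with the filter enforcing h(x) = x_max - x >= 0.

    The filter only knows the nominal model x' = u; the constant disturbance d
    is unmodeled and assumed to satisfy |d| <= margin.
    """
    x = x0
    xs = [x]
    for _ in range(steps):
        h = x_max - x
        # For h = x_max - x and nominal x' = u: L_f h = 0 and L_g h = -1.
        (u,) = cbf_safety_filter([u_rl], lgh=[-1.0], lfh=0.0, h=h,
                                 alpha=alpha, margin=margin)
        x = x + dt * (u + disturbance)
        xs.append(x)
    return xs
```

In this toy example the filter caps the action at u ≤ α·(x_max - x) - c. With α·dt ≤ 1 and |d| ≤ c, each Euler step gives x_max - x_{k+1} ≥ (1 - α·dt)(x_max - x_k) ≥ 0, so the aggressive policy u_rl = 1.0 never leaves the safe set even though the filter does not know d.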

Original language: English
Title of host publication: 2021 IEEE International Conference on Robotics and Automation, ICRA 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 792-798
Number of pages: 7
ISBN (Electronic): 9781728190778
State: Published - 2021
Event: 2021 IEEE International Conference on Robotics and Automation, ICRA 2021 - Xi'an, China
Duration: May 30, 2021 to Jun 5, 2021

Publication series

Name: Proceedings - IEEE International Conference on Robotics and Automation
Volume: 2021-May
ISSN (Print): 1050-4729

Conference

Conference: 2021 IEEE International Conference on Robotics and Automation, ICRA 2021
Country/Territory: China
City: Xi'an
Period: 05/30/21 to 06/05/21

