Polynomial Time Reinforcement Learning in Factored State MDPs with Linear Value Functions

Zihao Deng, Siddartha Devic, Brendan Juba

Research output: Contribution to journalConference articlepeer-review

1 Scopus citations

Abstract

Many reinforcement learning (RL) environments in practice feature enormous state spaces that may be described compactly by a “factored” structure, that may be modeled by Factored Markov Decision Processes (FMDPs). We present the first polynomial-time algorithm for RL in Factored State MDPs (generalizing FMDPs) that neither relies on an oracle planner nor requires a linear transition model; it only requires a linear value function with a suitable local basis with respect to the factorization, permitting efficient variable elimination. With this assumption, we can solve this family of Factored State MDPs in polynomial time by constructing an efficient separation oracle for convex optimization. Importantly, and in contrast to prior work on FMDPs, we do not assume that the transitions on various factors are conditionally independent.

Original languageEnglish
Pages (from-to)11280-11304
Number of pages25
JournalProceedings of Machine Learning Research
Volume151
StatePublished - 2022
Event25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022 - Virtual, Online, Spain
Duration: Mar 28 2022Mar 30 2022

Fingerprint

Dive into the research topics of 'Polynomial Time Reinforcement Learning in Factored State MDPs with Linear Value Functions'. Together they form a unique fingerprint.

Cite this