Integrated common sense learning and planning in POMDPs

Brendan Juba

Research output: Contribution to journalReview articlepeer-review

7 Scopus citations

Abstract

We formulate a new variant of the problem of planning in an unknown environment, for which we can provide algorithms with reasonable theoretical guarantees in spite of large state spaces and time horizons, partial observability, and complex dynamics. In this variant, an agent is given a collection of example traces produced by a reference policy, which may, for example, capture the agent's past behavior. The agent is (only) asked to find policies that are supported by regularities in the dynamics that are observable on these example traces. We describe an efficient algorithm that uses such "common sense" knowledge reflected in the example traces to construct decision tree policies for goal-oriented factored POMDPs. More precisely, our algorithm (provably) succeeds at finding a policy for a given input goal when (1) there is a CNF that is almost always observed satisfied on the traces of the POMDP, capturing a sufficient approximation of its dynamics and (2) for a decision tree policy of bounded complexity, there exist small-space resolution proofs that the goal is achieved on each branch using the aforementioned CNF capturing the "common sense rules." Such a CNF always exists for noisy STRIPS domains, for example. Our results thus essentially establish that the possession of a suitable exploration policy for collecting the necessary examples is the fundamental obstacle to learning to act in such environments.

Original languageEnglish
Pages (from-to)1-37
Number of pages37
JournalJournal of Machine Learning Research
Volume17
StatePublished - Jun 1 2016

Keywords

  • Decision tree policies
  • Noisy STRIPS
  • Non-monontonic reasoning
  • PAC-Semantics
  • Partially observed Markov decision process

Fingerprint

Dive into the research topics of 'Integrated common sense learning and planning in POMDPs'. Together they form a unique fingerprint.

Cite this