When should you continue with your ongoing plans, and when should you instead pursue better opportunities? We show in theory and experiment that such stay-or-leave decisions are consistent with deep R-learning, both behaviorally and neuronally. Our results suggest that real-world agents leave depleting resources when their reward rate falls below its exponential average, which, we argue, is a Bayes-optimal rule in dynamic natural environments. Our work links reinforcement learning, the marginal value theorem, and Bayesian inference approaches to offer a learning algorithm and a decision rule for making sequential stay-or-leave choices.
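The decision rule stated in the abstract (leave a depleting resource once the current reward rate drops below its exponential average) can be illustrated with a minimal Python sketch. This is not the authors' deep R-learning implementation: the function names, the learning rate `alpha`, the starting value `initial_avg`, and the choice to compare each reward against the average before updating it are all assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple


def update_average(avg: float, reward: float, alpha: float) -> float:
    """Exponentially weighted running average of the reward rate.

    alpha is an assumed learning rate in (0, 1]; larger values weight
    recent rewards more heavily.
    """
    return avg + alpha * (reward - avg)


def should_leave(current_reward: float, avg: float) -> bool:
    """Leave when the current reward rate falls below its exponential average."""
    return current_reward < avg


def forage(
    rewards: Iterable[float], alpha: float = 0.1, initial_avg: float = 0.0
) -> Tuple[Optional[int], float]:
    """Step through a reward sequence and report when the rule says to leave.

    Returns (step index of the leave decision or None, average at that point).
    Assumption: the reward is compared against the average *before* the
    average is updated with that reward.
    """
    avg = initial_avg
    for step, reward in enumerate(rewards):
        if should_leave(reward, avg):
            return step, avg
        avg = update_average(avg, reward, alpha)
    return None, avg


if __name__ == "__main__":
    # Hypothetical depleting patch: rewards shrink on every step.
    depleting = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1]
    leave_step, avg = forage(depleting, alpha=0.5, initial_avg=0.5)
    print(leave_step, round(avg, 3))
```

With the hypothetical values above, the agent stays while rewards exceed the running average and leaves at the first step where they do not. A smaller `alpha` makes the average track a longer history, so the agent tolerates a brief dip in a patch for longer before leaving.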
|Journal|Advances in Neural Information Processing Systems|
|State|Published - 2020|
|Event|34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online|
|Duration|Dec 6 2020 → Dec 12 2020|