Enhancing Decision-Making of Large Language Models via Actor-Critic

  • Heng Dong
  • Kefei Duan
  • Chongjie Zhang

Research output: Contribution to journal › Conference article › peer-review

Abstract

Large Language Models (LLMs) have achieved remarkable advancements in natural language processing tasks, yet they encounter challenges in complex decision-making scenarios that require long-term reasoning and alignment with high-level objectives. Existing methods either rely on short-term auto-regressive action generation or face limitations in accurately simulating rollouts and assessing outcomes, leading to suboptimal decisions. This paper introduces a novel LLM-based Actor-Critic framework, termed LAC, that effectively improves LLM policies with long-term action evaluations in a principled and scalable way. Our approach addresses two key challenges: (1) extracting robust action evaluations by computing Q-values via token logits associated with positive/negative outcomes, enhanced by future trajectory rollouts and reasoning; and (2) enabling efficient policy improvement through a gradient-free mechanism. Experiments across diverse environments – including high-level decision-making (ALFWorld), low-level action spaces (BabyAI-Text), and large action spaces (WebShop) – demonstrate the framework’s generality and superiority over state-of-the-art methods. Notably, our approach achieves competitive performance using 7B/8B parameter LLMs, even outperforming baseline methods employing GPT-4 in complex tasks. These results underscore the potential of integrating structured policy optimization with LLMs’ intrinsic knowledge to advance decision-making capabilities in multi-step environments.
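
The two ideas named in the abstract (a Q-value read off the logits of positive/negative outcome tokens, and a gradient-free policy update) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: the Q-value is taken as the log-odds between the two outcome tokens, the update reweights the actor's prior by exp(alpha * Q), and the function names, action strings, logits, and `alpha` are hypothetical. It is not the authors' implementation.

```python
import math

def q_from_logits(logit_pos: float, logit_neg: float) -> float:
    # Assumed critic score: log-odds of the "positive outcome" token versus
    # the "negative outcome" token. Under a softmax over the two tokens,
    # log(p_pos / p_neg) equals the difference of their logits.
    return logit_pos - logit_neg

def improve_policy(prior: dict, q_values: dict, alpha: float = 1.0) -> dict:
    # Assumed gradient-free update: new(a) is proportional to
    # prior(a) * exp(alpha * Q(a)). No model weights are changed; only the
    # distribution over candidate actions is reweighted and renormalized.
    weights = {a: p * math.exp(alpha * q_values[a]) for a, p in prior.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Hypothetical actor probabilities over three candidate actions.
prior = {"open drawer": 0.5, "go to desk": 0.3, "look": 0.2}

# Hypothetical (positive, negative) outcome-token logits from the critic.
logits = {"open drawer": (0.2, 1.0), "go to desk": (1.5, 0.1), "look": (0.0, 0.0)}

q_values = {a: q_from_logits(pos, neg) for a, (pos, neg) in logits.items()}
new_policy = improve_policy(prior, q_values, alpha=1.0)
best_action = max(new_policy, key=new_policy.get)
```

In this toy example the actor initially prefers "open drawer", but the critic's log-odds favor "go to desk", so the reweighted policy shifts its mass there. A larger `alpha` trusts the critic more; `alpha = 0` leaves the prior unchanged.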

Original language: English
Pages (from-to): 13984-14020
Number of pages: 37
Journal: Proceedings of Machine Learning Research
Volume: 267
State: Published - 2025
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: Jul 13 2025 to Jul 19 2025
