LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner

Xiaopan Zhang*
Hao Qin*
Fuquan Wang
Yue Dong

Jiachen Li‡

*Equal contribution, ‡Corresponding author

Visualizations of generated plans for tasks of varying difficulty levels across diverse scenarios:

Compound task: turn off the lights, place the phone on the bed, and leave the book open.

Complex task: place the watch and the key ring into the drawer, and then turn off the TV.

Complex task: microwave a plate containing an egg and a tomato.

Vague command task: prepare the shower and throw away any trash.

Abstract

Language models (LMs) possess a strong capability to comprehend natural language, making them effective in translating human instructions into detailed plans for simple robot tasks. Nevertheless, it remains a significant challenge to handle long-horizon tasks, especially in subtask identification and allocation for cooperative heterogeneous robot teams. To address this issue, we propose a Language Model-Driven Multi-Agent PDDL Planner (LaMMA-P), a novel multi-agent task planning framework that achieves state-of-the-art performance on long-horizon tasks. LaMMA-P integrates the strengths of the LMs’ reasoning capability and the traditional heuristic search planner to achieve a high success rate and efficiency while demonstrating strong generalization across tasks. Additionally, we create MAT-THOR, a comprehensive benchmark that features household tasks with two different levels of complexity based on the AI2-THOR environment. The experimental results demonstrate that LaMMA-P achieves a 105% higher success rate and 36% higher efficiency than existing LM-based multi-agent planners.

Key Ideas and Contributions

1) Framework with PDDL and LLMs: We introduce a novel framework that integrates the reasoning ability of large language models (LLMs) with the heuristic planning algorithms of PDDL planners to address long-horizon task planning for heterogeneous robot teams.
2) Modular Design: We develop a modular design that allows seamless integration of LLMs, PDDL planning systems, and simulation environments, which enables flexible task decomposition and the efficient allocation of sub-tasks based on the skills and capabilities of each robot.
3) Novel Dataset and Performance Boost: We create MAT-THOR, a benchmark of multi-agent complex long-horizon tasks based on the AI2-THOR simulator, which evaluates the effectiveness and robustness of multi-agent planning methods by providing a standardized set of tasks and performance metrics for long-horizon task execution. Our method achieves state-of-the-art (SOTA) performance on this benchmark in terms of success rate and efficiency.
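The skill-based allocation idea in the second contribution can be illustrated with a minimal sketch. This is not the LaMMA-P implementation, which relies on an LM-driven allocator; it is a simple greedy stand-in. The `Robot` and `SubTask` classes, the skill names, and the `allocate` function are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Robot:
    name: str
    skills: frozenset  # skills this robot can execute (hypothetical names)


@dataclass(frozen=True)
class SubTask:
    name: str
    required_skills: frozenset  # skills needed to complete the sub-task


def allocate(subtasks, robots):
    """Greedy stand-in for an allocator: give each sub-task to the capable
    robot that currently has the fewest assigned sub-tasks."""
    load = {r.name: 0 for r in robots}
    assignment = {}
    for task in subtasks:
        # A robot is capable if it has every skill the sub-task requires.
        capable = [r for r in robots if task.required_skills <= r.skills]
        if not capable:
            raise ValueError(f"no robot can perform {task.name!r}")
        chosen = min(capable, key=lambda r: load[r.name])
        assignment[task.name] = chosen.name
        load[chosen.name] += 1
    return assignment


# Hypothetical heterogeneous team and sub-tasks.
robots = [
    Robot("robot1", frozenset({"navigate", "pick", "place"})),
    Robot("robot2", frozenset({"navigate", "switch"})),
]
subtasks = [
    SubTask("place_phone_on_bed", frozenset({"navigate", "pick", "place"})),
    SubTask("turn_off_lights", frozenset({"navigate", "switch"})),
]
plan = allocate(subtasks, robots)
```

Here each sub-task has exactly one capable robot, so the skill constraint alone decides the assignment; the load-balancing tie-break only matters when several robots share the required skills.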

Evaluation of LaMMA-P and baselines on different categories of tasks in the MAT-THOR dataset

Quantitative Analysis: We validate LaMMA-P on three categories of tasks: Compound Tasks, Complex Tasks, and Vague Command Tasks. The experimental results demonstrate that LaMMA-P achieves SOTA performance in both success rate and efficiency compared to the strongest baseline, SMART-LLM. Please refer to the paper for more details.

LaMMA-P's Prompt Templates for Each Module

Precondition Identifier, Task Allocator, Problem Generator, PDDL Validator, Sub-Plan Combiner
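As a rough illustration of the kind of artifact that flows between the Problem Generator and the PDDL Validator, the sketch below renders a PDDL problem as text and runs a very basic well-formedness check. In LaMMA-P these modules are driven by LM prompts; the functions `make_problem` and `parens_balanced`, the domain name `household`, and the predicates used here are assumptions made for this example only, and a balanced-parenthesis check is far weaker than real PDDL validation.

```python
def make_problem(name, domain, objects, init, goal):
    """Render a PDDL problem from plain Python lists (hypothetical helper)."""
    obj_str = " ".join(objects)
    init_str = " ".join(f"({fact})" for fact in init)
    goal_str = " ".join(f"({fact})" for fact in goal)
    return (
        f"(define (problem {name})\n"
        f"  (:domain {domain})\n"
        f"  (:objects {obj_str})\n"
        f"  (:init {init_str})\n"
        f"  (:goal (and {goal_str})))"
    )


def parens_balanced(text):
    """Minimal syntactic check: every '(' must be closed, in order."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
    return depth == 0


# Hypothetical sub-task: one robot turns off the TV.
problem = make_problem(
    name="turn_off_tv",
    domain="household",
    objects=["robot1", "tv", "livingroom"],
    init=["at robot1 livingroom", "switched-on tv"],
    goal=["switched-off tv"],
)
is_well_formed = parens_balanced(problem)
```

A problem that fails such a check would be sent back for regeneration before being handed to a heuristic search planner; the exact feedback loop used by LaMMA-P is described in the paper.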

Citation

@inproceedings{zhang2025lamma,
    title={LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner},
    author={Zhang, Xiaopan and Qin, Hao and Wang, Fuquan and Dong, Yue and Li, Jiachen},
    booktitle={2025 IEEE International Conference on Robotics and Automation (ICRA)},
    year={2025},
    organization={IEEE}
}