Bibliographic Details
Main Authors: Qin, Tian, Bai, Felix, Hu, Ting-Yao, Vemulapalli, Raviteja, Koppula, Hema Swetha, Xu, Zhiyang, Jin, Bowen, Cemri, Mert, Lu, Jiarui, Wang, Zirui, Cao, Meng
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2510.07043
_version_ 1866918451017154560
author Qin, Tian
Bai, Felix
Hu, Ting-Yao
Vemulapalli, Raviteja
Koppula, Hema Swetha
Xu, Zhiyang
Jin, Bowen
Cemri, Mert
Lu, Jiarui
Wang, Zirui
Cao, Meng
author_facet Qin, Tian
Bai, Felix
Hu, Ting-Yao
Vemulapalli, Raviteja
Koppula, Hema Swetha
Xu, Zhiyang
Jin, Bowen
Cemri, Mert
Lu, Jiarui
Wang, Zirui
Cao, Meng
contents Human decision-making often involves constrained optimization. As LLM agents are deployed to assist with real-world tasks like travel planning, shopping, and scheduling, they must mirror this capability. We introduce COMPASS, a benchmark that evaluates whether LLM agents can perform constrained optimization in realistic travel planning settings. To succeed in these tasks, agents must engage in multi-turn conversations with the user to gather task information and use tools to gather information from the database. Agents must then propose a solution that not only satisfies hard constraints but also optimizes the user's utility objective. Evaluating state-of-the-art models, we reveal a significant feasible-optimal gap: while models achieve 70-90% feasibility (constraint satisfaction), they reach only 20-60% optimality (utility optimization). Our analysis shows that tool use is not the bottleneck. Instead, the core limitation is insufficient exploration of the search space, with success strongly correlated with the amount of information gathered. Coding agents offer a promising approach to mitigating this gap. Overall, COMPASS provides a testbed for developing LLM agents that can truly mirror human decision-making by both satisfying constraints and optimizing objectives.
format Preprint
id arxiv_https___arxiv_org_abs_2510_07043
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle COMPASS: Benchmarking Constrained Optimization in LLM Agents
Qin, Tian
Bai, Felix
Hu, Ting-Yao
Vemulapalli, Raviteja
Koppula, Hema Swetha
Xu, Zhiyang
Jin, Bowen
Cemri, Mert
Lu, Jiarui
Wang, Zirui
Cao, Meng
Machine Learning
Human decision-making often involves constrained optimization. As LLM agents are deployed to assist with real-world tasks like travel planning, shopping, and scheduling, they must mirror this capability. We introduce COMPASS, a benchmark that evaluates whether LLM agents can perform constrained optimization in realistic travel planning settings. To succeed in these tasks, agents must engage in multi-turn conversations with the user to gather task information and use tools to gather information from the database. Agents must then propose a solution that not only satisfies hard constraints but also optimizes the user's utility objective. Evaluating state-of-the-art models, we reveal a significant feasible-optimal gap: while models achieve 70-90% feasibility (constraint satisfaction), they reach only 20-60% optimality (utility optimization). Our analysis shows that tool use is not the bottleneck. Instead, the core limitation is insufficient exploration of the search space, with success strongly correlated with the amount of information gathered. Coding agents offer a promising approach to mitigating this gap. Overall, COMPASS provides a testbed for developing LLM agents that can truly mirror human decision-making by both satisfying constraints and optimizing objectives.
title COMPASS: Benchmarking Constrained Optimization in LLM Agents
topic Machine Learning
url https://arxiv.org/abs/2510.07043