Staff View: COMPASS: Benchmarking Constrained Optimization in LLM Agents :: Library Catalog

Bibliographic Details
Main Authors:	Qin, Tian; Bai, Felix; Hu, Ting-Yao; Vemulapalli, Raviteja; Koppula, Hema Swetha; Xu, Zhiyang; Jin, Bowen; Cemri, Mert; Lu, Jiarui; Wang, Zirui; Cao, Meng
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2510.07043

_version_	1866918451017154560
author	Qin, Tian; Bai, Felix; Hu, Ting-Yao; Vemulapalli, Raviteja; Koppula, Hema Swetha; Xu, Zhiyang; Jin, Bowen; Cemri, Mert; Lu, Jiarui; Wang, Zirui; Cao, Meng
contents	Human decision-making often involves constrained optimization. As LLM agents are deployed to assist with real-world tasks such as travel planning, shopping, and scheduling, they must mirror this capability. We introduce COMPASS, a benchmark that evaluates whether LLM agents can perform constrained optimization in realistic travel-planning settings. To succeed in these tasks, agents must engage in multi-turn conversations with the user to gather task information and use tools to retrieve information from the database. They must then propose a solution that not only satisfies hard constraints but also optimizes the user's utility objective. Evaluating state-of-the-art models, we reveal a significant feasible-optimal gap: while models achieve 70-90% feasibility (constraint satisfaction), they reach only 20-60% optimality (utility optimization). Our analysis shows that tool use is not the bottleneck. Instead, the core limitation is insufficient exploration of the search space, with success strongly correlating with the amount of information gathered. Coding agents offer a promising approach to mitigating this gap. Overall, COMPASS provides a testbed for developing LLM agents that can truly mirror human decision-making by both satisfying constraints and optimizing objectives.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_07043
institution	arXiv
publishDate	2025
record_format	arxiv
title	COMPASS: Benchmarking Constrained Optimization in LLM Agents
topic	Machine Learning
url	https://arxiv.org/abs/2510.07043

Similar Items