Staff View: The Art of Tool Interface Design :: Library Catalog

Bibliographic Details
Main Authors:	Wu, Yunnan; Chen, Paul; Baranwal, Deshank; Zhou, Jinlong; Yuan, Jian
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2503.21036

_version_	1866909554400296960
author	Wu, Yunnan; Chen, Paul; Baranwal, Deshank; Zhou, Jinlong; Yuan, Jian
author_facet	Wu, Yunnan; Chen, Paul; Baranwal, Deshank; Zhou, Jinlong; Yuan, Jian
contents	We present an agentic framework, Thinker, which achieves state-of-the-art performance on challenging reasoning tasks for realistic customer service scenarios that involve complex business logic and human interactions over long horizons. On the τ-bench retail dataset, Thinker achieves an 82.6% success rate with GPT-4o (version 2024-06-01) (baseline: 68.3%) and an 81.9% success rate with Llama-3.1 405B (baseline: 49.6%), without any fine-tuning. Thinker effectively closes the gap in reasoning capabilities between the base models by introducing proper structure. The key features of the Thinker framework are: (1) State-Machine Augmented Generation (SMAG), which represents business logic as state machines that the LLM uses as tools; (2) delegation of tasks from the main reasoning loop to LLM-powered tools; and (3) adaptive context management. Our prompting-only solution achieves significant gains while still maintaining a standard agentic architecture with a ReAct-style reasoning loop. The key is to innovate on the tool interface design, as exemplified by SMAG and the LLM-powered tools.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_21036
institution	arXiv
publishDate	2025
record_format	arxiv
title	The Art of Tool Interface Design
topic	Artificial Intelligence
url	https://arxiv.org/abs/2503.21036

Similar Items