:: Library Catalog

Bibliographic Details
Main Authors:	Zheng, Boyuan; Fatemi, Michael Y.; Jin, Xiaolong; Wang, Zora Zhiruo; Gandhi, Apurva; Song, Yueqi; Gu, Yu; Srinivasa, Jayanth; Liu, Gaowen; Neubig, Graham; Su, Yu
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.07079

Similar Items

Inducing Programmatic Skills for Agentic Tasks
by: Wang, Zora Zhiruo, et al.
Published: (2025)

Go-Browse: Training Web Agents with Structured Exploration
by: Gandhi, Apurva, et al.
Published: (2025)

Agent Workflow Memory
by: Wang, Zora Zhiruo, et al.
Published: (2024)

Training Versatile Coding Agents in Synthetic Environments
by: Zhu, Yiqi, et al.
Published: (2025)

Benchmarking Failures in Tool-Augmented Language Models
by: Treviño, Eduardo, et al.
Published: (2025)

CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation
by: Huq, Faria, et al.
Published: (2025)

How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations
by: Wang, Zora Zhiruo, et al.
Published: (2025)

Modeling Distinct Human Interaction in Web Agents
by: Huq, Faria, et al.
Published: (2026)

Attention Reveals More Than Tokens: Training-Free Long-Context Reasoning with Attention-guided Retrieval
by: Zhang, Yuwei, et al.
Published: (2025)

An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance
by: Khanuja, Simran, et al.
Published: (2024)

Diverse Score Distillation
by: Xu, Yanbo, et al.
Published: (2024)

Beyond Browsing: API-Based Web Agents
by: Song, Yueqi, et al.
Published: (2024)

CodeRAG-Bench: Can Retrieval Augment Code Generation?
by: Wang, Zora Zhiruo, et al.
Published: (2024)

Recursive Agent Optimization
by: Gandhi, Apurva, et al.
Published: (2026)

RAGGED: Towards Informed Design of Scalable and Stable RAG Systems
by: Hsia, Jennifer, et al.
Published: (2024)

What Is Missing in Multilingual Visual Reasoning and How to Fix It
by: Song, Yueqi, et al.
Published: (2024)

ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?
by: Waghjale, Siddhant, et al.
Published: (2024)

Open-world Multi-label Text Classification with Extremely Weak Supervision
by: Li, Xintong, et al.
Published: (2024)

What Are Tools Anyway? A Survey from the Language Model Perspective
by: Wang, Zhiruo, et al.
Published: (2024)

How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench
by: Mishra, Venkatesh, et al.
Published: (2025)

Bidirectional LMs are Better Knowledge Memorizers? A Benchmark for Real-world Knowledge Injection
by: Zhang, Yuwei, et al.
Published: (2025)

FAMA: Failure-Aware Meta-Agentic Framework for Open-Source LLMs in Interactive Tool Use Environments
by: Saeidi, Amir, et al.
Published: (2026)

Answer is All You Need: Instruction-following Text Embedding via Answering the Question
by: Peng, Letian, et al.
Published: (2024)

TIER: Trajectory-Invariant Execution Rewards for Multi-Step Tool Composition
by: Kulkarni, Anay, et al.
Published: (2026)

A Retrieve-and-Read Framework for Knowledge Graph Link Prediction
by: Pahuja, Vardaan, et al.
Published: (2022)

AutoPresent: Designing Structured Visuals from Scratch
by: Ge, Jiaxin, et al.
Published: (2025)

Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning
by: Mishra, Venkatesh, et al.
Published: (2025)

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
by: Zhou, Kaiwen, et al.
Published: (2025)

ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations
by: Zhu, Jie, et al.
Published: (2026)

Hone Your Job Search Skills
Published: (2024)

Grounding Multilingual Multimodal LLMs With Cultural Knowledge
by: Nyandwi, Jean de Dieu, et al.
Published: (2025)

ToolMem: Enhancing Multimodal Agents with Learnable Tool Capability Memory
by: Xiao, Yunzhong, et al.
Published: (2025)

Epistemic Skills: Logical Dynamics of Knowing and Forgetting
by: Liang, Xiaolong, et al.
Published: (2024)

TOM-SWE: User Mental Modeling For Software Engineering Agents
by: Zhou, Xuhui, et al.
Published: (2025)

Epistemic Skills: Reasoning about Knowledge and Oblivion
by: Liang, Xiaolong, et al.
Published: (2025)

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1
by: Zhou, Kaiwen, et al.
Published: (2025)

Better Synthetic Data by Retrieving and Transforming Existing Datasets
by: Gandhi, Saumya, et al.
Published: (2024)

Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments
by: Gu, Yu, et al.
Published: (2024)

Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension
by: Yin, Fan, et al.
Published: (2024)

Offloading Score: Measuring AI Reliance Through Counterfactual Workflows
by: Padmakumar, Vishakh, et al.
Published: (2026)