| Main Authors: | Zheng, Boyuan, Fatemi, Michael Y., Jin, Xiaolong, Wang, Zora Zhiruo, Gandhi, Apurva, Song, Yueqi, Gu, Yu, Srinivasa, Jayanth, Liu, Gaowen, Neubig, Graham, Su, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.07079 |
Similar Items
Inducing Programmatic Skills for Agentic Tasks
by: Wang, Zora Zhiruo, et al.
Published: (2025)
Go-Browse: Training Web Agents with Structured Exploration
by: Gandhi, Apurva, et al.
Published: (2025)
Agent Workflow Memory
by: Wang, Zora Zhiruo, et al.
Published: (2024)
Training Versatile Coding Agents in Synthetic Environments
by: Zhu, Yiqi, et al.
Published: (2025)
Benchmarking Failures in Tool-Augmented Language Models
by: Treviño, Eduardo, et al.
Published: (2025)
CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation
by: Huq, Faria, et al.
Published: (2025)
How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations
by: Wang, Zora Zhiruo, et al.
Published: (2025)
Modeling Distinct Human Interaction in Web Agents
by: Huq, Faria, et al.
Published: (2026)
Attention Reveals More Than Tokens: Training-Free Long-Context Reasoning with Attention-guided Retrieval
by: Zhang, Yuwei, et al.
Published: (2025)
An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance
by: Khanuja, Simran, et al.
Published: (2024)
Diverse Score Distillation
by: Xu, Yanbo, et al.
Published: (2024)
Beyond Browsing: API-Based Web Agents
by: Song, Yueqi, et al.
Published: (2024)
CodeRAG-Bench: Can Retrieval Augment Code Generation?
by: Wang, Zora Zhiruo, et al.
Published: (2024)
Recursive Agent Optimization
by: Gandhi, Apurva, et al.
Published: (2026)
RAGGED: Towards Informed Design of Scalable and Stable RAG Systems
by: Hsia, Jennifer, et al.
Published: (2024)
What Is Missing in Multilingual Visual Reasoning and How to Fix It
by: Song, Yueqi, et al.
Published: (2024)
ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?
by: Waghjale, Siddhant, et al.
Published: (2024)
Open-world Multi-label Text Classification with Extremely Weak Supervision
by: Li, Xintong, et al.
Published: (2024)
What Are Tools Anyway? A Survey from the Language Model Perspective
by: Wang, Zhiruo, et al.
Published: (2024)
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench
by: Mishra, Venkatesh, et al.
Published: (2025)
Bidirectional LMs are Better Knowledge Memorizers? A Benchmark for Real-world Knowledge Injection
by: Zhang, Yuwei, et al.
Published: (2025)
FAMA: Failure-Aware Meta-Agentic Framework for Open-Source LLMs in Interactive Tool Use Environments
by: Saeidi, Amir, et al.
Published: (2026)
Answer is All You Need: Instruction-following Text Embedding via Answering the Question
by: Peng, Letian, et al.
Published: (2024)
TIER: Trajectory-Invariant Execution Rewards for Multi-Step Tool Composition
by: Kulkarni, Anay, et al.
Published: (2026)
A Retrieve-and-Read Framework for Knowledge Graph Link Prediction
by: Pahuja, Vardaan, et al.
Published: (2022)
AutoPresent: Designing Structured Visuals from Scratch
by: Ge, Jiaxin, et al.
Published: (2025)
Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning
by: Mishra, Venkatesh, et al.
Published: (2025)
SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
by: Zhou, Kaiwen, et al.
Published: (2025)
ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations
by: Zhu, Jie, et al.
Published: (2026)
Hone Your Job Search Skills
Published: (2024)
Grounding Multilingual Multimodal LLMs With Cultural Knowledge
by: Nyandwi, Jean de Dieu, et al.
Published: (2025)
ToolMem: Enhancing Multimodal Agents with Learnable Tool Capability Memory
by: Xiao, Yunzhong, et al.
Published: (2025)
Epistemic Skills: Logical Dynamics of Knowing and Forgetting
by: Liang, Xiaolong, et al.
Published: (2024)
TOM-SWE: User Mental Modeling For Software Engineering Agents
by: Zhou, Xuhui, et al.
Published: (2025)
Epistemic Skills: Reasoning about Knowledge and Oblivion
by: Liang, Xiaolong, et al.
Published: (2025)
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1
by: Zhou, Kaiwen, et al.
Published: (2025)
Better Synthetic Data by Retrieving and Transforming Existing Datasets
by: Gandhi, Saumya, et al.
Published: (2024)
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments
by: Gu, Yu, et al.
Published: (2024)
Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension
by: Yin, Fan, et al.
Published: (2024)
Offloading Score: Measuring AI Reliance Through Counterfactual Workflows
by: Padmakumar, Vishakh, et al.
Published: (2026)