| Main Authors: | Zhang, Xuan, Jiang, Ziyan, Meng, Rui, Leng, Yifei, Xiao, Zhenbang, Wang, Zora Zhiruo, Shang, Yanyi, Kong, Dehan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.22056 |
Similar Items
WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces
by: Fan, Sicheng, et al.
Published: (2026)
ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?
by: Waghjale, Siddhant, et al.
Published: (2024)
WebCanvas: Benchmarking Web Agents in Online Environments
by: Pan, Yichen, et al.
Published: (2024)
WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents
by: Fan, Sicheng, et al.
Published: (2026)
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree
by: Zhang, Yilin, et al.
Published: (2025)
TOM-SWE: User Mental Modeling For Software Engineering Agents
by: Zhou, Xuhui, et al.
Published: (2025)
How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations
by: Wang, Zora Zhiruo, et al.
Published: (2025)
OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety
by: Vijayvargiya, Sanidhya, et al.
Published: (2025)
Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning
by: Yin, Shaofeng, et al.
Published: (2026)
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
by: Jiang, Ziyan, et al.
Published: (2024)
How Well Does Agent Development Reflect Real-World Work?
by: Wang, Zora Zhiruo, et al.
Published: (2026)
TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks
by: Wang, Zhiruo, et al.
Published: (2024)
CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation
by: Huq, Faria, et al.
Published: (2025)
Trajectory Entropy: Modeling Game State Stability from Multimodality Trajectory Prediction
by: Zhang, Yesheng, et al.
Published: (2025)
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
by: Jiang, Ziyan, et al.
Published: (2024)
General Phrase Debiaser: Debiasing Masked Language Models at a Multi-Token Level
by: Shi, Bingkang, et al.
Published: (2023)
What Are Tools Anyway? A Survey from the Language Model Perspective
by: Wang, Zhiruo, et al.
Published: (2024)
WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis
by: Gao, Yifei, et al.
Published: (2025)
ToolMem: Enhancing Multimodal Agents with Learnable Tool Capability Memory
by: Xiao, Yunzhong, et al.
Published: (2025)
STICKERCONV: Generating Multimodal Empathetic Responses from Scratch
by: Zhang, Yiqun, et al.
Published: (2024)
Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation
by: Wang, Yuechen, et al.
Published: (2025)
OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
by: Yang, Wei, et al.
Published: (2025)
API-Assisted Code Generation for Question Answering on Varied Table Structures
by: Cao, Yihan, et al.
Published: (2023)
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
by: Zheng, Boyuan, et al.
Published: (2025)
Generalizable Multimodal Large Language Model Editing via Invariant Trajectory Learning
by: Su, Jiajie, et al.
Published: (2026)
Bridging the Reproducibility Divide: Open Source Software's Role in Standardizing Healthcare AI
by: Wu, John, et al.
Published: (2026)
Intention-Aware Diffusion Model for Pedestrian Trajectory Prediction
by: Liu, Yu, et al.
Published: (2025)
Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf
by: Jin, Xuanfa, et al.
Published: (2024)
HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains
by: Wang, Shijie, et al.
Published: (2025)
Unraveling and Mitigating Retriever Inconsistencies in Retrieval-Augmented Large Language Models
by: Li, Mingda, et al.
Published: (2024)
Hierarchical Retrieval-Augmented Generation Model with Rethink for Multi-hop Question Answering
by: Zhang, Xiaoming, et al.
Published: (2024)
Bottleneck Tokens for Unified Multimodal Retrieval
by: Sun, Siyu, et al.
Published: (2026)
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation
by: Zhao, Xiangyu, et al.
Published: (2024)
Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs
by: Zhang, Dingkun, et al.
Published: (2025)
RRO: LLM Agent Optimization Through Rising Reward Trajectories
by: Wang, Zilong, et al.
Published: (2025)
Iterative Multimodal Retrieval-Augmented Generation for Medical Question Answering
by: Chen, Xupeng, et al.
Published: (2026)
Efficient Table Retrieval and Understanding with Multimodal Large Language Models
by: Xu, Zhuoyan, et al.
Published: (2026)
Beyond Retrieval: Modeling Confidence Decay and Deterministic Agentic Platforms in Generative Engine Optimization
by: Zhao, XinYu, et al.
Published: (2026)
Simple Graph Condensation
by: Xiao, Zhenbang, et al.
Published: (2024)
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
by: Wang, Bin, et al.
Published: (2024)