Library Catalog

Bibliographic Details
Main Authors:	Zhang, Xuan; Jiang, Ziyan; Meng, Rui; Leng, Yifei; Xiao, Zhenbang; Wang, Zora Zhiruo; Shang, Yanyi; Kong, Dehan
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2506.22056

Similar Items

WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces
by: Fan, Sicheng, et al.
Published: (2026)

ECCO: Can We Improve Model-Generated Code Efficiency Without Sacrificing Functional Correctness?
by: Waghjale, Siddhant, et al.
Published: (2024)

WebCanvas: Benchmarking Web Agents in Online Environments
by: Pan, Yichen, et al.
Published: (2024)

WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents
by: Fan, Sicheng, et al.
Published: (2026)

cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree
by: Zhang, Yilin, et al.
Published: (2025)

TOM-SWE: User Mental Modeling For Software Engineering Agents
by: Zhou, Xuhui, et al.
Published: (2025)

How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations
by: Wang, Zora Zhiruo, et al.
Published: (2025)

OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety
by: Vijayvargiya, Sanidhya, et al.
Published: (2025)

Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning
by: Yin, Shaofeng, et al.
Published: (2026)

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
by: Jiang, Ziyan, et al.
Published: (2024)

How Well Does Agent Development Reflect Real-World Work?
by: Wang, Zora Zhiruo, et al.
Published: (2026)

TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks
by: Wang, Zhiruo, et al.
Published: (2024)

CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation
by: Huq, Faria, et al.
Published: (2025)

Trajectory Entropy: Modeling Game State Stability from Multimodality Trajectory Prediction
by: Zhang, Yesheng, et al.
Published: (2025)

LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
by: Jiang, Ziyan, et al.
Published: (2024)

General Phrase Debiaser: Debiasing Masked Language Models at a Multi-Token Level
by: Shi, Bingkang, et al.
Published: (2023)

What Are Tools Anyway? A Survey from the Language Model Perspective
by: Wang, Zhiruo, et al.
Published: (2024)

WebSynthesis: World-Model-Guided MCTS for Efficient WebUI-Trajectory Synthesis
by: Gao, Yifei, et al.
Published: (2025)

ToolMem: Enhancing Multimodal Agents with Learnable Tool Capability Memory
by: Xiao, Yunzhong, et al.
Published: (2025)

STICKERCONV: Generating Multimodal Empathetic Responses from Scratch
by: Zhang, Yiqun, et al.
Published: (2024)

Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation
by: Wang, Yuechen, et al.
Published: (2025)

OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
by: Yang, Wei, et al.
Published: (2025)

API-Assisted Code Generation for Question Answering on Varied Table Structures
by: Cao, Yihan, et al.
Published: (2023)

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
by: Zheng, Boyuan, et al.
Published: (2025)

Generalizable Multimodal Large Language Model Editing via Invariant Trajectory Learning
by: Su, Jiajie, et al.
Published: (2026)

Bridging the Reproducibility Divide: Open Source Software's Role in Standardizing Healthcare AI
by: Wu, John, et al.
Published: (2026)

Intention-Aware Diffusion Model for Pedestrian Trajectory Prediction
by: Liu, Yu, et al.
Published: (2025)

Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf
by: Jin, Xuanfa, et al.
Published: (2024)

HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains
by: Wang, Shijie, et al.
Published: (2025)

Unraveling and Mitigating Retriever Inconsistencies in Retrieval-Augmented Large Language Models
by: Li, Mingda, et al.
Published: (2024)

Hierarchical Retrieval-Augmented Generation Model with Rethink for Multi-hop Question Answering
by: Zhang, Xiaoming, et al.
Published: (2024)

Bottleneck Tokens for Unified Multimodal Retrieval
by: Sun, Siyu, et al.
Published: (2026)

UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation
by: Zhao, Xiangyu, et al.
Published: (2024)

Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs
by: Zhang, Dingkun, et al.
Published: (2025)

RRO: LLM Agent Optimization Through Rising Reward Trajectories
by: Wang, Zilong, et al.
Published: (2025)

Iterative Multimodal Retrieval-Augmented Generation for Medical Question Answering
by: Chen, Xupeng, et al.
Published: (2026)

Efficient Table Retrieval and Understanding with Multimodal Large Language Models
by: Xu, Zhuoyan, et al.
Published: (2026)

Beyond Retrieval: Modeling Confidence Decay and Deterministic Agentic Platforms in Generative Engine Optimization
by: Zhao, XinYu, et al.
Published: (2026)

Simple Graph Condensation
by: Xiao, Zhenbang, et al.
Published: (2024)

IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
by: Wang, Bin, et al.
Published: (2024)