:: Library Catalog

Bibliographic Details
Main Authors:	Ding, Xuanwen; Pan, Chengjun; Li, Zejun; Zhang, Jiwen; Wang, Siyuan; Wei, Zhongyu
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2505.21389

Similar Items

HyLaT: Efficient Multi-Agent Communication via Hybrid Latent-Text Protocol
by: Mou, Xinyi, et al.
Published: (2026)

AutoJudge: Judge Decoding Without Manual Annotation
by: Garipov, Roman, et al.
Published: (2025)

From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking
by: Wang, Siyuan, et al.
Published: (2024)

Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
by: Wang, Siyuan, et al.
Published: (2024)

VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
by: Li, Zejun, et al.
Published: (2024)

From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents
by: Mou, Xinyi, et al.
Published: (2024)

OViP: Online Vision-Language Preference Learning for VLM Hallucination
by: Liu, Shujun, et al.
Published: (2025)

Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks
by: Yue, Shengbin, et al.
Published: (2024)

DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning
by: Du, Mengfei, et al.
Published: (2024)

SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation
by: Zhang, Jiwen, et al.
Published: (2026)

EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models
by: Du, Mengfei, et al.
Published: (2024)

Stepwise Informativeness Search for Efficient and Effective LLM Reasoning
by: Wang, Siyuan, et al.
Published: (2025)

MAGNET: Towards Adaptive GUI Agents with Memory-Driven Knowledge Evolution
by: Sun, Libo, et al.
Published: (2026)

MedRCube: A Multidimensional Framework for Fine-Grained and In-Depth Evaluation of MLLMs in Medical Imaging
by: Bao, Zhijie, et al.
Published: (2026)

CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards
by: Zhang, Taolin, et al.
Published: (2025)

Affordance Benchmark for MLLMs
by: Wang, Junying, et al.
Published: (2025)

Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs
by: Wang, Siyuan, et al.
Published: (2024)

Symbolic Working Memory Enhances Language Models for Complex Rule Application
by: Wang, Siyuan, et al.
Published: (2024)

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
by: Cao, Maosong, et al.
Published: (2024)

Android in the Zoo: Chain-of-Action-Thought for GUI Agents
by: Zhang, Jiwen, et al.
Published: (2024)

CommunityBench: Benchmarking Community-Level Alignment across Diverse Groups and Tasks
by: Lin, Jiayu, et al.
Published: (2026)

AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator
by: Fan, Zhihao, et al.
Published: (2024)

EcoLANG: Efficient and Effective Agent Communication Language Induction for Social Simulation
by: Mou, Xinyi, et al.
Published: (2025)

Redundancy Principles for MLLMs Benchmarks
by: Zhang, Zicheng, et al.
Published: (2025)

StreamProfileBench: A Benchmark for Fine-Grained User Profile Inference in Real-World Streaming Scenarios
by: Wang, Sizhe, et al.
Published: (2026)

HAF-RM: A Hybrid Alignment Framework for Reward Model Training
by: Liu, Shujun, et al.
Published: (2024)

Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
by: Miao, Ziqi, et al.
Published: (2025)

AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios
by: Mou, Xinyi, et al.
Published: (2024)

Visual Room 2.0: Seeing is Not Understanding for MLLMs
by: Li, Haokun, et al.
Published: (2025)

Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning
by: Li, Zejun, et al.
Published: (2025)

Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction
by: Yue, Shengbin, et al.
Published: (2025)

Interleaved Latent Visual Reasoning with Selective Perceptual Modeling
by: Dong, Shuai, et al.
Published: (2025)

Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation
by: Mou, Xinyi, et al.
Published: (2024)

AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale
by: Wang, Ziyang, et al.
Published: (2025)

Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant
by: Shen, Lei, et al.
Published: (2025)

SpeechMedAssist: Efficiently and Effectively Adapting Speech Language Models for Medical Consultation
by: Chen, Sirry, et al.
Published: (2026)

ALaRM: Align Language Models via Hierarchical Rewards Modeling
by: Lai, Yuhang, et al.
Published: (2024)

InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models
by: Ding, Jing, et al.
Published: (2025)

Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference
by: Wang, Siyuan, et al.
Published: (2024)

MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning
by: Li, Xiaoyuan, et al.
Published: (2025)