| Main Authors: | Ding, Xuanwen, Pan, Chengjun, Li, Zejun, Zhang, Jiwen, Wang, Siyuan, Wei, Zhongyu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.21389 |
Similar Items
HyLaT: Efficient Multi-Agent Communication via Hybrid Latent-Text Protocol
by: Mou, Xinyi, et al.
Published: (2026)
AutoJudge: Judge Decoding Without Manual Annotation
by: Garipov, Roman, et al.
Published: (2025)
From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking
by: Wang, Siyuan, et al.
Published: (2024)
Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation
by: Wang, Siyuan, et al.
Published: (2024)
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
by: Li, Zejun, et al.
Published: (2024)
From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents
by: Mou, Xinyi, et al.
Published: (2024)
OViP: Online Vision-Language Preference Learning for VLM Hallucination
by: Liu, Shujun, et al.
Published: (2025)
Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks
by: Yue, Shengbin, et al.
Published: (2024)
DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning
by: Du, Mengfei, et al.
Published: (2024)
SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation
by: Zhang, Jiwen, et al.
Published: (2026)
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models
by: Du, Mengfei, et al.
Published: (2024)
Stepwise Informativeness Search for Efficient and Effective LLM Reasoning
by: Wang, Siyuan, et al.
Published: (2025)
MAGNET: Towards Adaptive GUI Agents with Memory-Driven Knowledge Evolution
by: Sun, Libo, et al.
Published: (2026)
MedRCube: A Multidimensional Framework for Fine-Grained and In-Depth Evaluation of MLLMs in Medical Imaging
by: Bao, Zhijie, et al.
Published: (2026)
CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards
by: Zhang, Taolin, et al.
Published: (2025)
Affordance Benchmark for MLLMs
by: Wang, Junying, et al.
Published: (2025)
Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs
by: Wang, Siyuan, et al.
Published: (2024)
Symbolic Working Memory Enhances Language Models for Complex Rule Application
by: Wang, Siyuan, et al.
Published: (2024)
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
by: Cao, Maosong, et al.
Published: (2024)
Android in the Zoo: Chain-of-Action-Thought for GUI Agents
by: Zhang, Jiwen, et al.
Published: (2024)
CommunityBench: Benchmarking Community-Level Alignment across Diverse Groups and Tasks
by: Lin, Jiayu, et al.
Published: (2026)
AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator
by: Fan, Zhihao, et al.
Published: (2024)
EcoLANG: Efficient and Effective Agent Communication Language Induction for Social Simulation
by: Mou, Xinyi, et al.
Published: (2025)
Redundancy Principles for MLLMs Benchmarks
by: Zhang, Zicheng, et al.
Published: (2025)
StreamProfileBench: A Benchmark for Fine-Grained User Profile Inference in Real-World Streaming Scenarios
by: Wang, Sizhe, et al.
Published: (2026)
HAF-RM: A Hybrid Alignment Framework for Reward Model Training
by: Liu, Shujun, et al.
Published: (2024)
Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
by: Miao, Ziqi, et al.
Published: (2025)
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios
by: Mou, Xinyi, et al.
Published: (2024)
Visual Room 2.0: Seeing is Not Understanding for MLLMs
by: Li, Haokun, et al.
Published: (2025)
Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning
by: Li, Zejun, et al.
Published: (2025)
Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction
by: Yue, Shengbin, et al.
Published: (2025)
Interleaved Latent Visual Reasoning with Selective Perceptual Modeling
by: Dong, Shuai, et al.
Published: (2025)
Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation
by: Mou, Xinyi, et al.
Published: (2024)
AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale
by: Wang, Ziyang, et al.
Published: (2025)
Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant
by: Shen, Lei, et al.
Published: (2025)
SpeechMedAssist: Efficiently and Effectively Adapting Speech Language Models for Medical Consultation
by: Chen, Sirry, et al.
Published: (2026)
ALaRM: Align Language Models via Hierarchical Rewards Modeling
by: Lai, Yuhang, et al.
Published: (2024)
InsQABench: Benchmarking Chinese Insurance Domain Question Answering with Large Language Models
by: Ding, Jing, et al.
Published: (2025)
Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference
by: Wang, Siyuan, et al.
Published: (2024)
MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning
by: Li, Xiaoyuan, et al.
Published: (2025)