| Main Authors: | Tian, Baoliang, Si, Yuxuan, Wang, Jilong, Li, Lingyao, Bao, Zhongyuan, Zhou, Zineng, Wang, Tao, Li, Sixu, Xu, Ziyao, Wang, Mingze, Zhang, Zhouzhuo, Wang, Zhihao, Yun, Yike, Tian, Ke, Yang, Ning, Qiu, Minghui |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.21717 |
Similar Items
SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs
by: Wang, Siting, et al.
Published: (2025)
CrossCheck: Input Validation for WAN Control Systems
by: Krentsel, Alexander, et al.
Published: (2026)
LinearARD: Linear-Memory Attention Distillation for RoPE Restoration
by: Yang, Ning, et al.
Published: (2026)
AI-generated Image Quality Assessment in Visual Communication
by: Tian, Yu, et al.
Published: (2024)
Diagnosing and Repairing Citation Failures in Generative Engine Optimization
by: Tian, Zhihua, et al.
Published: (2026)
Resolving Knowledge Conflicts in Large Language Models
by: Wang, Yike, et al.
Published: (2023)
CiteCheck: Towards Accurate Citation Faithfulness Detection
by: Xu, Ziyao, et al.
Published: (2025)
Towards Fast, Memory-based and Data-Efficient Vision-Language Policy
by: Li, Haoxuan, et al.
Published: (2025)
Res-Bench: Benchmarking the Robustness of Multimodal Large Language Models to Dynamic Resolution Input
by: Li, Chenxu, et al.
Published: (2025)
WhenLoss: Diagnosing Write and Retrieval Bottlenecks in Long-Context Memory Systems
by: Yu, Jiangnan, et al.
Published: (2026)
Improving the generalization of gait recognition with limited datasets
by: Zhou, Qian, et al.
Published: (2025)
InFi-Check: Interpretable and Fine-Grained Fact-Checking of LLMs
by: Bai, Yuzhuo, et al.
Published: (2026)
MTMD: Multi-Scale Temporal Memory Learning and Efficient Debiasing Framework for Stock Trend Forecasting
by: Wang, Mingjie, et al.
Published: (2022)
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
by: Wang, Shengkang, et al.
Published: (2024)
Hallucinations are inevitable but can be made statistically negligible
by: Suzuki, Atsushi, et al.
Published: (2025)
Continuous Perception Matters: Diagnosing Temporal Integration Failures in Multimodal Models
by: Wang, Zeyu, et al.
Published: (2024)
DeepResearch Bench II: Diagnosing Deep Research Agents via Rubrics from Expert Report
by: Li, Ruizhe, et al.
Published: (2026)
DO-Bench: An Attributable Benchmark for Diagnosing Object Hallucination in Vision-Language Models
by: Wang, JiYang, et al.
Published: (2026)
FetchBot: Learning Generalizable Object Fetching in Cluttered Scenes via Zero-Shot Sim2Real
by: Liu, Weiheng, et al.
Published: (2025)
Investigating the (De)Composition Capabilities of Large Language Models in Natural-to-Formal Language Conversion
by: Xu, Ziyao, et al.
Published: (2025)
SPOR: A Comprehensive and Practical Evaluation Method for Compositional Generalization in Data-to-Text Generation
by: Xu, Ziyao, et al.
Published: (2024)
AgenticIQA: An Agentic Framework for Adaptive and Interpretable Image Quality Assessment
by: Zhu, Hanwei, et al.
Published: (2025)
AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs
by: Wei, Xuyang, et al.
Published: (2025)
M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes
by: Zhang, Zeyu, et al.
Published: (2024)
RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking
by: Yang, Shuo, et al.
Published: (2025)
BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning
by: Liu, Yuyang, et al.
Published: (2025)
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
by: Fang, Rongyao, et al.
Published: (2025)
Robust Misinformation Detection by Visiting Potential Commonsense Conflict
by: Wang, Bing, et al.
Published: (2025)
MolViBench: Evaluating LLMs on Molecular Vibe Coding
by: Li, Jiatong, et al.
Published: (2026)
Rehearsal: Simulating Conflict to Teach Conflict Resolution
by: Shaikh, Omar, et al.
Published: (2023)
A Sanity Check for Multi-In-Domain Face Forgery Detection in the Real World
by: Cheng, Jikang, et al.
Published: (2025)
Learning Actionable Manipulation Recovery via Counterfactual Failure Synthesis
by: Li, Dayou, et al.
Published: (2026)
Relational Mediators: LLM Chatbots as Boundary Objects in Psychotherapy
by: Quan, Jiatao, et al.
Published: (2025)
SPD-Faith Bench: Diagnosing and Improving Faithfulness in Chain-of-Thought for Multimodal Large Language Models
by: Lv, Weijiang, et al.
Published: (2026)
CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset
by: Wang, Zhiming, et al.
Published: (2024)
Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling
by: Li, Zhihao, et al.
Published: (2025)
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
by: Li, Jianling, et al.
Published: (2025)
ConflictBench: Evaluating Human-AI Conflict via Interactive and Visually Grounded Environments
by: Zhao, Weixiang, et al.
Published: (2026)
PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models
by: Wang, Yuwen, et al.
Published: (2026)
Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries
by: Zhang, Xing, et al.
Published: (2026)