:: Library Catalog

Bibliographic Details
Main Authors:	Tian, Baoliang; Si, Yuxuan; Wang, Jilong; Li, Lingyao; Bao, Zhongyuan; Zhou, Zineng; Wang, Tao; Li, Sixu; Xu, Ziyao; Wang, Mingze; Zhang, Zhouzhuo; Wang, Zhihao; Yun, Yike; Tian, Ke; Yang, Ning; Qiu, Minghui
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.21717

Similar Items

SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs
by: Wang, Siting, et al.
Published: (2025)

CrossCheck: Input Validation for WAN Control Systems
by: Krentsel, Alexander, et al.
Published: (2026)

LinearARD: Linear-Memory Attention Distillation for RoPE Restoration
by: Yang, Ning, et al.
Published: (2026)

AI-generated Image Quality Assessment in Visual Communication
by: Tian, Yu, et al.
Published: (2024)

Diagnosing and Repairing Citation Failures in Generative Engine Optimization
by: Tian, Zhihua, et al.
Published: (2026)

Resolving Knowledge Conflicts in Large Language Models
by: Wang, Yike, et al.
Published: (2023)

CiteCheck: Towards Accurate Citation Faithfulness Detection
by: Xu, Ziyao, et al.
Published: (2025)

Towards Fast, Memory-based and Data-Efficient Vision-Language Policy
by: Li, Haoxuan, et al.
Published: (2025)

Res-Bench: Benchmarking the Robustness of Multimodal Large Language Models to Dynamic Resolution Input
by: Li, Chenxu, et al.
Published: (2025)

WhenLoss: Diagnosing Write and Retrieval Bottlenecks in Long-Context Memory Systems
by: Yu, Jiangnan, et al.
Published: (2026)

Improving the generalization of gait recognition with limited datasets
by: Zhou, Qian, et al.
Published: (2025)

InFi-Check: Interpretable and Fine-Grained Fact-Checking of LLMs
by: Bai, Yuzhuo, et al.
Published: (2026)

MTMD: Multi-Scale Temporal Memory Learning and Efficient Debiasing Framework for Stock Trend Forecasting
by: Wang, Mingjie, et al.
Published: (2022)

MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
by: Wang, Shengkang, et al.
Published: (2024)

Hallucinations are inevitable but can be made statistically negligible
by: Suzuki, Atsushi, et al.
Published: (2025)

Continuous Perception Matters: Diagnosing Temporal Integration Failures in Multimodal Models
by: Wang, Zeyu, et al.
Published: (2024)

DeepResearch Bench II: Diagnosing Deep Research Agents via Rubrics from Expert Report
by: Li, Ruizhe, et al.
Published: (2026)

DO-Bench: An Attributable Benchmark for Diagnosing Object Hallucination in Vision-Language Models
by: Wang, JiYang, et al.
Published: (2026)

FetchBot: Learning Generalizable Object Fetching in Cluttered Scenes via Zero-Shot Sim2Real
by: Liu, Weiheng, et al.
Published: (2025)

Investigating the (De)Composition Capabilities of Large Language Models in Natural-to-Formal Language Conversion
by: Xu, Ziyao, et al.
Published: (2025)

SPOR: A Comprehensive and Practical Evaluation Method for Compositional Generalization in Data-to-Text Generation
by: Xu, Ziyao, et al.
Published: (2024)

AgenticIQA: An Agentic Framework for Adaptive and Interpretable Image Quality Assessment
by: Zhu, Hanwei, et al.
Published: (2025)

AsymLoRA: Harmonizing Data Conflicts and Commonalities in MLLMs
by: Wei, Xuyang, et al.
Published: (2025)

M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes
by: Zhang, Zeyu, et al.
Published: (2024)

RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking
by: Yang, Shuo, et al.
Published: (2025)

BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning
by: Liu, Yuyang, et al.
Published: (2025)

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
by: Fang, Rongyao, et al.
Published: (2025)

Robust Misinformation Detection by Visiting Potential Commonsense Conflict
by: Wang, Bing, et al.
Published: (2025)

MolViBench: Evaluating LLMs on Molecular Vibe Coding
by: Li, Jiatong, et al.
Published: (2026)

Rehearsal: Simulating Conflict to Teach Conflict Resolution
by: Shaikh, Omar, et al.
Published: (2023)

A Sanity Check for Multi-In-Domain Face Forgery Detection in the Real World
by: Cheng, Jikang, et al.
Published: (2025)

Learning Actionable Manipulation Recovery via Counterfactual Failure Synthesis
by: Li, Dayou, et al.
Published: (2026)

Relational Mediators: LLM Chatbots as Boundary Objects in Psychotherapy
by: Quan, Jiatao, et al.
Published: (2025)

SPD-Faith Bench: Diagnosing and Improving Faithfulness in Chain-of-Thought for Multimodal Large Language Models
by: Lv, Weijiang, et al.
Published: (2026)

CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset
by: Wang, Zhiming, et al.
Published: (2024)

Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling
by: Li, Zhihao, et al.
Published: (2025)

TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
by: Li, Jianling, et al.
Published: (2025)

ConflictBench: Evaluating Human-AI Conflict via Interactive and Visually Grounded Environments
by: Zhao, Weixiang, et al.
Published: (2026)

PALM-Bench: A Comprehensive Benchmark for Personalized Audio-Language Models
by: Wang, Yuwen, et al.
Published: (2026)

Library Drift: Diagnosing and Fixing a Silent Failure Mode in Self-Evolving LLM Skill Libraries
by: Zhang, Xing, et al.
Published: (2026)