:: Library Catalog

Bibliographic Details
Main Authors:	Li, Haokun; Zhang, Yazhou; Ding, Jizhi; Li, Qiuchi; Zhang, Peng
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2511.12928

Similar Items

Seeing is Not Understanding: A Benchmark on Perception-Cognition Disparities in Large Language Models
by: Li, Haokun, et al.
Published: (2025)

Beyond Single-Sentence Prompts: Upgrading Value Alignment Benchmarks with Dialogues and Stories
by: Zhang, Yazhou, et al.
Published: (2025)

Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?
by: Yao, Ben, et al.
Published: (2024)

Are MLMs Trapped in the Visual Room?
by: Zhang, Yazhou, et al.
Published: (2025)

Pushing The Limit of LLM Capacity for Text Classification
by: Zhang, Yazhou, et al.
Published: (2024)

DialogueLLM: Context and Emotion Knowledge-Tuned Large Language Models for Emotion Recognition in Conversations
by: Zhang, Yazhou, et al.
Published: (2023)

Seeing is Believing: Rich-Context Hallucination Detection for MLLMs via Backward Visual Grounding
by: Guo, Pinxue, et al.
Published: (2025)

NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context
by: Yao, Ben, et al.
Published: (2025)

Do MLLMs Really Understand the Charts?
by: Zhang, Xiao, et al.
Published: (2025)

Joint Extraction and Classification of Danish Competences for Job Matching
by: Li, Qiuchi, et al.
Published: (2024)

Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs
by: Yeh, Chun-Hsiao, et al.
Published: (2025)

Large Language Models for Subjective Language Understanding: A Survey
by: Song, Changhao, et al.
Published: (2025)

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
by: Zhang, Renrui, et al.
Published: (2024)

Emotion-o1: Adaptive Long Reasoning for Emotion Understanding in LLMs
by: Song, Changhao, et al.
Published: (2025)

Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge
by: Sui, Yi, et al.
Published: (2025)

Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs
by: Zhang, Huanyu, et al.
Published: (2025)

Distill Visual Chart Reasoning Ability from LLMs to MLLMs
by: He, Wei, et al.
Published: (2024)

Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
by: Miao, Ziqi, et al.
Published: (2025)

When Seeing Is not Enough: Revealing the Limits of Active Reasoning in MLLMs
by: Liu, Hongcheng, et al.
Published: (2025)

AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs
by: Ding, Xuanwen, et al.
Published: (2025)

MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning
by: Li, Xiaoyuan, et al.
Published: (2025)

T5Gemma 2: Seeing, Reading, and Understanding Longer
by: Zhang, Biao, et al.
Published: (2025)

Roles of MLLMs in Visually Rich Document Retrieval for RAG: A Survey
by: Zhang, Xiantao
Published: (2025)

Can MLLMs Understand the Deep Implication Behind Chinese Images?
by: Zhang, Chenhao, et al.
Published: (2024)

Can MLLMs Read the Room? A Multimodal Benchmark for Assessing Deception in Multi-Party Social Interactions
by: Kang, Caixin, et al.
Published: (2025)

Affordance Benchmark for MLLMs
by: Wang, Junying, et al.
Published: (2025)

From Text to Pixel: Advancing Long-Context Understanding in MLLMs
by: Lu, Yujie, et al.
Published: (2024)

GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning
by: Chen, Guizhen, et al.
Published: (2025)

Large Emotional World Model
by: Song, Changhao, et al.
Published: (2025)

Turning Internal Gap into Self-Improvement: Promoting the Generation-Understanding Unification in MLLMs
by: Han, Yujin, et al.
Published: (2025)

Towards the Law of Capacity Gap in Distilling Language Models
by: Zhang, Chen, et al.
Published: (2023)

Can MLLMs Read the Room? A Multimodal Benchmark for Verifying Truthfulness in Multi-Party Social Interactions
by: Kang, Caixin, et al.
Published: (2025)

Seeing the Poem: Image-Semantic Detection of AI-Generated Modern Chinese Poetry with MLLMs
by: Wang, Shanshan, et al.
Published: (2026)

Robust Prompt Optimization for Large Language Models Against Distribution Shifts
by: Li, Moxin, et al.
Published: (2023)

SarcasmBench: Towards Evaluating Large Language Models on Sarcasm Understanding
by: Zhang, Yazhou, et al.
Published: (2024)

VideoScaffold: Elastic-Scale Visual Hierarchies for Streaming Video Understanding in MLLMs
by: Zheng, Naishan, et al.
Published: (2025)

MLLMs-Augmented Visual-Language Representation Learning
by: Liu, Yanqing, et al.
Published: (2023)

AdaCodec: A Predictive Visual Code for Video MLLMs
by: Hou, Haowen, et al.
Published: (2026)

SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs
by: Wang, Siting, et al.
Published: (2025)

See the Text: From Tokenization to Visual Reading
by: Xing, Ling, et al.
Published: (2025)