| Main Authors: | Li, Haokun, Zhang, Yazhou, Ding, Jizhi, Li, Qiuchi, Zhang, Peng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.12928 |
Similar Items
Seeing is Not Understanding: A Benchmark on Perception-Cognition Disparities in Large Language Models
by: Li, Haokun, et al.
Published: (2025)
Beyond Single-Sentence Prompts: Upgrading Value Alignment Benchmarks with Dialogues and Stories
by: Zhang, Yazhou, et al.
Published: (2025)
Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?
by: Yao, Ben, et al.
Published: (2024)
Are MLMs Trapped in the Visual Room?
by: Zhang, Yazhou, et al.
Published: (2025)
Pushing The Limit of LLM Capacity for Text Classification
by: Zhang, Yazhou, et al.
Published: (2024)
DialogueLLM: Context and Emotion Knowledge-Tuned Large Language Models for Emotion Recognition in Conversations
by: Zhang, Yazhou, et al.
Published: (2023)
Seeing is Believing: Rich-Context Hallucination Detection for MLLMs via Backward Visual Grounding
by: Guo, Pinxue, et al.
Published: (2025)
NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context
by: Yao, Ben, et al.
Published: (2025)
Do MLLMs Really Understand the Charts?
by: Zhang, Xiao, et al.
Published: (2025)
Joint Extraction and Classification of Danish Competences for Job Matching
by: Li, Qiuchi, et al.
Published: (2024)
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs
by: Yeh, Chun-Hsiao, et al.
Published: (2025)
Large Language Models for Subjective Language Understanding: A Survey
by: Song, Changhao, et al.
Published: (2025)
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
by: Zhang, Renrui, et al.
Published: (2024)
Emotion-o1: Adaptive Long Reasoning for Emotion Understanding in LLMs
by: Song, Changhao, et al.
Published: (2025)
Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge
by: Sui, Yi, et al.
Published: (2025)
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs
by: Zhang, Huanyu, et al.
Published: (2025)
Distill Visual Chart Reasoning Ability from LLMs to MLLMs
by: He, Wei, et al.
Published: (2024)
Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
by: Miao, Ziqi, et al.
Published: (2025)
When Seeing Is not Enough: Revealing the Limits of Active Reasoning in MLLMs
by: Liu, Hongcheng, et al.
Published: (2025)
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs
by: Ding, Xuanwen, et al.
Published: (2025)
MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning
by: Li, Xiaoyuan, et al.
Published: (2025)
T5Gemma 2: Seeing, Reading, and Understanding Longer
by: Zhang, Biao, et al.
Published: (2025)
Roles of MLLMs in Visually Rich Document Retrieval for RAG: A Survey
by: Zhang, Xiantao
Published: (2025)
Can MLLMs Understand the Deep Implication Behind Chinese Images?
by: Zhang, Chenhao, et al.
Published: (2024)
Can MLLMs Read the Room? A Multimodal Benchmark for Assessing Deception in Multi-Party Social Interactions
by: Kang, Caixin, et al.
Published: (2025)
Affordance Benchmark for MLLMs
by: Wang, Junying, et al.
Published: (2025)
From Text to Pixel: Advancing Long-Context Understanding in MLLMs
by: Lu, Yujie, et al.
Published: (2024)
GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning
by: Chen, Guizhen, et al.
Published: (2025)
Large Emotional World Model
by: Song, Changhao, et al.
Published: (2025)
Turning Internal Gap into Self-Improvement: Promoting the Generation-Understanding Unification in MLLMs
by: Han, Yujin, et al.
Published: (2025)
Towards the Law of Capacity Gap in Distilling Language Models
by: Zhang, Chen, et al.
Published: (2023)
Can MLLMs Read the Room? A Multimodal Benchmark for Verifying Truthfulness in Multi-Party Social Interactions
by: Kang, Caixin, et al.
Published: (2025)
Seeing the Poem: Image-Semantic Detection of AI-Generated Modern Chinese Poetry with MLLMs
by: Wang, Shanshan, et al.
Published: (2026)
Robust Prompt Optimization for Large Language Models Against Distribution Shifts
by: Li, Moxin, et al.
Published: (2023)
SarcasmBench: Towards Evaluating Large Language Models on Sarcasm Understanding
by: Zhang, Yazhou, et al.
Published: (2024)
VideoScaffold: Elastic-Scale Visual Hierarchies for Streaming Video Understanding in MLLMs
by: Zheng, Naishan, et al.
Published: (2025)
MLLMs-Augmented Visual-Language Representation Learning
by: Liu, Yanqing, et al.
Published: (2023)
AdaCodec: A Predictive Visual Code for Video MLLMs
by: Hou, Haowen, et al.
Published: (2026)
SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs
by: Wang, Siting, et al.
Published: (2025)
See the Text: From Tokenization to Visual Reading
by: Xing, Ling, et al.
Published: (2025)