| Main Authors: | Li, Haokun, Zhang, Yazhou, Ding, Jizhi, Li, Qiuchi, Zhang, Peng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.11101 |
Similar Items
Visual Room 2.0: Seeing is Not Understanding for MLLMs
by: Li, Haokun, et al.
Published: (2025)
Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?
by: Yao, Ben, et al.
Published: (2024)
Beyond Single-Sentence Prompts: Upgrading Value Alignment Benchmarks with Dialogues and Stories
by: Zhang, Yazhou, et al.
Published: (2025)
DialogueLLM: Context and Emotion Knowledge-Tuned Large Language Models for Emotion Recognition in Conversations
by: Zhang, Yazhou, et al.
Published: (2023)
Large Language Models for Subjective Language Understanding: A Survey
by: Song, Changhao, et al.
Published: (2025)
NurValues: Real-World Nursing Values Evaluation for Large Language Models in Clinical Context
by: Yao, Ben, et al.
Published: (2025)
Pushing The Limit of LLM Capacity for Text Classification
by: Zhang, Yazhou, et al.
Published: (2024)
Robust Prompt Optimization for Large Language Models Against Distribution Shifts
by: Li, Moxin, et al.
Published: (2023)
SarcasmBench: Towards Evaluating Large Language Models on Sarcasm Understanding
by: Zhang, Yazhou, et al.
Published: (2024)
Joint Extraction and Classification of Danish Competences for Job Matching
by: Li, Qiuchi, et al.
Published: (2024)
Large Emotional World Model
by: Song, Changhao, et al.
Published: (2025)
TextReasoningBench: Does Reasoning Really Improve Text Classification in Large Language Models?
by: Guo, Xinyu, et al.
Published: (2026)
Large Language Models are Learnable Planners for Long-Term Recommendation
by: Shi, Wentao, et al.
Published: (2024)
Towards the Law of Capacity Gap in Distilling Language Models
by: Zhang, Chen, et al.
Published: (2023)
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
by: Cai, Shihao, et al.
Published: (2024)
Emotion-o1: Adaptive Long Reasoning for Emotion Understanding in LLMs
by: Song, Changhao, et al.
Published: (2025)
Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge
by: Sui, Yi, et al.
Published: (2025)
Beyond the Leaderboard: Understanding Performance Disparities in Large Language Models via Model Diffing
by: Boughorbel, Sabri, et al.
Published: (2025)
Prospect Personalized Recommendation on Large Language Model-based Agent Platform
by: Zhang, Jizhi, et al.
Published: (2024)
Memorization $\neq$ Understanding: Do Large Language Models Have the Ability of Scenario Cognition?
by: Ma, Boxiang, et al.
Published: (2025)
Disparities In Negation Understanding Across Languages In Vision-Language Models
by: Moraitaki, Charikleia, et al.
Published: (2026)
WundtGPT: Shaping Large Language Models To Be An Empathetic, Proactive Psychologist
by: Ren, Chenyu, et al.
Published: (2024)
"I See What You Did There": Can Large Vision-Language Models Understand Multimodal Puns?
by: Xu, Naen, et al.
Published: (2026)
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
by: Zhang, Peiyi, et al.
Published: (2024)
Chain of Stance: Stance Detection with Large Language Models
by: Ma, Junxia, et al.
Published: (2024)
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
by: Liang, Yijun, et al.
Published: (2025)
Commander-GPT: Fully Unleashing the Sarcasm Detection Capability of Multi-Modal Large Language Models
by: Zhang, Yazhou, et al.
Published: (2025)
Quantifying Language Disparities in Multilingual Large Language Models
by: Hu, Songbo, et al.
Published: (2025)
A Survey on Large Language Model Benchmarks
by: Ni, Shiwen, et al.
Published: (2025)
GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding
by: Zhang, Ziyin, et al.
Published: (2024)
Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models
by: Chen, Junjie, et al.
Published: (2025)
TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models
by: Cao, Jiahuan, et al.
Published: (2024)
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
by: Xue, Haochen, et al.
Published: (2025)
Cognition-of-Thought Elicits Social-Aligned Reasoning in Large Language Models
by: Zhang, Xuanming, et al.
Published: (2025)
When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models
by: Li, Yinghui, et al.
Published: (2024)
C$^{3}$Bench: A Comprehensive Classical Chinese Understanding Benchmark for Large Language Models
by: Cao, Jiahuan, et al.
Published: (2024)
CFBenchmark: Chinese Financial Assistant Benchmark for Large Language Model
by: Lei, Yang, et al.
Published: (2023)
MulCogBench: A Multi-modal Cognitive Benchmark Dataset for Evaluating Chinese and English Computational Language Models
by: Zhang, Yunhao, et al.
Published: (2024)
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset
by: Zhu, Jie, et al.
Published: (2024)
MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models
by: Ding, Peng, et al.
Published: (2024)