| Main Authors: | Ye, Mingrui, Zheng, Chanjin, Yu, Zengyi, Xiang, Chenyu, Zhao, Zhixue, Yuan, Zheng, Yannakoudakis, Helen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.12503 |
Similar Items
ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities
by: Zheng, Chanjin, et al.
Published: (2025)
Children's Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs
by: Ye, Hengwei, et al.
Published: (2026)
Incorporating Attribution Importance for Improving Faithfulness Metrics
by: Zhao, Zhixue, et al.
Published: (2023)
ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models
by: Zhao, Zhixue, et al.
Published: (2024)
Agentic Problem Frames: A Systematic Approach to Engineering Reliable Domain Agents
by: Park, Chanjin
Published: (2026)
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
by: Zhu, William Yicheng, et al.
Published: (2024)
AttributionBench: How Hard is Automatic Attribution Evaluation?
by: Li, Yifei, et al.
Published: (2024)
ArtBrain: An Explainable end-to-end Toolkit for Classification and Attribution of AI-Generated Art and Style
by: Silva, Ravidu Suien Rammuni, et al.
Published: (2024)
CALM: A Causal Analysis Language Model for Tabular Data in Complex Systems with Local Scores, Conditional Independence Tests, and Relation Attributes
by: Fan, Zhenjiang, et al.
Published: (2025)
TeamLLM: A Human-Like Team-Oriented Collaboration Framework for Multi-Step Contextualized Tasks
by: Wang, Xiangyu, et al.
Published: (2026)
Strat-LLM: Stratified Strategy Alignment for LLM-based Stock Trading with Real-time Multi-Source Signals
by: Huang, Wenliang, et al.
Published: (2026)
A Functional Perspective on Knowledge Distillation in Neural Networks
by: Mason-Williams, Israel, et al.
Published: (2025)
MirrorBench: Evaluating Self-centric Intelligence in MLLMs by Introducing a Mirror
by: Guo, Shengyu, et al.
Published: (2026)
MileBench: Benchmarking MLLMs in Long Context
by: Song, Dingjie, et al.
Published: (2024)
A Function-Centric Perspective on Flat and Sharp Minima
by: Mason-Williams, Israel, et al.
Published: (2025)
PeopleSearchBench: A Multi-Dimensional Benchmark for Evaluating AI-Powered People Search Platforms
by: Wang, Wei, et al.
Published: (2026)
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
by: Jiang, Fengqing, et al.
Published: (2024)
CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension
by: Zhang, Zhi, et al.
Published: (2023)
The Art of Tool Interface Design
by: Wu, Yunnan, et al.
Published: (2025)
A Forced-Choice Neural Cognitive Diagnostic Model of Personality Testing
by: Li, Xiaoyu, et al.
Published: (2025)
The Pleasure Principle: Where is it in Kids' Art Books
by: Wilton, Shirley M.
Published: (1977)
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
by: Lin, Junming, et al.
Published: (2024)
The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason
by: Liang, Shanchao, et al.
Published: (2025)
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks
by: Kirch, Nathalie, et al.
Published: (2024)
Differentiating Student Feedbacks for Knowledge Tracing
by: Cui, Jiajun, et al.
Published: (2022)
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
by: Li, Caorui, et al.
Published: (2025)
Wired Perspectives: Multi-View Wire Art Embraces Generative AI
by: Qu, Zhiyu, et al.
Published: (2023)
Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans
by: Qiu, Yansheng, et al.
Published: (2025)
Position: State-of-the-Art Claims Require State-of-the-Art Evidence
by: Oh, YongKyung
Published: (2026)
CSR-Bench: A Benchmark for Evaluating the Cross-modal Safety and Reliability of MLLMs
by: Liu, Yuxuan, et al.
Published: (2026)
TIDE-Bench: Task-Aware and Diagnostic Evaluation of Tool-Integrated Reasoning
by: Li, Yize, et al.
Published: (2026)
Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment
by: Hong, Jiaying, et al.
Published: (2025)
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs
by: Xu, Pengju, et al.
Published: (2025)
A Scoping Review of Energy-Efficient Driving Behaviors and Applied State-of-the-Art AI Methods
by: Ma, Zhipeng, et al.
Published: (2024)
MetaCD: A Meta Learning Framework for Cognitive Diagnosis based on Continual Learning
by: Wu, Jin, et al.
Published: (2025)
VQArt-Bench: A semantically rich VQA Benchmark for Art and Cultural Heritage
by: Alfarano, A., et al.
Published: (2025)
Holistic Evaluation of State-of-the-Art LLMs for Code Generation
by: Zhang, Le, et al.
Published: (2025)
Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models
by: Zhao, Zhixue, et al.
Published: (2024)
Donors and Recipients: On Asymmetric Transfer Across Tasks and Languages with Parameter-Efficient Fine-Tuning
by: Dymkiewicz, Kajetan, et al.
Published: (2025)
Quantifying Compositionality of Classic and State-of-the-Art Embeddings
by: Guo, Zhijin, et al.
Published: (2025)