:: Library Catalog

Bibliographic Details
Main Authors:	Ye, Mingrui; Zheng, Chanjin; Yu, Zengyi; Xiang, Chenyu; Zhao, Zhixue; Yuan, Zheng; Yannakoudakis, Helen
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.12503

Similar Items

ArtMentor: AI-Assisted Evaluation of Artworks to Explore Multimodal Large Language Models Capabilities
by: Zheng, Chanjin, et al.
Published: (2025)

Children's Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs
by: Ye, Hengwei, et al.
Published: (2026)

Incorporating Attribution Importance for Improving Faithfulness Metrics
by: Zhao, Zhixue, et al.
Published: (2023)

ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models
by: Zhao, Zhixue, et al.
Published: (2024)

Agentic Problem Frames: A Systematic Approach to Engineering Reliable Domain Agents
by: Park, Chanjin
Published: (2026)

ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
by: Zhu, William Yicheng, et al.
Published: (2024)

AttributionBench: How Hard is Automatic Attribution Evaluation?
by: Li, Yifei, et al.
Published: (2024)

ArtBrain: An Explainable end-to-end Toolkit for Classification and Attribution of AI-Generated Art and Style
by: Silva, Ravidu Suien Rammuni, et al.
Published: (2024)

CALM: A Causal Analysis Language Model for Tabular Data in Complex Systems with Local Scores, Conditional Independence Tests, and Relation Attributes
by: Fan, Zhenjiang, et al.
Published: (2025)

TeamLLM: A Human-Like Team-Oriented Collaboration Framework for Multi-Step Contextualized Tasks
by: Wang, Xiangyu, et al.
Published: (2026)

Strat-LLM: Stratified Strategy Alignment for LLM-based Stock Trading with Real-time Multi-Source Signals
by: Huang, Wenliang, et al.
Published: (2026)

A Functional Perspective on Knowledge Distillation in Neural Networks
by: Mason-Williams, Israel, et al.
Published: (2025)

MirrorBench: Evaluating Self-centric Intelligence in MLLMs by Introducing a Mirror
by: Guo, Shengyu, et al.
Published: (2026)

MileBench: Benchmarking MLLMs in Long Context
by: Song, Dingjie, et al.
Published: (2024)

A Function-Centric Perspective on Flat and Sharp Minima
by: Mason-Williams, Israel, et al.
Published: (2025)

PeopleSearchBench: A Multi-Dimensional Benchmark for Evaluating AI-Powered People Search Platforms
by: Wang, Wei, et al.
Published: (2026)

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
by: Jiang, Fengqing, et al.
Published: (2024)

CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension
by: Zhang, Zhi, et al.
Published: (2023)

The Art of Tool Interface Design
by: Wu, Yunnan, et al.
Published: (2025)

A Forced-Choice Neural Cognitive Diagnostic Model of Personality Testing
by: Li, Xiaoyu, et al.
Published: (2025)

The Pleasure Principle: Where is it in Kids' Art Books
by: Wilton, Shirley M.
Published: (1977)

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
by: Lin, Junming, et al.
Published: (2024)

The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason
by: Liang, Shanchao, et al.
Published: (2025)

What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks
by: Kirch, Nathalie, et al.
Published: (2024)

Differentiating Student Feedbacks for Knowledge Tracing
by: Cui, Jiajun, et al.
Published: (2022)

OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
by: Li, Caorui, et al.
Published: (2025)

Wired Perspectives: Multi-View Wire Art Embraces Generative AI
by: Qu, Zhiyu, et al.
Published: (2023)

Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans
by: Qiu, Yansheng, et al.
Published: (2025)

Position: State-of-the-Art Claims Require State-of-the-Art Evidence
by: Oh, YongKyung
Published: (2026)

CSR-Bench: A Benchmark for Evaluating the Cross-modal Safety and Reliability of MLLMs
by: Liu, Yuxuan, et al.
Published: (2026)

TIDE-Bench: Task-Aware and Diagnostic Evaluation of Tool-Integrated Reasoning
by: Li, Yize, et al.
Published: (2026)

Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment
by: Hong, Jiaying, et al.
Published: (2025)

TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs
by: Xu, Pengju, et al.
Published: (2025)

A Scoping Review of Energy-Efficient Driving Behaviors and Applied State-of-the-Art AI Methods
by: Ma, Zhipeng, et al.
Published: (2024)

MetaCD: A Meta Learning Framework for Cognitive Diagnosis based on Continual Learning
by: Wu, Jin, et al.
Published: (2025)

VQArt-Bench: A semantically rich VQA Benchmark for Art and Cultural Heritage
by: Alfarano, A., et al.
Published: (2025)

Holistic Evaluation of State-of-the-Art LLMs for Code Generation
by: Zhang, Le, et al.
Published: (2025)

Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models
by: Zhao, Zhixue, et al.
Published: (2024)

Donors and Recipients: On Asymmetric Transfer Across Tasks and Languages with Parameter-Efficient Fine-Tuning
by: Dymkiewicz, Kajetan, et al.
Published: (2025)

Quantifying Compositionality of Classic and State-of-the-Art Embeddings
by: Guo, Zhijin, et al.
Published: (2025)