| Main Authors: | Li, Chunyi, Li, Xiaozhe, Zhang, Zicheng, Tian, Yuan, Jia, Ziheng, Liu, Xiaohong, Min, Xiongkuo, Wang, Jia, Duan, Haodong, Chen, Kai, Zhai, Guangtao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.10079 |
Similar Items
Redundancy Principles for MLLMs Benchmarks
by: Zhang, Zicheng, et al.
Published: (2025)
Image Quality Assessment: From Human to Machine Preference
by: Li, Chunyi, et al.
Published: (2025)
PuzzleBench: A Fully Dynamic Evaluation Framework for Large Multimodal Models on Puzzle Solving
by: Zhang, Zeyu, et al.
Published: (2025)
GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs
by: Zhu, Xiaorong, et al.
Published: (2025)
MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis
by: Zhou, Yingjie, et al.
Published: (2024)
Affordance Benchmark for MLLMs
by: Wang, Junying, et al.
Published: (2025)
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs
by: Zhang, Zicheng, et al.
Published: (2024)
HazeCLIP: Towards Language Guided Real-World Image Dehazing
by: Wang, Ruiyi, et al.
Published: (2024)
VQA$^2$: Visual Question Answering for Video Quality Assessment
by: Jia, Ziheng, et al.
Published: (2024)
Improve MLLM Benchmark Efficiency through Interview
by: Wen, Farong, et al.
Published: (2025)
Towards Explainable Partial-AIGC Image Quality Assessment
by: Qian, Jiaying, et al.
Published: (2025)
GeoR-Bench: Evaluating Geoscience Visual Reasoning
by: Zheng, Yushuo, et al.
Published: (2026)
Multi-Dimensional Quality Assessment for Text-to-3D Assets: Dataset and Model
by: Fu, Kang, et al.
Published: (2025)
A Multi-To-One Interview Paradigm for Efficient MLLM Evaluation
by: Shen, Ye, et al.
Published: (2025)
SafetyFlow: An Agent-Flow System for Automated LLM Safety Benchmarking
by: Zhu, Xiangyang, et al.
Published: (2025)
IllusionBench+: A Large-scale and Comprehensive Benchmark for Visual Illusion Understanding in Vision-Language Models
by: Zhang, Yiming, et al.
Published: (2025)
Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment
by: Kou, Tengchuan, et al.
Published: (2024)
LM Fight Arena: Benchmarking Large Multimodal Models via Game Competition
by: Zheng, Yushuo, et al.
Published: (2025)
Q-Bench-Portrait: Benchmarking Multimodal Large Language Models on Portrait Image Quality Perception
by: Wu, Sijing, et al.
Published: (2026)
VideoAesBench: Benchmarking the Video Aesthetics Perception Capabilities of Large Multimodal Models
by: Li, Yunhao, et al.
Published: (2026)
TIT-Score: Evaluating Long-Prompt Based Text-to-Image Alignment via Text-to-Image-to-Text Consistency
by: Wang, Juntong, et al.
Published: (2025)
Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition
by: Zheng, Yushuo, et al.
Published: (2026)
Automated Safety Benchmarking: A Multi-agent Pipeline for LVLMs
by: Zhu, Xiangyang, et al.
Published: (2026)
GeoX-Bench: Benchmarking Cross-View Geo-Localization and Pose Estimation Capabilities of Large Multimodal Models
by: Zheng, Yushuo, et al.
Published: (2025)
R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions?
by: Li, Chunyi, et al.
Published: (2024)
DyCoRM: Dynamic Criterion-Aware Reward Modeling for Text-to-Image Generation
by: Qian, Jiaying, et al.
Published: (2026)
3DGCQA: A Quality Assessment Database for 3D AI-Generated Contents
by: Zhou, Yingjie, et al.
Published: (2024)
LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM
by: Zhang, Zicheng, et al.
Published: (2024)
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
by: Zhang, Zicheng, et al.
Published: (2024)
Using GUI Agent for Electronic Design Automation
by: Li, Chunyi, et al.
Published: (2025)
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation
by: Wang, Jiarui, et al.
Published: (2025)
Q-Refine: A Perceptual Quality Refiner for AI-Generated Image
by: Li, Chunyi, et al.
Published: (2024)
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model
by: Zhang, Zhichao, et al.
Published: (2024)
NeedleBench: Evaluating LLM Retrieval and Reasoning Across Varying Information Densities
by: Li, Mo, et al.
Published: (2024)
Quality Assessment in the Era of Large Models: A Survey
by: Zhang, Zicheng, et al.
Published: (2024)
CMC-Bench: Towards a New Paradigm of Visual Signal Compression
by: Li, Chunyi, et al.
Published: (2024)
EEmo-Logic: A Unified Dataset and Multi-Stage Framework for Comprehensive Image-Evoked Emotion Assessment
by: Gao, Lancheng, et al.
Published: (2026)
Human-Centric Evaluation for Foundation Models
by: Guo, Yijin, et al.
Published: (2025)
AU-IQA: A Benchmark Dataset for Perceptual Quality Assessment of AI-Enhanced User-Generated Content
by: Wang, Shushi, et al.
Published: (2025)
Sycophancy under Pressure: Evaluating and Mitigating Sycophantic Bias via Adversarial Dialogues in Scientific QA
by: Zhang, Kaiwei, et al.
Published: (2025)