| Main Authors: | Li, Yuan, Nishida, Shin'ya |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.02441 |
Similar Items
Guiding Perception-Reasoning Closer to Human in Blind Image Quality Assessment
by: Li, Yuan, et al.
Published: (2025)
Building Reasonable Inference for Vision-Language Models in Blind Image Quality Assessment
by: Li, Yuan, et al.
Published: (2025)
Explanation Bottleneck Models
by: Yamaguchi, Shin'ya, et al.
Published: (2024)
Rationale-Enhanced Decoding for Multi-modal Chain-of-Thought
by: Yamaguchi, Shin'ya, et al.
Published: (2025)
Investigate the Low-level Visual Perception in Vision-Language based Image Quality Assessment
by: Li, Yuan, et al.
Published: (2025)
Zero-shot Concept Bottleneck Models
by: Yamaguchi, Shin'ya, et al.
Published: (2025)
Machine Learning Modeling for Multi-order Human Visual Motion Processing
by: Sun, Zitang, et al.
Published: (2025)
Learning Robust Convolutional Neural Networks with Relevant Feature Focusing via Explanations
by: Adachi, Kazuki, et al.
Published: (2022)
HAPI: A Model for Learning Robot Facial Expressions from Human Preferences
by: Yang, Dongsheng, et al.
Published: (2025)
Parallel In-context Learning for Large Vision Language Models
by: Yamaguchi, Shin'ya, et al.
Published: (2026)
Adaptive Random Feature Regularization on Fine-tuning Deep Neural Networks
by: Yamaguchi, Shin'ya, et al.
Published: (2024)
DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild
by: Fu, Honghao, et al.
Published: (2024)
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
by: Yamaguchi, Shin'ya, et al.
Published: (2025)
Viewport-Unaware Blind Omnidirectional Image Quality Assessment: A Unified and Generalized Approach
by: Yan, Jiebin, et al.
Published: (2026)
Life-IQA: Boosting Blind Image Quality Assessment through GCN-enhanced Layer Interaction and MoE-based Feature Decoupling
by: Tang, Long, et al.
Published: (2025)
Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment
by: Fu, Jun, et al.
Published: (2024)
START: Spatial and Textual Learning for Chart Understanding
by: Liu, Zhuoming, et al.
Published: (2025)
SA-IQA: Redefining Image Quality Assessment for Spatial Aesthetics with Multi-Dimensional Rewards
by: Gao, Yuan, et al.
Published: (2025)
Difference Vector Equalization for Robust Fine-tuning of Vision-Language Models
by: Suzuki, Satoshi, et al.
Published: (2025)
TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment
by: Yuan, Jiquan, et al.
Published: (2024)
CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation
by: Tong, Chengzhuo, et al.
Published: (2026)
DEFNet: Multitasks-based Deep Evidential Fusion Network for Blind Image Quality Assessment
by: Lou, Yiwei, et al.
Published: (2025)
AI-generated Image Quality Assessment in Visual Communication
by: Tian, Yu, et al.
Published: (2024)
MultiModal Fine-tuning with Synthetic Captions
by: Enomoto, Shohei, et al.
Published: (2026)
Leveraging Textual Compositional Reasoning for Robust Change Captioning
by: Park, Kyu Ri, et al.
Published: (2025)
Enhancing Spatial Reasoning through Visual and Textual Thinking
by: Liang, Xun, et al.
Published: (2025)
Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks
by: Li, Chenjun
Published: (2026)
Responses Fall Short of Understanding: Revealing the Gap between Internal Representations and Responses in Visual Document Understanding
by: Kawasaki, Haruka, et al.
Published: (2026)
Uniformity First: Uniformity-aware Test-time Adaptation of Vision-language Models against Image Corruption
by: Adachi, Kazuki, et al.
Published: (2025)
AskChart: Universal Chart Understanding through Textual Enhancement
by: Yang, Xudong, et al.
Published: (2024)
Q-Ponder: A Unified Training Pipeline for Reasoning-based Visual Quality Assessment
by: Cai, Zhuoxuan, et al.
Published: (2025)
Eye Sclera for Fair Face Image Quality Assessment
by: Kabbani, Wassim, et al.
Published: (2025)
LMM-IQA: Image Quality Assessment for Low-Dose CT Imaging
by: Celik, Kagan, et al.
Published: (2025)
Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework
by: Li, Xinyue, et al.
Published: (2026)
IQAGPT: Image Quality Assessment with Vision-language and ChatGPT Models
by: Chen, Zhihao, et al.
Published: (2023)
CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning
by: Wu, Hang, et al.
Published: (2026)
PTTA: A Pure Text-to-Animation Framework for High-Quality Creation
by: Chen, Ruiqi, et al.
Published: (2025)
NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment
by: Han, Shuhao, et al.
Published: (2025)
Exploring the AI Obedience: Why is Generating a Pure Color Image Harder than CyberPunk?
by: Li, Hongyu, et al.
Published: (2026)
Revisiting Visual Understanding in Multimodal Reasoning through a Lens of Image Perturbation
by: Li, Yuting, et al.
Published: (2025)