| Main Authors: | Du, Tianxiang, He, Hulingxiao, Peng, Yuxin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.23980 |
Similar Items
AesFormer: Transform Everyday Photos into Beautiful Memories
by: Du, Tianxiang, et al.
Published: (2026)
Taxonomy-Aware Representation Alignment for Hierarchical Visual Recognition with Large Multimodal Models
by: He, Hulingxiao, et al.
Published: (2026)
CountMamba: Exploring Multi-directional Selective State-Space Models for Plant Counting
by: He, Hulingxiao, et al.
Published: (2024)
Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning
by: He, Hulingxiao, et al.
Published: (2026)
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models
by: He, Hulingxiao, et al.
Published: (2025)
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception
by: Huang, Yipo, et al.
Published: (2024)
AesCrop: Aesthetic-driven Cropping Guided by Composition
by: Wong, Yen-Hong, et al.
Published: (2025)
VideoAesBench: Benchmarking the Video Aesthetics Perception Capabilities of Large Multimodal Models
by: Li, Yunhao, et al.
Published: (2026)
ProCrop: Learning Aesthetic Image Cropping from Professional Compositions
by: Zhang, Ke, et al.
Published: (2025)
The Photographer Eye: Teaching Multimodal Large Language Models to Understand Image Aesthetics like Photographers
by: Qi, Daiqing, et al.
Published: (2025)
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model
by: Li, Mingxing, et al.
Published: (2025)
VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model
by: Yang, Jinze, et al.
Published: (2024)
EAGLE: Expert-Augmented Attention Guidance for Tuning-Free Industrial Anomaly Detection in Multimodal Large Language Models
by: Peng, Xiaomeng, et al.
Published: (2026)
Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: A Comprehensive Evaluation
by: Yu, Hong-Tao, et al.
Published: (2025)
OCC-MLLM:Empowering Multimodal Large Language Model For the Understanding of Occluded Objects
by: Qiu, Wenmo, et al.
Published: (2024)
A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
by: Liu, Xiang, et al.
Published: (2025)
Diffusion-based Aesthetic QR Code Generation via Scanning-Robust Perceptual Guidance
by: Liao, Jia-Wei, et al.
Published: (2024)
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
by: Fu, Chaoyou, et al.
Published: (2023)
From Perception to Reasoning: Deep Thinking Empowers Multimodal Large Language Models
by: Zhu, Wenxin, et al.
Published: (2025)
Empowering Segmentation Ability to Multi-modal Large Language Models
by: Yang, Yuqi, et al.
Published: (2024)
Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
by: Liu, Yuansen, et al.
Published: (2025)
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
by: Zhu, Muzhi, et al.
Published: (2025)
Can GPT tell us why these images are synthesized? Empowering Multimodal Large Language Models for Forensics
by: He, Yiran, et al.
Published: (2025)
Image Aesthetic Reasoning via HCM-GRPO: Empowering Compact Model for Superior Performance
by: Hu, Zhiyuan, et al.
Published: (2025)
Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models
by: Peng, Tianfan, et al.
Published: (2025)
Empowering Large Language Models with 3D Situation Awareness
by: Yuan, Zhihao, et al.
Published: (2025)
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
by: Li, Jiansheng, et al.
Published: (2025)
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models
by: Ren, Xiyu, et al.
Published: (2026)
ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models
by: De Min, Thomas, et al.
Published: (2026)
PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models
by: Xu, Zhaopan, et al.
Published: (2025)
Diffusion-based Facial Aesthetics Enhancement with 3D Structure Guidance
by: Li, Lisha, et al.
Published: (2025)
SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models
by: Shi, Yichen, et al.
Published: (2024)
VP-Bench: A Comprehensive Benchmark for Visual Prompting in Multimodal Large Language Models
by: Xu, Mingjie, et al.
Published: (2025)
Compose Your Aesthetics: Empowering Text-to-Image Models with the Principles of Art
by: Jin, Zhe, et al.
Published: (2025)
PromptLNet: Region-Adaptive Aesthetic Enhancement via Prompt Guidance in Low-Light Enhancement Net
by: Yin, Jun, et al.
Published: (2025)
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
by: Ye, Qinghao, et al.
Published: (2023)
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
by: Luo, Gen, et al.
Published: (2024)
TiFRe: Text-guided Video Frame Reduction for Efficient Video Multi-modal Large Language Models
by: Zheng, Xiangtian, et al.
Published: (2026)
CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models
by: Luo, Fuwen, et al.
Published: (2024)
Dynamic Resolution Guidance for Facial Expression Recognition
by: Wang, Songpan, et al.
Published: (2024)