| Main Authors: | Cai, Zhaolin, Li, Fan, Duan, Huiyu, He, Lijun, Zhai, Guangtao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.24021 |
Similar Items
Generative Human-Object Interaction Detection via Differentiable Cognitive Steering of Multi-modal LLMs
by: Cai, Zhaolin, et al.
Published: (2025)
HeadHunt-VAD: Hunting Robust Anomaly-Sensitive Heads in MLLM for Tuning-Free Video Anomaly Detection
by: Cai, Zhaolin, et al.
Published: (2025)
MMHead: Towards Fine-grained Multi-modal 3D Facial Animation
by: Wu, Sijing, et al.
Published: (2024)
ManipShield: A Unified Framework for Image Manipulation Detection, Localization and Explanation
by: Xu, Zitong, et al.
Published: (2025)
HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs
by: Cai, Zhaolin, et al.
Published: (2025)
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM
by: Wang, Jiarui, et al.
Published: (2024)
TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs
by: Wang, Juntong, et al.
Published: (2025)
MVBIND: Self-Supervised Music Recommendation For Videos Via Embedding Space Binding
by: Teng, Jiajie, et al.
Published: (2024)
How is Visual Attention Influenced by Text Guidance? Database and Model
by: Sun, Yinan, et al.
Published: (2024)
Quality Assessment for AI Generated Images with Instruction Tuning
by: Wang, Jiarui, et al.
Published: (2024)
Facial Attractiveness Prediction in Live Streaming: A New Benchmark and Multi-modal Method
by: Li, Hui, et al.
Published: (2025)
DPC-VQA: Decoupling Quality Perception and Residual Calibration for Video Quality Assessment
by: Li, Xinyue, et al.
Published: (2026)
Perceptual Video Quality Assessment: A Survey
by: Min, Xiongkuo, et al.
Published: (2024)
FVQ: A Large-Scale Dataset and an LMM-based Method for Face Video Quality Assessment
by: Wu, Sijing, et al.
Published: (2025)
VideoAesBench: Benchmarking the Video Aesthetics Perception Capabilities of Large Multimodal Models
by: Li, Yunhao, et al.
Published: (2026)
UniProcessor: A Text-induced Unified Low-level Image Processor
by: Duan, Huiyu, et al.
Published: (2024)
VTONQA: A Multi-Dimensional Quality Assessment Dataset for Virtual Try-on
by: Wei, Xinyi, et al.
Published: (2026)
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
by: Song, Xiufeng, et al.
Published: (2024)
DHQA-4D: Perceptual Quality Assessment of Dynamic 4D Digital Human
by: Li, Yunhao, et al.
Published: (2025)
Exploring Instruction Data Quality for Explainable Image Quality Assessment
by: Li, Yunhao, et al.
Published: (2025)
DynT2I-Eval: A Dynamic Evaluation Framework for Text-to-Image Models
by: Wang, Juntong, et al.
Published: (2026)
Multi-Dimensional Quality Assessment for Text-to-3D Assets: Dataset and Model
by: Fu, Kang, et al.
Published: (2025)
ESVQA: Perceptual Quality Assessment of Egocentric Spatial Videos
by: Zhu, Xilei, et al.
Published: (2024)
EEmo-Logic: A Unified Dataset and Multi-Stage Framework for Comprehensive Image-Evoked Emotion Assessment
by: Gao, Lancheng, et al.
Published: (2026)
How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model
by: Zhu, Yuxin, et al.
Published: (2024)
ReasonEdit: Towards Interpretable Image Editing Evaluation via Reinforcement Learning
by: Chen, Honghua, et al.
Published: (2026)
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs
by: Wang, Jiarui, et al.
Published: (2025)
TIT-Score: Evaluating Long-Prompt Based Text-to-Image Alignment via Text-to-Image-to-Text Consistency
by: Wang, Juntong, et al.
Published: (2025)
I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing Models
by: Wang, Juntong, et al.
Published: (2025)
RGC-VQA: An Exploration Database for Robotic-Generated Video Quality Assessment
by: Jin, Jianing, et al.
Published: (2025)
Robust Mesh Saliency Ground Truth Acquisition in VR via View Cone Sampling and Manifold Diffusion
by: Zheng, Guoquan, et al.
Published: (2026)
ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images
by: Zhu, Xilei, et al.
Published: (2024)
Preference-Guided Debiasing for No-Reference Enhancement Image Quality Assessment
by: Gao, Shiqi, et al.
Published: (2026)
Q-Bench-Portrait: Benchmarking Multimodal Large Language Models on Portrait Image Quality Perception
by: Wu, Sijing, et al.
Published: (2026)
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation
by: Wang, Jiarui, et al.
Published: (2025)
ResAD++: Towards Class Agnostic Anomaly Detection via Residual Feature Learning
by: Yao, Xincheng, et al.
Published: (2025)
F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration
by: Liu, Lu, et al.
Published: (2024)
Agentic Retoucher for Text-To-Image Generation
by: Shen, Shaocheng, et al.
Published: (2026)
Embodied Image Quality Assessment for Robotic Intelligence
by: Zhang, Jianbo, et al.
Published: (2024)
MoA-VR: A Mixture-of-Agents System Towards All-in-One Video Restoration
by: Liu, Lu, et al.
Published: (2025)