:: Library Catalog

Bibliographic Details
Main Authors:	Xu, Junjie; Wu, Xingjiao; Yao, Tanren; Zhang, Zihao; Bei, Jiayang; Wen, Wu; He, Liang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2501.01700

Similar Items

An Order-Complexity Aesthetic Assessment Model for Aesthetic-aware Music Recommendation
by: Jin, Xin, et al.
Published: (2024)

EmoStyle: Emotion-Driven Image Stylization
by: Yang, Jingyuan, et al.
Published: (2025)

Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings
by: Hisariya, Tanisha, et al.
Published: (2024)

Emotion-Guided Image to Music Generation
by: Kundu, Souraja, et al.
Published: (2024)

CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
by: Huai, Tianyu, et al.
Published: (2025)

Emotion Detection and Music Recommendation System
by: Kambham, Swetha, et al.
Published: (2025)

Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models
by: Yin, Jianghao, et al.
Published: (2026)

Fine-Grained Scene Image Classification with Modality-Agnostic Adapter
by: Wang, Yiqun, et al.
Published: (2024)

Music Recommendation Based on Facial Emotion Recognition
by: B, Rajesh, et al.
Published: (2024)

APEX: Learning Adaptive Priorities for Multi-Objective Alignment in Vision-Language Generation
by: Chen, Dongliang, et al.
Published: (2026)

Learning Musical Representations for Music Performance Question Answering
by: Diao, Xingjian, et al.
Published: (2025)

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
by: Cao, Shuo, et al.
Published: (2025)

ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding
by: Cao, Shuo, et al.
Published: (2025)

Extending Visual Dynamics for Video-to-Music Generation
by: Liu, Xiaohao, et al.
Published: (2025)

Cross-Domain Document Layout Analysis Using Document Style Guide
by: Wu, Xingjiao, et al.
Published: (2022)

MVPBench: A Multi-Video Perception Evaluation Benchmark for Multi-Modal Video Understanding
by: Bai, Purui, et al.
Published: (2026)

YingVideo-MV: Music-Driven Multi-Stage Video Generation
by: Chen, Jiahui, et al.
Published: (2025)

Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment
by: Hong, Jiaying, et al.
Published: (2025)

EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba
by: Nguyen, Quang, et al.
Published: (2025)

SITA: Structurally Imperceptible and Transferable Adversarial Attacks for Stylized Image Generation
by: Kang, Jingdan, et al.
Published: (2025)

StylizedGS: Controllable Stylization for 3D Gaussian Splatting
by: Zhang, Dingxi, et al.
Published: (2024)

PointNet4D: A Lightweight 4D Point Cloud Video Backbone for Online and Offline Perception in Robotic Applications
by: Liu, Yunze, et al.
Published: (2025)

Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition
by: Yin, Wen, et al.
Published: (2025)

Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization
by: Yang, Tao, et al.
Published: (2023)

AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception
by: Huang, Yipo, et al.
Published: (2024)

Discovering "Words" in Music: Unsupervised Learning of Compositional Sparse Code for Symbolic Music
by: Wang, Tianle, et al.
Published: (2025)

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
by: Qi, Tianhao, et al.
Published: (2024)

End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music
by: Ríos-Vila, Antonio, et al.
Published: (2024)

Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
by: Peng, Ruotian, et al.
Published: (2025)

Music Audio-Visual Question Answering Requires Specialized Multimodal Designs
by: You, Wenhao, et al.
Published: (2025)

MPJudge: Towards Perceptual Assessment of Music-Induced Paintings
by: Jiang, Shiqi, et al.
Published: (2025)

Enhancing Image Aesthetics with Dual-Conditioned Diffusion Models Guided by Multimodal Perception
by: Nan, Xinyu, et al.
Published: (2026)

AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception
by: Huang, Yipo, et al.
Published: (2024)

ActiveUMI: Robotic Manipulation with Active Perception from Robot-Free Human Demonstrations
by: Zeng, Qiyuan, et al.
Published: (2025)

MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
by: Yang, Kaixing, et al.
Published: (2025)

VideoAesBench: Benchmarking the Video Aesthetics Perception Capabilities of Large Multimodal Models
by: Li, Yunhao, et al.
Published: (2026)

MusiXQA: Advancing Visual Music Understanding in Multimodal Large Language Models
by: Chen, Jian, et al.
Published: (2025)

Mixture of Style Experts for Diverse Image Stylization
by: Zhu, Shihao, et al.
Published: (2026)

AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation
by: Li, Hao, et al.
Published: (2025)

Synthetic Perception: Can Generated Images Unlock Latent Visual Prior for Text-Centric Reasoning?
by: Huang, Yuesheng, et al.
Published: (2025)