| Main Authors: | Xu, Junjie; Wu, Xingjiao; Yao, Tanren; Zhang, Zihao; Bei, Jiayang; Wen, Wu; He, Liang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.01700 |
Similar Items
An Order-Complexity Aesthetic Assessment Model for Aesthetic-aware Music Recommendation
by: Jin, Xin, et al.
Published: (2024)
EmoStyle: Emotion-Driven Image Stylization
by: Yang, Jingyuan, et al.
Published: (2025)
Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings
by: Hisariya, Tanisha, et al.
Published: (2024)
Emotion-Guided Image to Music Generation
by: Kundu, Souraja, et al.
Published: (2024)
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
by: Huai, Tianyu, et al.
Published: (2025)
Emotion Detection and Music Recommendation System
by: Kambham, Swetha, et al.
Published: (2025)
Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models
by: Yin, Jianghao, et al.
Published: (2026)
Fine-Grained Scene Image Classification with Modality-Agnostic Adapter
by: Wang, Yiqun, et al.
Published: (2024)
Music Recommendation Based on Facial Emotion Recognition
by: B, Rajesh, et al.
Published: (2024)
APEX: Learning Adaptive Priorities for Multi-Objective Alignment in Vision-Language Generation
by: Chen, Dongliang, et al.
Published: (2026)
Learning Musical Representations for Music Performance Question Answering
by: Diao, Xingjian, et al.
Published: (2025)
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
by: Cao, Shuo, et al.
Published: (2025)
ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding
by: Cao, Shuo, et al.
Published: (2025)
Extending Visual Dynamics for Video-to-Music Generation
by: Liu, Xiaohao, et al.
Published: (2025)
Cross-Domain Document Layout Analysis Using Document Style Guide
by: Wu, Xingjiao, et al.
Published: (2022)
MVPBench: A Multi-Video Perception Evaluation Benchmark for Multi-Modal Video Understanding
by: Bai, Purui, et al.
Published: (2026)
YingVideo-MV: Music-Driven Multi-Stage Video Generation
by: Chen, Jiahui, et al.
Published: (2025)
Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment
by: Hong, Jiaying, et al.
Published: (2025)
EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba
by: Nguyen, Quang, et al.
Published: (2025)
SITA: Structurally Imperceptible and Transferable Adversarial Attacks for Stylized Image Generation
by: Kang, Jingdan, et al.
Published: (2025)
StylizedGS: Controllable Stylization for 3D Gaussian Splatting
by: Zhang, Dingxi, et al.
Published: (2024)
PointNet4D: A Lightweight 4D Point Cloud Video Backbone for Online and Offline Perception in Robotic Applications
by: Liu, Yunze, et al.
Published: (2025)
Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition
by: Yin, Wen, et al.
Published: (2025)
Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization
by: Yang, Tao, et al.
Published: (2023)
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception
by: Huang, Yipo, et al.
Published: (2024)
Discovering "Words" in Music: Unsupervised Learning of Compositional Sparse Code for Symbolic Music
by: Wang, Tianle, et al.
Published: (2025)
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
by: Qi, Tianhao, et al.
Published: (2024)
End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music
by: Ríos-Vila, Antonio, et al.
Published: (2024)
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
by: Peng, Ruotian, et al.
Published: (2025)
Music Audio-Visual Question Answering Requires Specialized Multimodal Designs
by: You, Wenhao, et al.
Published: (2025)
MPJudge: Towards Perceptual Assessment of Music-Induced Paintings
by: Jiang, Shiqi, et al.
Published: (2025)
Enhancing Image Aesthetics with Dual-Conditioned Diffusion Models Guided by Multimodal Perception
by: Nan, Xinyu, et al.
Published: (2026)
AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception
by: Huang, Yipo, et al.
Published: (2024)
ActiveUMI: Robotic Manipulation with Active Perception from Robot-Free Human Demonstrations
by: Zeng, Qiyuan, et al.
Published: (2025)
MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
by: Yang, Kaixing, et al.
Published: (2025)
VideoAesBench: Benchmarking the Video Aesthetics Perception Capabilities of Large Multimodal Models
by: Li, Yunhao, et al.
Published: (2026)
MusiXQA: Advancing Visual Music Understanding in Multimodal Large Language Models
by: Chen, Jian, et al.
Published: (2025)
Mixture of Style Experts for Diverse Image Stylization
by: Zhu, Shihao, et al.
Published: (2026)
AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation
by: Li, Hao, et al.
Published: (2025)
Synthetic Perception: Can Generated Images Unlock Latent Visual Prior for Text-Centric Reasoning?
by: Huang, Yuesheng, et al.
Published: (2025)