| Main Authors: | Lee, Harin, Van Geert, Eline, Celen, Elif, Marjieh, Raja, van Rijn, Pol, Park, Minsu, Jacoby, Nori |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.14439 |
Similar Items
Are Expressions for Music Emotions the Same Across Cultures?
by: Celen, Elif, et al.
Published: (2025)
GlobalMood: A cross-cultural benchmark for music emotion recognition
by: Lee, Harin, et al.
Published: (2025)
Characterizing the Large‐Scale Structure of Multimodal Semantic Networks
by: Raja Marjieh, et al.
Published: (2025)
A Rational Analysis of the Speech-to-Song Illusion
by: Marjieh, Raja, et al.
Published: (2024)
Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People
by: Huang, Dun-Ming, et al.
Published: (2024)
An Experimental Method to Study Opinion Diffusion in Human-AI Hybrid Societies
by: Gaubert, Léna, et al.
Published: (2026)
Solving Copyright Infringement on Short Video Platforms: Novel Datasets and an Audio Restoration Deep Learning Pipeline
by: Oh, Minwoo, et al.
Published: (2025)
Characterizing the Interaction of Cultural Evolution Mechanisms in Experimental Social Networks
by: Marjieh, Raja, et al.
Published: (2025)
BioNet-XR: Biological Network Visualization Framework for Virtual Reality and Mixed Reality Environments
by: Senderin, Busra, et al.
Published: (2024)
EmoVLM-KD: Fusing Distilled Expertise with Vision-Language Models for Visual Emotion Analysis
by: Lee, SangEun, et al.
Published: (2025)
CLIP Brings Better Features to Visual Aesthetics Learners
by: Xu, Liwu, et al.
Published: (2023)
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
by: Choi, Jeongsoo, et al.
Published: (2023)
The Dynamics of Collective Creativity in Human-AI Hybrid Societies
by: Shiiku, Shota, et al.
Published: (2025)
Do Melody and Rhythm Coevolve?
by: Lee, Harin, et al.
Published: (2026)
Human Aesthetic Preference-Based Large Text-to-Image Model Personalization: Kandinsky Generation as an Example
by: Zhou, Aven-Le, et al.
Published: (2024)
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs
by: Cappellazzo, Umberto, et al.
Published: (2025)
Design-MLLM: A Reinforcement Alignment Framework for Verifiable and Aesthetic Interior Design
by: Yang, Yuxuan, et al.
Published: (2026)
TAGF: Time-aware Gated Fusion for Multimodal Valence-Arousal Estimation
by: Lee, Yubeen, et al.
Published: (2025)
Inter-Frame Compression for Dynamic Point Cloud Geometry Coding
by: Akhtar, Anique, et al.
Published: (2022)
ScaleTrotter: Illustrative Visual Travels Across Negative Scales
by: Halladjian, Sarkis, et al.
Published: (2019)
Song Aesthetics Evaluation with Multi-Stem Attention and Hierarchical Uncertainty Modeling
by: Lv, Yishan, et al.
Published: (2026)
The Rhythm of Tai Chi: Revitalizing Cultural Heritage in Virtual Reality through Interactive Visuals
by: Wang, Xianghan
Published: (2025)
Creating Aesthetic Sonifications on the Web with SIREN
by: Peng, Tristan, et al.
Published: (2024)
CDIO: Cross-Domain Inference Optimization with Resource Preference Prediction for Edge-Cloud Collaboration
by: Yang, Zheming, et al.
Published: (2025)
Delayed Commitment for Representation Readiness in Stage-wise Audio-Visual Learning
by: Xu, Xinmeng, et al.
Published: (2026)
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
by: Yeo, Jeong Hun, et al.
Published: (2025)
VSpeechLM: A Visual Speech Language Model for Visual Text-to-Speech Task
by: Wang, Yuyue, et al.
Published: (2025)
Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality
by: Park, Kyu Ri, et al.
Published: (2024)
Giving Robots a Voice: Human-in-the-Loop Voice Creation and open-ended Labeling
by: van Rijn, Pol, et al.
Published: (2024)
Large Language Models are Strong Audio-Visual Speech Recognition Learners
by: Cappellazzo, Umberto, et al.
Published: (2024)
Stage-Adaptive Reliability Modeling for Continuous Valence-Arousal Estimation
by: Lee, Yubeen, et al.
Published: (2026)
Aesthetics 3D geovisualization for flood disaster based on the XYZ coordinate
by: M.Y. Rezaldi
Published: (2023)
MeMo: Attentional Momentum for Real-time Audio-visual Speaker Extraction under Impaired Visual Conditions
by: Li, Junjie, et al.
Published: (2025)
EMID: An Emotional Aligned Dataset in Audio-Visual Modality
by: Zou, Jialing, et al.
Published: (2023)
A Progressive Evaluation Framework for Multicultural Analysis of Story Visualization
by: Kapuriya, Janak, et al.
Published: (2025)
Enhancing Video Music Recommendation with Transformer-Driven Audio-Visual Embeddings
by: Liu, Shimiao, et al.
Published: (2025)
MarkIt: Training-Free Visual Markers for Precise Video Temporal Grounding
by: Fang, Pengcheng, et al.
Published: (2026)
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
by: Yeo, Jeong Hun, et al.
Published: (2023)
Designing a Multimodal Viewer for Piano Performance Analysis -- a Pedagogy-First Approach
by: Bae, Joonhyung, et al.
Published: (2025)
OmniDPO: A Preference Optimization Framework to Address Omni-Modal Hallucination
by: Chen, Junzhe, et al.
Published: (2025)