Library Catalog

Bibliographic Details
Main Authors:	Lee, Harin; Van Geert, Eline; Celen, Elif; Marjieh, Raja; van Rijn, Pol; Park, Minsu; Jacoby, Nori
Format:	Preprint
Published:	2025
Subjects:	Multimedia
Online Access:	https://arxiv.org/abs/2502.14439

Similar Items

Are Expressions for Music Emotions the Same Across Cultures?
by: Celen, Elif, et al.
Published: (2025)

GlobalMood: A cross-cultural benchmark for music emotion recognition
by: Lee, Harin, et al.
Published: (2025)

Characterizing the Large‐Scale Structure of Multimodal Semantic Networks
by: Marjieh, Raja, et al.
Published: (2025)

A Rational Analysis of the Speech-to-Song Illusion
by: Marjieh, Raja, et al.
Published: (2024)

Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People
by: Huang, Dun-Ming, et al.
Published: (2024)

An Experimental Method to Study Opinion Diffusion in Human-AI Hybrid Societies
by: Gaubert, Léna, et al.
Published: (2026)

Solving Copyright Infringement on Short Video Platforms: Novel Datasets and an Audio Restoration Deep Learning Pipeline
by: Oh, Minwoo, et al.
Published: (2025)

Characterizing the Interaction of Cultural Evolution Mechanisms in Experimental Social Networks
by: Marjieh, Raja, et al.
Published: (2025)

BioNet-XR: Biological Network Visualization Framework for Virtual Reality and Mixed Reality Environments
by: Senderin, Busra, et al.
Published: (2024)

EmoVLM-KD: Fusing Distilled Expertise with Vision-Language Models for Visual Emotion Analysis
by: Lee, SangEun, et al.
Published: (2025)

CLIP Brings Better Features to Visual Aesthetics Learners
by: Xu, Liwu, et al.
Published: (2023)

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
by: Choi, Jeongsoo, et al.
Published: (2023)

The Dynamics of Collective Creativity in Human-AI Hybrid Societies
by: Shiiku, Shota, et al.
Published: (2025)

Do Melody and Rhythm Coevolve?
by: Lee, Harin, et al.
Published: (2026)

Human Aesthetic Preference-Based Large Text-to-Image Model Personalization: Kandinsky Generation as an Example
by: Zhou, Aven-Le, et al.
Published: (2024)

Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs
by: Cappellazzo, Umberto, et al.
Published: (2025)

Design-MLLM: A Reinforcement Alignment Framework for Verifiable and Aesthetic Interior Design
by: Yang, Yuxuan, et al.
Published: (2026)

TAGF: Time-aware Gated Fusion for Multimodal Valence-Arousal Estimation
by: Lee, Yubeen, et al.
Published: (2025)

Inter-Frame Compression for Dynamic Point Cloud Geometry Coding
by: Akhtar, Anique, et al.
Published: (2022)

ScaleTrotter: Illustrative Visual Travels Across Negative Scales
by: Halladjian, Sarkis, et al.
Published: (2019)

Song Aesthetics Evaluation with Multi-Stem Attention and Hierarchical Uncertainty Modeling
by: Lv, Yishan, et al.
Published: (2026)

The Rhythm of Tai Chi: Revitalizing Cultural Heritage in Virtual Reality through Interactive Visuals
by: Wang, Xianghan
Published: (2025)

Creating Aesthetic Sonifications on the Web with SIREN
by: Peng, Tristan, et al.
Published: (2024)

CDIO: Cross-Domain Inference Optimization with Resource Preference Prediction for Edge-Cloud Collaboration
by: Yang, Zheming, et al.
Published: (2025)

Delayed Commitment for Representation Readiness in Stage-wise Audio-Visual Learning
by: Xu, Xinmeng, et al.
Published: (2026)

Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
by: Yeo, Jeong Hun, et al.
Published: (2025)

VSpeechLM: A Visual Speech Language Model for Visual Text-to-Speech Task
by: Wang, Yuyue, et al.
Published: (2025)

Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality
by: Park, Kyu Ri, et al.
Published: (2024)

Giving Robots a Voice: Human-in-the-Loop Voice Creation and open-ended Labeling
by: van Rijn, Pol, et al.
Published: (2024)

Large Language Models are Strong Audio-Visual Speech Recognition Learners
by: Cappellazzo, Umberto, et al.
Published: (2024)

Stage-Adaptive Reliability Modeling for Continuous Valence-Arousal Estimation
by: Lee, Yubeen, et al.
Published: (2026)

Aesthetics 3D geovisualization for flood disaster based on the XYZ coordinate
by: M.Y. Rezaldi
Published: (2023)

MeMo: Attentional Momentum for Real-time Audio-visual Speaker Extraction under Impaired Visual Conditions
by: Li, Junjie, et al.
Published: (2025)

EMID: An Emotional Aligned Dataset in Audio-Visual Modality
by: Zou, Jialing, et al.
Published: (2023)

A Progressive Evaluation Framework for Multicultural Analysis of Story Visualization
by: Kapuriya, Janak, et al.
Published: (2025)

Enhancing Video Music Recommendation with Transformer-Driven Audio-Visual Embeddings
by: Liu, Shimiao, et al.
Published: (2025)

MarkIt: Training-Free Visual Markers for Precise Video Temporal Grounding
by: Fang, Pengcheng, et al.
Published: (2026)

AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
by: Yeo, Jeong Hun, et al.
Published: (2023)

Designing a Multimodal Viewer for Piano Performance Analysis -- a Pedagogy-First Approach
by: Bae, Joonhyung, et al.
Published: (2025)

OmniDPO: A Preference Optimization Framework to Address Omni-Modal Hallucination
by: Chen, Junzhe, et al.
Published: (2025)