:: Library Catalog

Bibliographic Details
Main authors:	Yang, Xiaoda, Zhang, Majun, Pan, Changhao, Huang, Nick, Yuguang, Yang, Zhuo, Fan, Zhou, Pengfei, Zhou, Jin, Shan, Sizhe, Yang, Shan, Yang, Miles, You, Yang, Zhao, Zhou
Format:	Preprint
Published:	2026
Subjects:	Sound Artificial Intelligence
Online access:	https://arxiv.org/abs/2605.01809

Similar Items

HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation
by: Shan, Sizhe, et al.
Published: (2025)

TokenDance: Token-to-Token Music-to-Dance Generation with Bidirectional Mamba
by: Yang, Ziyue, et al.
Published: (2026)

ImVideoEdit: Image-learning Video Editing via 2D Spatial Difference Attention Blocks
by: Xu, Jiayang, et al.
Published: (2026)

Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
by: Chen, Peikun, et al.
Published: (2024)

SyncTrack: Rhythmic Stability and Synchronization in Multi-Track Music Generation
by: Wang, Hongrui, et al.
Published: (2026)

DanceChat: Large Language Model-Guided Music-to-Dance Generation
by: Wang, Qing, et al.
Published: (2025)

Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis
by: Hu, Xintong, et al.
Published: (2025)

CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition
by: Yang, Kaixing, et al.
Published: (2024)

Beyond the Mouth: Upper-Face Affective Cues in Audiovisual Sentence Recognition under Acoustic Uncertainty
by: Yang, Zhou, et al.
Published: (2026)

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
by: Yao, Jixun, et al.
Published: (2024)

Exploring Multi-Modal Control in Music-Driven Dance Generation
by: Li, Ronghui, et al.
Published: (2024)

Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech
by: Yao, Jixun, et al.
Published: (2025)

Musical Score Understanding Benchmark: Evaluating Large Language Models' Comprehension of Complete Musical Scores
by: Dai, Congren, et al.
Published: (2025)

EnchantDance: Unveiling the Potential of Music-Driven Dance Movement
by: Han, Bo, et al.
Published: (2023)

PSCodec: A Series of High-Fidelity Low-bitrate Neural Speech Codecs Leveraging Prompt Encoders
by: Pan, Yu, et al.
Published: (2024)

Tempo as the Stable Cue: Hierarchical Mixture of Tempo and Beat Experts for Music to 3D Dance Generation
by: Lyu, Guangtao, et al.
Published: (2025)

AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech
by: Qiu, Jielin, et al.
Published: (2026)

Dance-to-Music Generation with Encoder-based Textual Inversion
by: Li, Sifei, et al.
Published: (2024)

Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
by: Li, Xiaojie, et al.
Published: (2025)

Acoustic Overspecification in Electronic Dance Music Taxonomy
by: Xu, Weilun, et al.
Published: (2025)

AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
by: Yang, Qian, et al.
Published: (2024)

Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval
by: Zhou, Lifeng, et al.
Published: (2024)

ClapFM-EVC: High-Fidelity and Flexible Emotional Voice Conversion with Dual Control from Natural Language and Speech
by: Pan, Yu, et al.
Published: (2025)

Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints
by: Meng, Hao, et al.
Published: (2026)

FoleyDirector: Fine-Grained Temporal Steering for Video-to-Audio Generation via Structured Scripts
by: Li, You, et al.
Published: (2026)

Zero-Shot Voice Conversion via Content-Aware Timbre Ensemble and Conditional Flow Matching
by: Pan, Yu, et al.
Published: (2024)

GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning for Speech Emotion Recognition
by: Pan, Yu, et al.
Published: (2024)

ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
by: Zhang, Yu, et al.
Published: (2025)

In This Environment, As That Speaker: A Text-Driven Framework for Multi-Attribute Speech Conversion
by: Jin, Jiawei, et al.
Published: (2025)

ASAudio: A Survey of Advanced Spatial Audio Research
by: Zhu, Zhiyuan, et al.
Published: (2025)

GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
by: Zhang, Yu, et al.
Published: (2024)

S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)

Music2Fail: Transfer Music to Failed Recorder Style
by: Leong, Chon In, et al.
Published: (2024)

From Speech to Profile: A Protocol-Driven LLM Agent for Psychological Profile Generation
by: Yang, Xingjian, et al.
Published: (2026)

DrawSpeech: Expressive Speech Synthesis Using Prosodic Sketches as Control Conditions
by: Chen, Weidong, et al.
Published: (2025)

Takin-VC: Expressive Zero-Shot Voice Conversion via Adaptive Hybrid Content Encoding and Enhanced Timbre Modeling
by: Yang, Yuguang, et al.
Published: (2024)

Detecting Musical Deepfakes
by: Sunday, Nick
Published: (2025)

Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
by: Zuo, Jialong, et al.
Published: (2025)

The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models
by: Li, Jiajia, et al.
Published: (2024)

MusER: Musical Element-Based Regularization for Generating Symbolic Music with Emotion
by: Ji, Shulei, et al.
Published: (2023)