| Main Authors: | Yang, Kaixing, Tang, Xulong, Peng, Ziqiao, Hu, Yuxuan, He, Jun, Liu, Hongyan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.17543 |
Similar Items
MATHDance: Mamba-Transformer Architecture with Uniform Tokenization for High-Quality 3D Dance Generation
by: Yang, Kaixing, et al.
Published: (2025)
CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition
by: Yang, Kaixing, et al.
Published: (2024)
GACA-DiT: Diffusion-based Dance-to-Music Generation with Genre-Adaptive Rhythm and Context-Aware Alignment
by: Wang, Jinting, et al.
Published: (2025)
Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
by: Li, Xiaojie, et al.
Published: (2025)
Enhancing Expressiveness in Dance Generation via Integrating Frequency and Music Style Information
by: Huang, Qiaochu, et al.
Published: (2024)
Dance-to-Music Generation with Encoder-based Textual Inversion
by: Li, Sifei, et al.
Published: (2024)
Dance2MIDI: Dance-driven multi-instruments music generation
by: Han, Bo, et al.
Published: (2023)
Noise-Conditioned Mixture-of-Experts Framework for Robust Speaker Verification
by: Gu, Bin, et al.
Published: (2025)
DanceAnyWay: Synthesizing Beat-Guided 3D Dances with Randomized Temporal Contrastive Learning
by: Bhattacharya, Aneesh, et al.
Published: (2023)
Combining Genre Classification and Harmonic-Percussive Features with Diffusion Models for Music-Video Generation
by: Pina, Leonardo, et al.
Published: (2024)
Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization
by: He, Mao-Kui, et al.
Published: (2024)
Sonic4D: Spatial Audio Generation for Immersive 4D Scene Exploration
by: Xie, Siyi, et al.
Published: (2025)
StereoFoley: Object-Aware Stereo Audio Generation from Video
by: Karchkhadze, Tornike, et al.
Published: (2025)
DanceChat: Large Language Model-Guided Music-to-Dance Generation
by: Wang, Qing, et al.
Published: (2025)
Low-latency Speech Enhancement via Speech Token Generation
by: Xue, Huaying, et al.
Published: (2023)
Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions
by: Yuan, Yi, et al.
Published: (2024)
InterDance: Reactive 3D Dance Generation with Realistic Duet Interactions
by: Li, Ronghui, et al.
Published: (2024)
Efficient Speech Watermarking for Speech Synthesis via Progressive Knowledge Distillation
by: Cui, Yang, et al.
Published: (2025)
Flexible Control in Symbolic Music Generation via Musical Metadata
by: Han, Sangjun, et al.
Published: (2024)
Beyond Video-to-SFX: Video to Audio Synthesis with Environmentally Aware Speech
by: Niu, Xinlei, et al.
Published: (2025)
FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation
by: Jiang, Yuxuan, et al.
Published: (2025)
Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Immersive Audiobook Generation
by: Rong, Yan, et al.
Published: (2025)
Music Genre Classification: Ensemble Learning with Subcomponents-level Attention
by: Liu, Yichen, et al.
Published: (2024)
MeloTrans: A Text to Symbolic Music Generation Model Following Human Composition Habit
by: Wang, Yutian, et al.
Published: (2024)
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
by: Cong, Gaoxiang, et al.
Published: (2024)
Intelligent Text-Conditioned Music Generation
by: Xie, Zhouyao, et al.
Published: (2024)
RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues
by: Pan, Tianrui, et al.
Published: (2024)
A Survey on Evaluation Metrics for Music Generation
by: Kader, Faria Binte, et al.
Published: (2025)
SonicVisionLM: Playing Sound with Vision Language Models
by: Xie, Zhifeng, et al.
Published: (2024)
Controllable Dance Generation with Style-Guided Motion Diffusion
by: Wang, Hongsong, et al.
Published: (2024)
LoVA: Long-form Video-to-Audio Generation
by: Cheng, Xin, et al.
Published: (2024)
SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation
by: Shimada, Kazuki, et al.
Published: (2024)
Building Audio-Visual Digital Twins with Smartphones
by: Lan, Zitong, et al.
Published: (2025)
MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence
by: You, Fuming, et al.
Published: (2024)
Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis
by: Huang, Zhiqi, et al.
Published: (2024)
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
by: Zhang, Yu, et al.
Published: (2025)
ScripTONES: Sentiment-Conditioned Music Generation for Movie Scripts
by: Veerendranath, Vishruth, et al.
Published: (2024)
FastTalker: Jointly Generating Speech and Conversational Gestures from Text
by: Guo, Zixin, et al.
Published: (2024)
CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation
by: Oh, Hyunwoo, et al.
Published: (2025)
MusFlow: Multimodal Music Generation via Conditional Flow Matching
by: Song, Jiahao, et al.
Published: (2025)