| Main Authors: | Yang, Han, Su, Kun, Zhang, Yutong, Chen, Jiaben, Qian, Kaizhi, Liu, Gaowen, Gan, Chuang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.04534 |
Similar Items
MusicScore: A Dataset for Music Score Modeling and Generation
by: Lin, Yuheng, et al.
Published: (2024)
Combining Genre Classification and Harmonic-Percussive Features with Diffusion Models for Music-Video Generation
by: Pina, Leonardo, et al.
Published: (2024)
MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence
by: You, Fuming, et al.
Published: (2024)
MATHDance: Mamba-Transformer Architecture with Uniform Tokenization for High-Quality 3D Dance Generation
by: Yang, Kaixing, et al.
Published: (2025)
DanceAnyWay: Synthesizing Beat-Guided 3D Dances with Randomized Temporal Contrastive Learning
by: Bhattacharya, Aneesh, et al.
Published: (2023)
MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models
by: Liu, Shansong, et al.
Published: (2024)
Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation
by: Han, Zhen, et al.
Published: (2025)
LAV: Audio-Driven Dynamic Visual Generation with Neural Compression and StyleGAN2
by: Jung, Jongmin, et al.
Published: (2025)
Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
by: Zhang, Fan, et al.
Published: (2023)
PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis
by: Xie, Yifan, et al.
Published: (2024)
Intelligent Text-Conditioned Music Generation
by: Xie, Zhouyao, et al.
Published: (2024)
InterDance: Reactive 3D Dance Generation with Realistic Duet Interactions
by: Li, Ronghui, et al.
Published: (2024)
MeloTrans: A Text to Symbolic Music Generation Model Following Human Composition Habit
by: Wang, Yutian, et al.
Published: (2024)
Flowers Revisited: A Preliminary Replication of Flowers et al. 1997
by: Enge, Kajetan, et al.
Published: (2024)
Flexible Control in Symbolic Music Generation via Musical Metadata
by: Han, Sangjun, et al.
Published: (2024)
SteerMusic: Enhanced Musical Consistency for Zero-shot Text-guided and Personalized Music Editing
by: Niu, Xinlei, et al.
Published: (2025)
Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
by: Li, Xiaojie, et al.
Published: (2025)
It Takes Two: Real-time Co-Speech Two-person's Interaction Generation via Reactive Auto-regressive Diffusion Model
by: Shi, Mingyi, et al.
Published: (2024)
Video-Guided Text-to-Music Generation Using Public Domain Movie Collections
by: Kim, Haven, et al.
Published: (2025)
SyncViolinist: Music-Oriented Violin Motion Generation Based on Bowing and Fingering
by: Nishizawa, Hiroki, et al.
Published: (2024)
Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset
by: Kaur, Sukhandeep, et al.
Published: (2024)
NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis
by: Liu, Xiaoxing, et al.
Published: (2025)
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
by: Ji, Xiaozhong, et al.
Published: (2024)
Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model
by: Karchkhadze, Tornike, et al.
Published: (2024)
MusicAOG: an Energy-Based Model for Learning and Sampling a Hierarchical Representation of Symbolic Music
by: Qian, Yikai, et al.
Published: (2024)
DiM-Gestor: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2
by: Zhang, Fan, et al.
Published: (2024)
A Survey on Evaluation Metrics for Music Generation
by: Kader, Faria Binte, et al.
Published: (2025)
Dance-to-Music Generation with Encoder-based Textual Inversion
by: Li, Sifei, et al.
Published: (2024)
ScripTONES: Sentiment-Conditioned Music Generation for Movie Scripts
by: Veerendranath, Vishruth, et al.
Published: (2024)
EnchantDance: Unveiling the Potential of Music-Driven Dance Movement
by: Han, Bo, et al.
Published: (2023)
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
by: Li, Ruiqi, et al.
Published: (2024)
MusFlow: Multimodal Music Generation via Conditional Flow Matching
by: Song, Jiahao, et al.
Published: (2025)
SONIQUE: Video Background Music Generation Using Unpaired Audio-Visual Data
by: Zhang, Liqian, et al.
Published: (2024)
Enhancing Expressiveness in Dance Generation via Integrating Frequency and Music Style Information
by: Huang, Qiaochu, et al.
Published: (2024)
MusicSem: A Semantically Rich Language-Audio Dataset of Natural Music Descriptions
by: Salganik, Rebecca, et al.
Published: (2026)
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models
by: Weck, Benno, et al.
Published: (2024)
Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance
by: Bao, Xuchan, et al.
Published: (2024)
CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition
by: Yang, Kaixing, et al.
Published: (2024)
M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models
by: Liu, Shansong, et al.
Published: (2023)
VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music
by: Shi, Jiatong, et al.
Published: (2024)