| Main Authors: | Zhang, Tianbao, Zhao, Jian, Li, Yuer, Zhu, Zheng, Hu, Ping, Fan, Zhaoxin, Wu, Wenjun, Li, Xuelong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.15058 |
Similar Items
Text-Driven Voice Conversion via Latent State-Space Modeling
by: Li, Wen, et al.
Published: (2025)
Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation
by: Zhou, Xukun, et al.
Published: (2024)
Content and Style Aware Audio-Driven Facial Animation
by: Liu, Qingju, et al.
Published: (2024)
DGFM: Full Body Dance Generation Driven by Music Foundation Models
by: Liu, Xinran, et al.
Published: (2025)
GaussianSpeech: Audio-Driven Gaussian Avatars
by: Aneja, Shivangi, et al.
Published: (2024)
ELGAR: Expressive Cello Performance Motion Generation for Audio Rendition
by: Qiu, Zhiping, et al.
Published: (2025)
MATHDance: Mamba-Transformer Architecture with Uniform Tokenization for High-Quality 3D Dance Generation
by: Yang, Kaixing, et al.
Published: (2025)
EnchantDance: Unveiling the Potential of Music-Driven Dance Movement
by: Han, Bo, et al.
Published: (2023)
READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation
by: Wang, Haotian, et al.
Published: (2025)
Audio2Face-3D: Audio-driven Realistic Facial Animation For Digital Avatars
by: NVIDIA, et al.
Published: (2025)
NAT: Neural Acoustic Transfer for Interactive Scenes in Real Time
by: Jin, Xutong, et al.
Published: (2025)
LAV: Audio-Driven Dynamic Visual Generation with Neural Compression and StyleGAN2
by: Jung, Jongmin, et al.
Published: (2025)
PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis
by: Xie, Yifan, et al.
Published: (2024)
Combining Genre Classification and Harmonic-Percussive Features with Diffusion Models for Music-Video Generation
by: Pina, Leonardo, et al.
Published: (2024)
TCDiff++: An End-to-end Trajectory-Controllable Diffusion Model for Harmonious Music-Driven Group Choreography
by: Dai, Yuqin, et al.
Published: (2025)
Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis
by: Shen, Shuai, et al.
Published: (2025)
Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
by: Zhang, Fan, et al.
Published: (2023)
Listen and Move: Improving GANs Coherency in Agnostic Sound-to-Video Generation
by: Redondo, Rafael
Published: (2024)
SyncViolinist: Music-Oriented Violin Motion Generation Based on Bowing and Fingering
by: Nishizawa, Hiroki, et al.
Published: (2024)
Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation
by: Huang, Zikai, et al.
Published: (2024)
GCDance: Genre-Controlled Music-Driven 3D Full Body Dance Generation
by: Liu, Xinran, et al.
Published: (2025)
RAP: Real-time Audio-driven Portrait Animation with Video Diffusion Transformer
by: Du, Fangyu, et al.
Published: (2025)
DanceAnyWay: Synthesizing Beat-Guided 3D Dances with Randomized Temporal Contrastive Learning
by: Bhattacharya, Aneesh, et al.
Published: (2023)
MusicScore: A Dataset for Music Score Modeling and Generation
by: Lin, Yuheng, et al.
Published: (2024)
DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling
by: Ghosh, Anindita, et al.
Published: (2025)
DIDiffGes: Decoupled Semi-Implicit Diffusion Models for Real-time Gesture Generation from Speech
by: Cheng, Yongkang, et al.
Published: (2025)
Text-driven Talking Face Synthesis by Reprogramming Audio-driven Models
by: Choi, Jeongsoo, et al.
Published: (2023)
Seeing Sound: Assembling Sounds from Visuals for Audio-to-Image Generation
by: Petermann, Darius, et al.
Published: (2025)
MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment
by: Zhou, Hao, et al.
Published: (2025)
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
by: Aneja, Shivangi, et al.
Published: (2023)
FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance
by: Wang, Ruocheng, et al.
Published: (2024)
Gaunt coefficients for complex and real spherical harmonics with applications to spherical array processing and Ambisonics
by: Politis, Archontis
Published: (2024)
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation
by: Zhong, Tianyun, et al.
Published: (2024)
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
by: Ji, Xiaozhong, et al.
Published: (2024)
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
by: Liu, Huadai, et al.
Published: (2024)
M3G: Multi-Granular Gesture Generator for Audio-Driven Full-Body Human Motion Synthesis
by: Yin, Zhizhuo, et al.
Published: (2025)
DanceMeld: Unraveling Dance Phrases with Hierarchical Latent Codes for Music-to-Dance Synthesis
by: Gao, Xin, et al.
Published: (2023)
Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset
by: Kaur, Sukhandeep, et al.
Published: (2024)
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives
by: Li, Ronghui, et al.
Published: (2024)
NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis
by: Liu, Xiaoxing, et al.
Published: (2025)