| Main Author: | Easthope, Eric |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2403.04800 |
Similar Items
A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation
by: Airale, Louis, et al.
Published: (2023)
UniMuMo: Unified Text, Music and Motion Generation
by: Yang, Han, et al.
Published: (2024)
MIDGET: Music Conditioned 3D Dance Generation
by: Wang, Jinwu, et al.
Published: (2024)
GCDance: Genre-Controlled Music-Driven 3D Full Body Dance Generation
by: Liu, Xinran, et al.
Published: (2025)
Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis
by: Zhang, Zeyi, et al.
Published: (2024)
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives
by: Li, Ronghui, et al.
Published: (2024)
Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment
by: Siyao, Li, et al.
Published: (2024)
READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation
by: Wang, Haotian, et al.
Published: (2025)
DIDiffGes: Decoupled Semi-Implicit Diffusion Models for Real-time Gesture Generation from Speech
by: Cheng, Yongkang, et al.
Published: (2025)
Text-driven Talking Face Synthesis by Reprogramming Audio-driven Models
by: Choi, Jeongsoo, et al.
Published: (2023)
Seeing Sound: Assembling Sounds from Visuals for Audio-to-Image Generation
by: Petermann, Darius, et al.
Published: (2025)
MACS: Multi-source Audio-to-image Generation with Contextual Significance and Semantic Alignment
by: Zhou, Hao, et al.
Published: (2025)
MotionRAG-Diff: A Retrieval-Augmented Diffusion Framework for Long-Term Music-to-Dance Generation
by: Huang, Mingyang, et al.
Published: (2025)
Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis
by: Shen, Shuai, et al.
Published: (2025)
Inter-Diffusion Generation Model of Speakers and Listeners for Effective Communication
by: Huang, Jinhe, et al.
Published: (2025)
DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling
by: Ghosh, Anindita, et al.
Published: (2025)
RAP: Real-time Audio-driven Portrait Animation with Video Diffusion Transformer
by: Du, Fangyu, et al.
Published: (2025)
TCDiff++: An End-to-end Trajectory-Controllable Diffusion Model for Harmonious Music-Driven Group Choreography
by: Dai, Yuqin, et al.
Published: (2025)
InterDance: Reactive 3D Dance Generation with Realistic Duet Interactions
by: Li, Ronghui, et al.
Published: (2024)
NeRF-3DTalker: Neural Radiance Field with 3D Prior Aided Audio Disentanglement for Talking Head Synthesis
by: Liu, Xiaoxing, et al.
Published: (2025)
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
by: Ji, Xiaozhong, et al.
Published: (2024)
It Takes Two: Real-time Co-Speech Two-person's Interaction Generation via Reactive Auto-regressive Diffusion Model
by: Shi, Mingyi, et al.
Published: (2024)
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
by: Chen, Junming, et al.
Published: (2024)
Conditional GAN for Enhancing Diffusion Models in Efficient and Authentic Global Gesture Generation from Audios
by: Cheng, Yongkang, et al.
Published: (2024)
GaussianSpeech: Audio-Driven Gaussian Avatars
by: Aneja, Shivangi, et al.
Published: (2024)
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation
by: Zhong, Tianyun, et al.
Published: (2024)
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
by: Aneja, Shivangi, et al.
Published: (2023)
Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation
by: Gan, Yuan, et al.
Published: (2025)
Two-component spatiotemporal template for activation-inhibition of speech in ECoG
by: Easthope, Eric
Published: (2024)
Integrating Representational Gestures into Automatically Generated Embodied Explanations and its Effects on Understanding and Interaction Quality
by: Robrecht, Amelie Sophie, et al.
Published: (2024)
QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation
by: Zhou, Zhizhen, et al.
Published: (2024)
SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation
by: Pham, Kien T., et al.
Published: (2025)
AsynFusion: Towards Asynchronous Latent Consistency Models for Decoupled Whole-Body Audio-Driven Avatars
by: Zhang, Tianbao, et al.
Published: (2025)
Listen and Move: Improving GANs Coherency in Agnostic Sound-to-Video Generation
by: Redondo, Rafael
Published: (2024)
Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART
by: Tathe, Aniket, et al.
Published: (2024)
Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference
by: Zhang, Fan, et al.
Published: (2024)
Hear What Matters! Text-conditioned Selective Video-to-Audio Generation
by: Lee, Junwon, et al.
Published: (2025)
Joint Multimodal Transformer for Emotion Recognition in the Wild
by: Waligora, Paul, et al.
Published: (2024)
Mitigating Multimodal LLMs Hallucinations via Relevance Propagation at Inference Time
by: Allouche, Itai, et al.
Published: (2026)
Adversarial synthesis based data-augmentation for code-switched spoken language identification
by: Shastri, Parth, et al.
Published: (2022)