Library Catalog

Bibliographic Details
Main Authors:	Hogue, Steven; Zhang, Chenxu; Tian, Yapeng; Guo, Xiaohu
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.14333

Similar Items

DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures
by: Hogue, Steven, et al.
Published: (2024)

Robust Active Speaker Detection in Noisy Environments
by: Vasireddy, Siva Sai Nagender, et al.
Published: (2024)

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
by: Liu, Haiyang, et al.
Published: (2023)

MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation
by: Zheng, Longtao, et al.
Published: (2024)

TalkingPose: Efficient Face and Gesture Animation with Feedback-guided Diffusion Model
by: Javanmardi, Alireza, et al.
Published: (2025)

IP-Adapter Is All You Need: Towards Fine-Tuning-Free Diffusion-Based Talking Face Generation
by: Wu, Hao, et al.
Published: (2026)

AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation
by: Sun, Yasheng, et al.
Published: (2024)

HoloGest: Decoupled Diffusion and Motion Priors for Generating Holisticly Expressive Co-speech Gestures
by: Cheng, Yongkang, et al.
Published: (2025)

LiveGesture Streamable Co-Speech Gesture Generation Model
by: Saleem, Muhammad Usama, et al.
Published: (2026)

Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
by: Qi, Xingqun, et al.
Published: (2025)

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder
by: Du, Chenpeng, et al.
Published: (2023)

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
by: Wang, Kai, et al.
Published: (2024)

Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation
by: Chopin, Baptiste, et al.
Published: (2025)

MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation
by: Mao, Xiaofeng, et al.
Published: (2024)

Contextual Gesture: Co-Speech Gesture Video Generation through Context-aware Gesture Representation
by: Liu, Pinxin, et al.
Published: (2025)

PersonaGesture: Single-Reference Co-Speech Gesture Personalization for Unseen Speakers
by: Zhang, Xiangyue, et al.
Published: (2026)

EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
by: Wang, Haotian, et al.
Published: (2024)

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
by: He, Xu, et al.
Published: (2024)

Recognizing Co-Speech Gestures in-the-Wild
by: Hegde, Sindhu B, et al.
Published: (2026)

Context-aware Talking Face Video Generation
by: Xuanyuan, Meidai, et al.
Published: (2024)

CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
by: Qi, Xingqun, et al.
Published: (2024)

DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model
by: Liu, Kangwei, et al.
Published: (2025)

CoordSpeaker: Exploiting Gesture Captioning for Coordinated Caption-Empowered Co-Speech Gesture Generation
by: Fang, Fengyi, et al.
Published: (2025)

Conveying Meaning through Gestures: An Investigation into Semantic Co-Speech Gesture Generation
by: Voss, Hendric, et al.
Published: (2025)

Streaming Generation of Co-Speech Gestures via Accelerated Rolling Diffusion
by: Vu, Evgeniia, et al.
Published: (2025)

ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis
by: Mughal, Muhammad Hamza, et al.
Published: (2024)

TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles
by: Ma, Yifeng, et al.
Published: (2023)

Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model
by: Shen, Fei, et al.
Published: (2025)

GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling
by: Liu, Pinxin, et al.
Published: (2025)

DuoGesture: Neuro-Inspired and Biomechanically Informed Dual-Stream Co-Speech Gesture Generation
by: Paar, Ferdinand, et al.
Published: (2026)

JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation
by: Chakkera, Sai Tanmay Reddy, et al.
Published: (2024)

Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation
by: Yaman, Dogucan, et al.
Published: (2024)

Audio-Driven Talking Face Video Generation with Joint Uncertainty Learning
by: Xie, Yifan, et al.
Published: (2025)

Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
by: Jang, Youngjoon, et al.
Published: (2024)

SemGes: Semantics-aware Co-Speech Gesture Generation using Semantic Coherence and Relevance Learning
by: Liu, Lanmiao, et al.
Published: (2025)

Taming Transformer for Emotion-Controllable Talking Face Generation
by: Zhang, Ziqi, et al.
Published: (2025)

A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation
by: Min, Anna, et al.
Published: (2025)

Face Reconstruction from Face Embeddings using Adapter to a Face Foundation Model
by: Shahreza, Hatef Otroshi, et al.
Published: (2024)

Democratizing High-Fidelity Co-Speech Gesture Video Generation
by: Yang, Xu, et al.
Published: (2025)

EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face Animation
by: Lin, Yihong, et al.
Published: (2024)