:: Library Catalog

Bibliographic Details
Main Authors:	Song, Yafei; Zhang, Peng; Zhang, Bang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.02576

Similar Items

CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild
by: Qi, Xingqun, et al.
Published: (2024)

Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
by: Qi, Xingqun, et al.
Published: (2025)

Motion-example-controlled Co-speech Gesture Generation Leveraging Large Language Models
by: Chen, Bohong, et al.
Published: (2025)

HoloGest: Decoupled Diffusion and Motion Priors for Generating Holisticly Expressive Co-speech Gestures
by: Cheng, Yongkang, et al.
Published: (2025)

Understanding Co-speech Gestures in-the-wild
by: Hegde, Sindhu B, et al.
Published: (2025)

Self-Supervised Learning of Deviation in Latent Representation for Co-speech Gesture Video Generation
by: Yang, Huan, et al.
Published: (2024)

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
by: He, Xu, et al.
Published: (2024)

DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures
by: Hogue, Steven, et al.
Published: (2024)

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
by: Qi, Xingqun, et al.
Published: (2023)

Contextual Gesture: Co-Speech Gesture Video Generation through Context-aware Gesture Representation
by: Liu, Pinxin, et al.
Published: (2025)

MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation
by: Wang, Siyuan, et al.
Published: (2025)

SemTalk: Holistic Co-speech Motion Generation with Frame-level Semantic Emphasis
by: Zhang, Xiangyue, et al.
Published: (2024)

ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model
by: Qi, Jinwei, et al.
Published: (2025)

MotionRAG-Diff: A Retrieval-Augmented Diffusion Framework for Long-Term Music-to-Dance Generation
by: Huang, Mingyang, et al.
Published: (2025)

TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation
by: Liu, Haiyang, et al.
Published: (2024)

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
by: Liu, Haiyang, et al.
Published: (2023)

Exploring Timeline Control for Facial Motion Generation
by: Ma, Yifeng, et al.
Published: (2025)

LiveGesture Streamable Co-Speech Gesture Generation Model
by: Saleem, Muhammad Usama, et al.
Published: (2026)

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
by: Lv, Jiaxi, et al.
Published: (2023)

GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling
by: Liu, Pinxin, et al.
Published: (2025)

MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
by: Zhu, Chenhui, et al.
Published: (2025)

PersonaGesture: Single-Reference Co-Speech Gesture Personalization for Unseen Speakers
by: Zhang, Xiangyue, et al.
Published: (2026)

Democratizing High-Fidelity Co-Speech Gesture Video Generation
by: Yang, Xu, et al.
Published: (2025)

Prompt-to-Gesture: Measuring the Capabilities of Image-to-Video Deictic Gesture Generation
by: Ali, Hassan, et al.
Published: (2026)

Dual-task Mutual Reinforcing Embedded Joint Video Paragraph Retrieval and Grounding
by: Wang, Mengzhao, et al.
Published: (2024)

T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval
by: Li, Yili, et al.
Published: (2024)

ExGes: Expressive Human Motion Retrieval and Modulation for Audio-Driven Gesture Synthesis
by: Zhou, Xukun, et al.
Published: (2025)

EasyGenNet: An Efficient Framework for Audio-Driven Gesture Video Generation Based on Diffusion Model
by: Li, Renda, et al.
Published: (2025)

GeCo: Evaluating Geometric Consistency for Video Generation via Motion and Structure
by: Gu, Leslie, et al.
Published: (2025)

CoordSpeaker: Exploiting Gesture Captioning for Coordinated Caption-Empowered Co-Speech Gesture Generation
by: Fang, Fengyi, et al.
Published: (2025)

Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters
by: Hogue, Steven, et al.
Published: (2024)

RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism
by: Peruzzo, Elia, et al.
Published: (2025)

PersonaGest: Personalized Co-Speech Gesture Generation with Semantic-Guided Hierarchical Motion Representation
by: Zhao, Junchuan, et al.
Published: (2026)

Controllable and Expressive One-Shot Video Head Swapping
by: Ji, Chaonan, et al.
Published: (2025)

Video Motion Graphs
by: Liu, Haiyang, et al.
Published: (2025)

Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers
by: Sun, Yasheng, et al.
Published: (2025)

Conveying Meaning through Gestures: An Investigation into Semantic Co-Speech Gesture Generation
by: Voss, Hendric, et al.
Published: (2025)

Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding
by: Shen, Xiaoqian, et al.
Published: (2025)

InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation
by: Rajan, Sreehari, et al.
Published: (2025)

Wan-S2V: Audio-Driven Cinematic Video Generation
by: Gao, Xin, et al.
Published: (2025)