Library Catalog

Bibliographic Details
Main Authors:	Zheng, Longtao, Zhang, Yifan, Guo, Hanzhong, Pan, Jiachun, Tan, Zhenxiong, Lu, Jiahao, Tang, Chuanxin, An, Bo, Yan, Shuicheng
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.04448

Similar Items

Real-time One-Step Diffusion-based Expressive Portrait Videos Generation
by: Guo, Hanzhong, et al.
Published: (2024)

EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
by: Wang, Haotian, et al.
Published: (2024)

Video-Infinity: Distributed Long Video Generation
by: Tan, Zhenxiong, et al.
Published: (2024)

TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles
by: Ma, Yifeng, et al.
Published: (2023)

ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer
by: Yu, Ruonan, et al.
Published: (2026)

Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation
by: Dai, Gang, et al.
Published: (2025)

Poison-splat: Computation Cost Attack on 3D Gaussian Splatting
by: Lu, Jiahao, et al.
Published: (2024)

Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters
by: Hogue, Steven, et al.
Published: (2024)

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
by: Liu, Songhua, et al.
Published: (2024)

Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation
by: Chopin, Baptiste, et al.
Published: (2025)

MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice
by: Yi, Hongwei, et al.
Published: (2025)

Memories are One-to-Many Mapping Alleviators in Talking Face Generation
by: Tang, Anni, et al.
Published: (2022)

The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing
by: Nie, Shen, et al.
Published: (2023)

Minute-Long Videos with Dual Parallelisms
by: Wang, Zeqing, et al.
Published: (2025)

Multimodal Diffusion Transformer with Memory Bank for Scalable Long-Duration Talking Video Generation
by: Zhang, Haojie, et al.
Published: (2024)

Decomposing Subject-Driven Image Generation via Intermediate Structural Prediction
by: Guo, Hanzhong, et al.
Published: (2026)

FreeSwim: Revisiting Sliding-Window Attention Mechanisms for Training-Free Ultra-High-Resolution Video Generation
by: Wu, Yunfeng, et al.
Published: (2025)

AgentStudio: A Toolkit for Building General Virtual Agents
by: Zheng, Longtao, et al.
Published: (2024)

Enhancing Long Video Generation Consistency without Tuning
by: Li, Xingyao, et al.
Published: (2024)

Versatile Multimodal Controls for Expressive Talking Human Animation
by: Qin, Zheng, et al.
Published: (2025)

AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models
by: Pan, Jiachun, et al.
Published: (2023)

Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms
by: He, Muyang, et al.
Published: (2026)

MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
by: Gao, Shanghua, et al.
Published: (2023)

Image Editing As Programs with Diffusion Models
by: Hu, Yujia, et al.
Published: (2025)

Video-based Generalized Category Discovery via Memory-Guided Consistency-Aware Contrastive Learning
by: Jing, Zhang, et al.
Published: (2025)

AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation
by: Sun, Yasheng, et al.
Published: (2024)

UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation
by: Guo, Qin, et al.
Published: (2025)

SpotEdit: Selective Region Editing in Diffusion Transformers
by: Qin, Zhibin, et al.
Published: (2025)

OminiControl2: Efficient Conditioning for Diffusion Transformers
by: Tan, Zhenxiong, et al.
Published: (2025)

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
by: Chen, Zigeng, et al.
Published: (2024)

TokTalk: Expressive Real-time Facial Animation from Audio-LLM Tokens
by: Zhao, Qingcheng, et al.
Published: (2026)

FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases
by: Tan, Shuai, et al.
Published: (2025)

Text-based Talking Video Editing with Cascaded Conditional Diffusion
by: Han, Bo, et al.
Published: (2024)

EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
by: Tian, Linrui, et al.
Published: (2024)

Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
by: Jin, Peng, et al.
Published: (2024)

OminiControl: Minimal and Universal Control for Diffusion Transformer
by: Tan, Zhenxiong, et al.
Published: (2024)

Context-aware Talking Face Video Generation
by: Xuanyuan, Meidai, et al.
Published: (2024)

ConsistTalk: Intensity Controllable Temporally Consistent Talking Head Generation with Diffusion Noise Search
by: Liu, Zhenjie, et al.
Published: (2025)

DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
by: Shi, Yujun, et al.
Published: (2023)

Generative Latent Video Compression
by: Guo, Zongyu, et al.
Published: (2025)