| Main Authors: | Zheng, Longtao, Zhang, Yifan, Guo, Hanzhong, Pan, Jiachun, Tan, Zhenxiong, Lu, Jiahao, Tang, Chuanxin, An, Bo, Yan, Shuicheng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.04448 |
Similar Items
Real-time One-Step Diffusion-based Expressive Portrait Videos Generation
by: Guo, Hanzhong, et al.
Published: (2024)
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
by: Wang, Haotian, et al.
Published: (2024)
Video-Infinity: Distributed Long Video Generation
by: Tan, Zhenxiong, et al.
Published: (2024)
TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles
by: Ma, Yifeng, et al.
Published: (2023)
ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer
by: Yu, Ruonan, et al.
Published: (2026)
Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation
by: Dai, Gang, et al.
Published: (2025)
Poison-splat: Computation Cost Attack on 3D Gaussian Splatting
by: Lu, Jiahao, et al.
Published: (2024)
Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters
by: Hogue, Steven, et al.
Published: (2024)
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
by: Liu, Songhua, et al.
Published: (2024)
Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation
by: Chopin, Baptiste, et al.
Published: (2025)
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice
by: Yi, Hongwei, et al.
Published: (2025)
Memories are One-to-Many Mapping Alleviators in Talking Face Generation
by: Tang, Anni, et al.
Published: (2022)
The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing
by: Nie, Shen, et al.
Published: (2023)
Minute-Long Videos with Dual Parallelisms
by: Wang, Zeqing, et al.
Published: (2025)
Multimodal Diffusion Transformer with Memory Bank for Scalable Long-Duration Talking Video Generation
by: Zhang, Haojie, et al.
Published: (2024)
Decomposing Subject-Driven Image Generation via Intermediate Structural Prediction
by: Guo, Hanzhong, et al.
Published: (2026)
FreeSwim: Revisiting Sliding-Window Attention Mechanisms for Training-Free Ultra-High-Resolution Video Generation
by: Wu, Yunfeng, et al.
Published: (2025)
AgentStudio: A Toolkit for Building General Virtual Agents
by: Zheng, Longtao, et al.
Published: (2024)
Enhancing Long Video Generation Consistency without Tuning
by: Li, Xingyao, et al.
Published: (2024)
Versatile Multimodal Controls for Expressive Talking Human Animation
by: Qin, Zheng, et al.
Published: (2025)
AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models
by: Pan, Jiachun, et al.
Published: (2023)
Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms
by: He, Muyang, et al.
Published: (2026)
MDTv2: Masked Diffusion Transformer is a Strong Image Synthesizer
by: Gao, Shanghua, et al.
Published: (2023)
Image Editing As Programs with Diffusion Models
by: Hu, Yujia, et al.
Published: (2025)
Video-based Generalized Category Discovery via Memory-Guided Consistency-Aware Contrastive Learning
by: Jing, Zhang, et al.
Published: (2025)
AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation
by: Sun, Yasheng, et al.
Published: (2024)
UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation
by: Guo, Qin, et al.
Published: (2025)
SpotEdit: Selective Region Editing in Diffusion Transformers
by: Qin, Zhibin, et al.
Published: (2025)
OminiControl2: Efficient Conditioning for Diffusion Transformers
by: Tan, Zhenxiong, et al.
Published: (2025)
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
by: Chen, Zigeng, et al.
Published: (2024)
TokTalk: Expressive Real-time Facial Animation from Audio-LLM Tokens
by: Zhao, Qingcheng, et al.
Published: (2026)
FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases
by: Tan, Shuai, et al.
Published: (2025)
Text-based Talking Video Editing with Cascaded Conditional Diffusion
by: Han, Bo, et al.
Published: (2024)
EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
by: Tian, Linrui, et al.
Published: (2024)
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
by: Jin, Peng, et al.
Published: (2024)
OminiControl: Minimal and Universal Control for Diffusion Transformer
by: Tan, Zhenxiong, et al.
Published: (2024)
Context-aware Talking Face Video Generation
by: Xuanyuan, Meidai, et al.
Published: (2024)
ConsistTalk: Intensity Controllable Temporally Consistent Talking Head Generation with Diffusion Noise Search
by: Liu, Zhenjie, et al.
Published: (2025)
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
by: Shi, Yujun, et al.
Published: (2023)
Generative Latent Video Compression
by: Guo, Zongyu, et al.
Published: (2025)