:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Yutong; Zhang, Haiyu; Xue, Tianfan; Qiao, Yu; Wang, Yaohui; Xu, Chang; Chen, Xinyuan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.06802

Similar Items

PARE: Pruning and Adaptive Routing for Efficient Video Generation
by: Wang, Yutong, et al.
Published: (2026)

AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset
by: Zhang, Haiyu, et al.
Published: (2025)

4Diffusion: Multi-view Video Diffusion Model for 4D Generation
by: Zhang, Haiyu, et al.
Published: (2024)

GenHOI: Generalizing Text-driven 4D Human-Object Interaction Synthesis for Unseen Objects
by: Li, Shujia, et al.
Published: (2025)

ShotDirector: Directorially Controllable Multi-Shot Video Generation with Cinematographic Transitions
by: Wu, Xiaoxue, et al.
Published: (2025)

CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models
by: Wu, Xiaoxue, et al.
Published: (2025)

ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation
by: Peng, Bo, et al.
Published: (2023)

LEO: Generative Latent Image Animator for Human Video Synthesis
by: Wang, Yaohui, et al.
Published: (2023)

BEAT: Rhythm-Elastic Alignment for Agentic Music-guided Movie Trailer Generation
by: Wang, Yutong, et al.
Published: (2026)

Latte: Latent Diffusion Transformer for Video Generation
by: Ma, Xin, et al.
Published: (2024)

LIA-X: Interpretable Latent Portrait Animator
by: Wang, Yaohui, et al.
Published: (2025)

WSVD: Weighted Low-Rank Approximation for Fast and Efficient Execution of Low-Precision Vision-Language Models
by: Wang, Haiyu, et al.
Published: (2026)

The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
by: Gao, Bingjie, et al.
Published: (2025)

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
by: Wen, Hao, et al.
Published: (2024)

HDRFlow: Real-Time HDR Video Reconstruction with Large Motions
by: Xu, Gangwei, et al.
Published: (2024)

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
by: Ma, Xin, et al.
Published: (2024)

RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling
by: Gao, Bingjie, et al.
Published: (2025)

PosterOmni: Generalized Artistic Poster Creation via Task Distillation and Unified Reward Feedback
by: Chen, Sixiang, et al.
Published: (2026)

Training-free Stylized Text-to-Image Generation with Fast Inference
by: Ma, Xin, et al.
Published: (2025)

LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning
by: Gao, Chenjian, et al.
Published: (2025)

Vlogger: Make Your Dream A Vlog
by: Zhuang, Shaobin, et al.
Published: (2024)

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
by: Huang, Ziqi, et al.
Published: (2024)

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
by: Wang, Yi, et al.
Published: (2023)

Memory-Efficient Transfer Learning with Fading Side Networks via Masked Dual Path Distillation
by: Zhang, Yutong, et al.
Published: (2026)

EGVD: Event-Guided Video Diffusion Model for Physically Realistic Large-Motion Frame Interpolation
by: Zhang, Ziran, et al.
Published: (2025)

TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision
by: Zhuang, Shaobin, et al.
Published: (2025)

AdaptiveISP: Learning an Adaptive Image Signal Processor for Object Detection
by: Wang, Yujin, et al.
Published: (2024)

VideoDistill: Language-aware Vision Distillation for Video Question Answering
by: Zou, Bo, et al.
Published: (2024)

Uni-ISP: Toward Unifying the Learning of ISPs from Multiple Mobile Cameras
by: Li, Lingen, et al.
Published: (2024)

DA-VAE: Plug-in Latent Compression for Diffusion via Detail Alignment
by: Cai, Xin, et al.
Published: (2026)

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
by: Huang, Zehuan, et al.
Published: (2023)

From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos
by: Gao, Chenjian, et al.
Published: (2025)

Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining
by: Dong, Lu, et al.
Published: (2025)

DiffIER: Optimizing Diffusion Models with Iterative Error Reduction
by: Chen, Ao, et al.
Published: (2025)

Follow-Your-Creation: Empowering 4D Creation through Video Inpainting
by: Ma, Yue, et al.
Published: (2025)

Consistent and Controllable Image Animation with Motion Linear Diffusion Transformers
by: Ma, Xin, et al.
Published: (2025)

Cascading Refinement Video Denoising with Uncertainty Adaptivity
by: Yu, Xinyuan
Published: (2024)

Uni-Animator: Towards Unified Visual Colorization
by: Chen, Xinyuan, et al.
Published: (2026)

Bilateral Guided Radiance Field Processing
by: Wang, Yuehao, et al.
Published: (2024)

Harvest Video Foundation Models via Efficient Post-Pretraining
by: Li, Yizhuo, et al.
Published: (2023)