:: Library Catalog

Bibliographic Details
Main Authors:	Xiao, Xinyu; Yang, Binbin; Li, Tingtian; Yu, Yipeng; Lei, Sen
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Multimedia
Online Access:	https://arxiv.org/abs/2603.13739

Similar Items

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions
by: Qin, Bosheng, et al.
Published: (2023)

VidCtx: Context-aware Video Question Answering with Image Models
by: Goulas, Andreas, et al.
Published: (2024)

CounterVid: Counterfactual Video Generation for Mitigating Action and Temporal Hallucinations in Video-Language Models
by: Poppi, Tobia, et al.
Published: (2026)

UniForm: A Unified Multi-Task Diffusion Transformer for Audio-Video Generation
by: Zhao, Lei, et al.
Published: (2025)

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation
by: Zheng, Sixiao, et al.
Published: (2025)

Lumos-1: On Autoregressive Video Generation with Discrete Diffusion from a Unified Model Perspective
by: Yuan, Hangjie, et al.
Published: (2025)

MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
by: Zhang, Yuang, et al.
Published: (2024)

UniVid: The Open-Source Unified Video Model
by: Luo, Jiabin, et al.
Published: (2025)

Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
by: Yan, Xin, et al.
Published: (2024)

Diffusion Models for Joint Audio-Video Generation
by: La Torre, Alejandro Paredes
Published: (2026)

SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation
by: Chen, Tong, et al.
Published: (2024)

Bernini: Latent Semantic Planning for Video Diffusion
by: Bernini Team, et al.
Published: (2026)

HOIN: High-Order Implicit Neural Representations
by: Chen, Yang, et al.
Published: (2024)

Moiré Video Authentication: A Physical Signature Against AI Video Generation
by: Qing, Yuan, et al.
Published: (2026)

EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis
by: Fang, Jianwu, et al.
Published: (2025)

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
by: Yu, Lijun, et al.
Published: (2023)

DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
by: Wu, Hao, et al.
Published: (2024)

Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation
by: Cao, Pu, et al.
Published: (2023)

Terrain Diffusion Network: Climatic-Aware Terrain Generation with Geological Sketch Guidance
by: Hu, Zexin, et al.
Published: (2023)

ReCorD: Reasoning and Correcting Diffusion for HOI Generation
by: Jiang-Lin, Jian-Yu, et al.
Published: (2024)

UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models
by: Chen, Lan, et al.
Published: (2025)

IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
by: Lin, Yuanze, et al.
Published: (2025)

Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning
by: Chen, Weifeng, et al.
Published: (2023)

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
by: Cai, Minghong, et al.
Published: (2024)

Question-Answering Dense Video Events
by: Qin, Hangyu, et al.
Published: (2024)

UniF$^2$ace: A Unified Fine-grained Face Understanding and Generation Model
by: Li, Junzhe, et al.
Published: (2025)

StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation
by: Wu, Yi, et al.
Published: (2025)

Spotlighting Partially Visible Cinematic Language for Video-to-Audio Generation via Self-distillation
by: Huang, Feizhen, et al.
Published: (2025)

AVBench: Human-Aligned and Automated Evaluation Benchmark for Audio-Video Generative Models
by: Yang, Jialiang, et al.
Published: (2026)

MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering
by: Xiao, Junbin, et al.
Published: (2026)

OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation
by: Gan, Qijun, et al.
Published: (2025)

High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model
by: Zhong, Weizhi, et al.
Published: (2024)

Test-Time Self-Adaptive Conditioning for Stable Audio-Driven Talking-Head Generation
by: Zhang, Zhicheng, et al.
Published: (2026)

PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing
by: Huang, Wenjing, et al.
Published: (2023)

How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment
by: Chen, Zhen, et al.
Published: (2025)

Interactive Video Generation via Domain Adaptation
by: Rawal, Ishaan, et al.
Published: (2025)

A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming
by: Zhou, Pengyuan, et al.
Published: (2024)

Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval
by: Jiang, Chen, et al.
Published: (2023)

Towards Multi-Task Multi-Modal Models: A Video Generative Perspective
by: Yu, Lijun
Published: (2024)

LPM 1.0: Video-based Character Performance Model
by: Zeng, Ailing, et al.
Published: (2026)