| Main Authors: | Xiao, Xinyu, Yang, Binbin, Li, Tingtian, Yu, Yipeng, Lei, Sen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.13739 |
Similar Items
InstructVid2Vid: Controllable Video Editing with Natural Language Instructions
by: Qin, Bosheng, et al.
Published: (2023)
VidCtx: Context-aware Video Question Answering with Image Models
by: Goulas, Andreas, et al.
Published: (2024)
CounterVid: Counterfactual Video Generation for Mitigating Action and Temporal Hallucinations in Video-Language Models
by: Poppi, Tobia, et al.
Published: (2026)
UniForm: A Unified Multi-Task Diffusion Transformer for Audio-Video Generation
by: Zhao, Lei, et al.
Published: (2025)
VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation
by: Zheng, Sixiao, et al.
Published: (2025)
Lumos-1: On Autoregressive Video Generation with Discrete Diffusion from a Unified Model Perspective
by: Yuan, Hangjie, et al.
Published: (2025)
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
by: Zhang, Yuang, et al.
Published: (2024)
UniVid: The Open-Source Unified Video Model
by: Luo, Jiabin, et al.
Published: (2025)
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
by: Yan, Xin, et al.
Published: (2024)
Diffusion Models for Joint Audio-Video Generation
by: La Torre, Alejandro Paredes
Published: (2026)
SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation
by: Chen, Tong, et al.
Published: (2024)
Bernini: Latent Semantic Planning for Video Diffusion
by: Bernini Team, et al.
Published: (2026)
HOIN: High-Order Implicit Neural Representations
by: Chen, Yang, et al.
Published: (2024)
Moiré Video Authentication: A Physical Signature Against AI Video Generation
by: Qing, Yuan, et al.
Published: (2026)
EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis
by: Fang, Jianwu, et al.
Published: (2025)
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
by: Yu, Lijun, et al.
Published: (2023)
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
by: Wu, Hao, et al.
Published: (2024)
Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation
by: Cao, Pu, et al.
Published: (2023)
Terrain Diffusion Network: Climatic-Aware Terrain Generation with Geological Sketch Guidance
by: Hu, Zexin, et al.
Published: (2023)
ReCorD: Reasoning and Correcting Diffusion for HOI Generation
by: Jiang-Lin, Jian-Yu, et al.
Published: (2024)
UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models
by: Chen, Lan, et al.
Published: (2025)
IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
by: Lin, Yuanze, et al.
Published: (2025)
Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning
by: Chen, Weifeng, et al.
Published: (2023)
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
by: Cai, Minghong, et al.
Published: (2024)
Question-Answering Dense Video Events
by: Qin, Hangyu, et al.
Published: (2024)
UniF$^2$ace: A Unified Fine-grained Face Understanding and Generation Model
by: Li, Junzhe, et al.
Published: (2025)
StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation
by: Wu, Yi, et al.
Published: (2025)
Spotlighting Partially Visible Cinematic Language for Video-to-Audio Generation via Self-distillation
by: Huang, Feizhen, et al.
Published: (2025)
AVBench: Human-Aligned and Automated Evaluation Benchmark for Audio-Video Generative Models
by: Yang, Jialiang, et al.
Published: (2026)
MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering
by: Xiao, Junbin, et al.
Published: (2026)
OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation
by: Gan, Qijun, et al.
Published: (2025)
High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model
by: Zhong, Weizhi, et al.
Published: (2024)
Test-Time Self-Adaptive Conditioning for Stable Audio-Driven Talking-Head Generation
by: Zhang, Zhicheng, et al.
Published: (2026)
PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing
by: Huang, Wenjing, et al.
Published: (2023)
How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment
by: Chen, Zhen, et al.
Published: (2025)
Interactive Video Generation via Domain Adaptation
by: Rawal, Ishaan, et al.
Published: (2025)
A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming
by: Zhou, Pengyuan, et al.
Published: (2024)
Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval
by: Jiang, Chen, et al.
Published: (2023)
Towards Multi-Task Multi-Modal Models: A Video Generative Perspective
by: Yu, Lijun
Published: (2024)
LPM 1.0: Video-based Character Performance Model
by: Zeng, Ailing, et al.
Published: (2026)