:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Songlin; Wang, Zhe; Yang, Xuyi; Zhang, Songchun; Kong, Xianghao; Wu, Taiyi; Zhao, Xiaotong; Zhang, Ran; Zhao, Alan; Rao, Anyi
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.11421

Similar Items

Composing Concepts from Images and Videos via Concept-prompt Binding
by: Kong, Xianghao, et al.
Published: (2025)

EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation
by: Yang, Songlin, et al.
Published: (2026)

Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models
by: Yang, Songlin, et al.
Published: (2026)

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models
by: Zhang, Songchun, et al.
Published: (2026)

SesaHand: Enhancing 3D Hand Reconstruction via Controllable Generation with Semantic and Structural Alignment
by: Zhao, Zhuoran, et al.
Published: (2026)

ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images
by: Kong, Xianghao, et al.
Published: (2025)

WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models
by: Gu, Bohai, et al.
Published: (2026)

Taming Flow-based I2V Models for Creative Video Editing
by: Kong, Xianghao, et al.
Published: (2025)

Place-it-R1: Unlocking Environment-aware Reasoning Potential of MLLM for Video Object Insertion
by: Gu, Bohai, et al.
Published: (2026)

Pragmatist: Multiview Conditional Diffusion Models for High-Fidelity 3D Reconstruction from Unposed Sparse Views
by: Zhang, Songchun, et al.
Published: (2024)

Taming Video Models for 3D and 4D Generation via Zero-Shot Camera Control
by: Song, Chenxi, et al.
Published: (2025)

MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation
by: Xing, Jinbo, et al.
Published: (2025)

Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing
by: Liu, Han, et al.
Published: (2024)

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
by: Liu, Hongbo, et al.
Published: (2025)

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
by: Chen, Houyuan, et al.
Published: (2026)

HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
by: Meng, Yihao, et al.
Published: (2025)

FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes
by: Liu, Jiaxuan, et al.
Published: (2026)

PAI-Studio: Cinematic Video Background Replacement with Camera-Aware Motion
by: Gao, Heyuan, et al.
Published: (2026)

MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
by: Bak, Taejun, et al.
Published: (2024)

MONA: Moving Object Detection from Videos Shot by Dynamic Camera
by: Hu, Boxun, et al.
Published: (2025)

CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition
by: Phung, Quynh, et al.
Published: (2025)

A timespace of zero‐COVID in Southwest China: Building community, governing time
by: Zhao, Xuyi
Published: (2024)

Wan-S2V: Audio-Driven Cinematic Video Generation
by: Gao, Xin, et al.
Published: (2025)

SKALD: Learning-Based Shot Assembly for Coherent Multi-Shot Video Creation
by: Lu, Chen Yi, et al.
Published: (2025)

ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search
by: Yu, Tao, et al.
Published: (2026)

MuSS: A Large-Scale Dataset and Cinematic Narrative Benchmark for Multi-Shot Subject-to-Video Generation
by: Zhang, Haojie, et al.
Published: (2026)

Dense Semantic Matching with VGGT Prior
by: Yang, Songlin, et al.
Published: (2025)

Generative AI for Film Creation: A Survey of Recent Advances
by: Zhang, Ruihan, et al.
Published: (2025)

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner
by: Zhou, Yufan, et al.
Published: (2024)

CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition
by: Yang, Hongji, et al.
Published: (2026)

Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
by: Zhang, Yiming, et al.
Published: (2024)

Pre-Training and Prompting for Few-Shot Node Classification on Text-Attributed Graphs
by: Zhao, Huanjing, et al.
Published: (2024)

ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models
by: Kara, Ozgur, et al.
Published: (2025)

DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control
by: Wei, Yujie, et al.
Published: (2024)

Are Image-to-Video Models Good Zero-Shot Image Editors?
by: Zhang, Zechuan, et al.
Published: (2025)

Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
by: Ren, Yixuan, et al.
Published: (2024)

EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
by: Ju, Xuan, et al.
Published: (2025)

CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion
by: Chen, Yiran, et al.
Published: (2024)

VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing
by: Couairon, Paul, et al.
Published: (2023)

Shot-Aware Frame Sampling for Video Understanding
by: Zhao, Mengyu, et al.
Published: (2026)