| Main Authors: | Yang, Songlin, Wang, Zhe, Yang, Xuyi, Zhang, Songchun, Kong, Xianghao, Wu, Taiyi, Zhao, Xiaotong, Zhang, Ran, Zhao, Alan, Rao, Anyi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.11421 |
Similar Items
Composing Concepts from Images and Videos via Concept-prompt Binding
by: Kong, Xianghao, et al.
Published: (2025)
EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation
by: Yang, Songlin, et al.
Published: (2026)
Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models
by: Yang, Songlin, et al.
Published: (2026)
Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models
by: Zhang, Songchun, et al.
Published: (2026)
SesaHand: Enhancing 3D Hand Reconstruction via Controllable Generation with Semantic and Structural Alignment
by: Zhao, Zhuoran, et al.
Published: (2026)
ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images
by: Kong, Xianghao, et al.
Published: (2025)
WorldCraft: From Camera Navigation to Object Manipulation in Interactive Video World Models
by: Gu, Bohai, et al.
Published: (2026)
Taming Flow-based I2V Models for Creative Video Editing
by: Kong, Xianghao, et al.
Published: (2025)
Place-it-R1: Unlocking Environment-aware Reasoning Potential of MLLM for Video Object Insertion
by: Gu, Bohai, et al.
Published: (2026)
Pragmatist: Multiview Conditional Diffusion Models for High-Fidelity 3D Reconstruction from Unposed Sparse Views
by: Zhang, Songchun, et al.
Published: (2024)
Taming Video Models for 3D and 4D Generation via Zero-Shot Camera Control
by: Song, Chenxi, et al.
Published: (2025)
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation
by: Xing, Jinbo, et al.
Published: (2025)
Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing
by: Liu, Han, et al.
Published: (2024)
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
by: Liu, Hongbo, et al.
Published: (2025)
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
by: Chen, Houyuan, et al.
Published: (2026)
HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
by: Meng, Yihao, et al.
Published: (2025)
FunCineForge: A Unified Dataset Toolkit and Model for Zero-Shot Movie Dubbing in Diverse Cinematic Scenes
by: Liu, Jiaxuan, et al.
Published: (2026)
PAI-Studio: Cinematic Video Background Replacement with Camera-Aware Motion
by: Gao, Heyuan, et al.
Published: (2026)
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
by: Bak, Taejun, et al.
Published: (2024)
MONA: Moving Object Detection from Videos Shot by Dynamic Camera
by: Hu, Boxun, et al.
Published: (2025)
CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition
by: Phung, Quynh, et al.
Published: (2025)
A timespace of zero‐COVID in Southwest China: Building community, governing time
by: Xuyi Zhao
Published: (2024)
Wan-S2V: Audio-Driven Cinematic Video Generation
by: Gao, Xin, et al.
Published: (2025)
SKALD: Learning-Based Shot Assembly for Coherent Multi-Shot Video Creation
by: Lu, Chen Yi, et al.
Published: (2025)
ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search
by: Yu, Tao, et al.
Published: (2026)
MuSS: A Large-Scale Dataset and Cinematic Narrative Benchmark for Multi-Shot Subject-to-Video Generation
by: Zhang, Haojie, et al.
Published: (2026)
Dense Semantic Matching with VGGT Prior
by: Yang, Songlin, et al.
Published: (2025)
Generative AI for Film Creation: A Survey of Recent Advances
by: Zhang, Ruihan, et al.
Published: (2025)
SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner
by: Zhou, Yufan, et al.
Published: (2024)
CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition
by: Yang, Hongji, et al.
Published: (2026)
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
by: Zhang, Yiming, et al.
Published: (2024)
Pre-Training and Prompting for Few-Shot Node Classification on Text-Attributed Graphs
by: Zhao, Huanjing, et al.
Published: (2024)
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models
by: Kara, Ozgur, et al.
Published: (2025)
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control
by: Wei, Yujie, et al.
Published: (2024)
Are Image-to-Video Models Good Zero-Shot Image Editors?
by: Zhang, Zechuan, et al.
Published: (2025)
Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
by: Ren, Yixuan, et al.
Published: (2024)
EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
by: Ju, Xuan, et al.
Published: (2025)
CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion
by: Chen, Yiran, et al.
Published: (2024)
VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing
by: Couairon, Paul, et al.
Published: (2023)
Shot-Aware Frame Sampling for Video Understanding
by: Zhao, Mengyu, et al.
Published: (2026)