Bibliographic Details
Main Authors: Shi, Zhihao, Yin, Kejia, Wan, Weilin, Zhou, Yuhongze, Yu, Yuanhao, Zuo, Xinxin, Sun, Qiang, Lu, Juwei
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2603.02256
_version_ 1866914364411346944
author Shi, Zhihao
Yin, Kejia
Wan, Weilin
Zhou, Yuhongze
Yu, Yuanhao
Zuo, Xinxin
Sun, Qiang
Lu, Juwei
contents Video (camera) trajectory editing (VTE) aims to synthesize new videos that follow user-defined camera paths while preserving scene content and plausibly inpainting previously unseen regions, upgrading amateur footage into professionally styled videos. Existing VTE methods struggle with precise camera control and long-range consistency because they either inject target poses through a limited-capacity embedding or rely on single-frame warping with only implicit cross-frame aggregation in video diffusion models. To address these issues, we introduce a new VTE framework with two components. 1) It explicitly aggregates information across the entire source video via a hybrid warping scheme: static regions are progressively fused into a world cache and then rendered to the target camera poses, while dynamic regions are directly warped; fusing the two yields globally consistent coarse frames that guide refinement. 2) It processes video segments jointly with their history via a history-guided autoregressive diffusion model, while the world cache is incrementally updated to reinforce already inpainted content, enabling long-term temporal coherence. Finally, we present iPhone-PTZ, a new VTE benchmark with diverse camera motions and large trajectory variations, and achieve state-of-the-art performance with fewer parameters.
format Preprint
id arxiv_https___arxiv_org_abs_2603_02256
institution arXiv
publishDate 2026
record_format arxiv
title CamDirector: Towards Long-Term Coherent Video Trajectory Editing
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2603.02256