Saved in:
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.02256 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866914364411346944 |
|---|---|
| author | Shi, Zhihao Yin, Kejia Wan, Weilin Zhou, Yuhongze Yu, Yuanhao Zuo, Xinxin Sun, Qiang Lu, Juwei |
| author_facet | Shi, Zhihao Yin, Kejia Wan, Weilin Zhou, Yuhongze Yu, Yuanhao Zuo, Xinxin Sun, Qiang Lu, Juwei |
| contents | Video (camera) trajectory editing aims to synthesize new videos that follow user-defined camera paths while preserving scene content and plausibly inpainting previously unseen regions, upgrading amateur footage into professionally styled videos. Existing VTE methods struggle with precise camera control and long-range consistency because they either inject target poses through a limited-capacity embedding or rely on single-frame warping with only implicit cross-frame aggregation in video diffusion models. To address these issues, we introduce a new VTE framework that 1) explicitly aggregates information across the entire source video via a hybrid warping scheme. Specifically, static regions are progressively fused into a world cache then rendered to target camera poses, while dynamic regions are directly warped; their fusion yields globally consistent coarse frames that guide refinement. 2) processes video segments jointly with their history via a history-guided autoregressive diffusion model, while the world cache is incrementally updated to reinforce already inpainted content, enabling long-term temporal coherence. Finally, we present iPhone-PTZ, a new VTE benchmark with diverse camera motions and large trajectory variations, and achieve state-of-the-art performance with fewer parameters. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2603_02256 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| spellingShingle | CamDirector: Towards Long-Term Coherent Video Trajectory Editing Shi, Zhihao Yin, Kejia Wan, Weilin Zhou, Yuhongze Yu, Yuanhao Zuo, Xinxin Sun, Qiang Lu, Juwei Computer Vision and Pattern Recognition Video (camera) trajectory editing aims to synthesize new videos that follow user-defined camera paths while preserving scene content and plausibly inpainting previously unseen regions, upgrading amateur footage into professionally styled videos. Existing VTE methods struggle with precise camera control and long-range consistency because they either inject target poses through a limited-capacity embedding or rely on single-frame warping with only implicit cross-frame aggregation in video diffusion models. To address these issues, we introduce a new VTE framework that 1) explicitly aggregates information across the entire source video via a hybrid warping scheme. Specifically, static regions are progressively fused into a world cache then rendered to target camera poses, while dynamic regions are directly warped; their fusion yields globally consistent coarse frames that guide refinement. 2) processes video segments jointly with their history via a history-guided autoregressive diffusion model, while the world cache is incrementally updated to reinforce already inpainted content, enabling long-term temporal coherence. Finally, we present iPhone-PTZ, a new VTE benchmark with diverse camera motions and large trajectory variations, and achieve state-of-the-art performance with fewer parameters. |
| title | CamDirector: Towards Long-Term Coherent Video Trajectory Editing |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2603.02256 |