Bibliographic Details
Main Authors: Li, Mengtian, Lu, Yuwei, Li, Feifei, Gan, Chenqi, Xie, Zhifeng, Wang, Xi
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition, Artificial Intelligence
Online Access: https://arxiv.org/abs/2604.02467
_version_ 1866908996721442816
author Li, Mengtian
Lu, Yuwei
Li, Feifei
Gan, Chenqi
Xie, Zhifeng
Wang, Xi
author_facet Li, Mengtian
Lu, Yuwei
Li, Feifei
Gan, Chenqi
Xie, Zhifeng
Wang, Xi
contents Cinematic camera control relies on a tight feedback loop between director and cinematographer, where camera motion and framing are continuously reviewed and refined. Recent generative camera systems can produce diverse, text-conditioned trajectories, but they lack this "director in the loop" and have no explicit supervision of whether a shot is visually desirable. This results in in-distribution camera motion but poor framing, off-screen characters, and undesirable visual aesthetics. In this paper, we introduce VERTIGO, the first framework for visual preference optimization of camera trajectory generators. Our framework leverages a real-time graphics engine (Unity) to render 2D visual previews from generated camera motion. A cinematically fine-tuned vision-language model then scores these previews using our proposed cyclic semantic similarity mechanism, which aligns renders with text prompts. This process provides the visual preference signals for Direct Preference Optimization (DPO) post-training. Both quantitative evaluations and user studies on Unity renders and diffusion-based Camera-to-Video pipelines show consistent gains in condition adherence, framing quality, and perceptual realism. Notably, VERTIGO reduces the character off-screen rate from 38% to nearly 0% while preserving the geometric fidelity of camera motion. User study participants further prefer VERTIGO over baselines across composition, consistency, prompt adherence, and aesthetic quality, confirming the perceptual benefits of our visual preference post-training.
format Preprint
id arxiv_https___arxiv_org_abs_2604_02467
institution arXiv
publishDate 2026
record_format arxiv
title VERTIGO: Visual Preference Optimization for Cinematic Camera Trajectory Generation
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2604.02467