Bibliographic Details
Main Authors: Choi, Jun Myeong; Yoon, Jae Shin; Qi, Luchao; Sengupta, Roni; Lee, Joon-Young
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2605.28811
_version_ 1866916056398823424
author Choi, Jun Myeong
Yoon, Jae Shin
Qi, Luchao
Sengupta, Roni
Lee, Joon-Young
contents We present a method for harmonizing the lighting of a foreground video to match a target background scene by adjusting shadows, color tone, and illumination intensity (relightful harmonization). Unlike for images, acquiring labeled data for videos, in which identical motions are recorded under different lighting conditions, is practically infeasible and does not scale. One way to create such paired data is to apply existing image-based harmonization models to a video frame by frame, but the resulting outputs often suffer from significant temporal jitter. We overcome this problem by introducing a novel lighting deflickering model that stabilizes global and local lighting flicker artifacts. Our video diffusion model learns from these upgraded deflickered data with a volume of real and synthetic videos to generate high-quality video harmonization results. We further propose an asymmetric alpha mask conditioning technique to learn clean boundaries from real videos. Experiments demonstrate that our model achieves strong temporal coherence, naturalness, cleaner boundaries, and physically meaningful lighting behavior, while maintaining strong relighting expressiveness compared to prior image-based and video-based harmonization methods.
format Preprint
id arxiv_https___arxiv_org_abs_2605_28811
institution arXiv
publishDate 2026
record_format arxiv
title HarmoVid: Relightful Video Portrait Harmonization
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2605.28811