Staff View: Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos :: Library Catalog

Bibliographic Details
Main Authors:	Xu, Zhengze; Chen, Mengting; Wang, Zhao; Xing, Linyu; Zhai, Zhonghua; Sang, Nong; Lan, Jinsong; Xiao, Shuai; Gao, Changxin
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.17571

_version_	1866914772099792896
author	Xu, Zhengze; Chen, Mengting; Wang, Zhao; Xing, Linyu; Zhai, Zhonghua; Sang, Nong; Lan, Jinsong; Xiao, Shuai; Gao, Changxin
author_facet	Xu, Zhengze; Chen, Mengting; Wang, Zhao; Xing, Linyu; Zhai, Zhonghua; Sang, Nong; Lan, Jinsong; Xiao, Shuai; Gao, Changxin
contents	Video try-on is a challenging task and has not been well tackled in previous works. The main obstacle lies in preserving the details of the clothing and modeling the coherent motions simultaneously. Faced with those difficulties, we address video try-on by proposing a diffusion-based framework named "Tunnel Try-on." The core idea is excavating a "focus tunnel" in the input video that gives close-up shots around the clothing regions. We zoom in on the region in the tunnel to better preserve the fine details of the clothing. To generate coherent motions, we first leverage the Kalman filter to construct smooth crops in the focus tunnel and inject the position embedding of the tunnel into attention layers to improve the continuity of the generated videos. In addition, we develop an environment encoder to extract the context information outside the tunnels as supplementary cues. Equipped with these techniques, Tunnel Try-on keeps the fine details of the clothing and synthesizes stable and smooth videos. Demonstrating significant advancements, Tunnel Try-on could be regarded as the first attempt toward the commercial-level application of virtual try-on in videos.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_17571
institution	arXiv
publishDate	2024
record_format	arxiv
title	Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2404.17571

Similar Items