Bibliographic Details
Main Authors: Li, Siqi; Jiang, Zhengkai; Zhou, Jiawei; Liu, Zhihong; Chi, Xiaowei; Wang, Haoqian
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2501.08682
_version_ 1866912269071286272
author Li, Siqi
Jiang, Zhengkai
Zhou, Jiawei
Liu, Zhihong
Chi, Xiaowei
Wang, Haoqian
author_facet Li, Siqi
Jiang, Zhengkai
Zhou, Jiawei
Liu, Zhihong
Chi, Xiaowei
Wang, Haoqian
contents Virtual try-on has emerged as a pivotal task at the intersection of computer vision and fashion, aimed at digitally simulating how clothing items fit on the human body. Despite notable progress in single-image virtual try-on (VTO), current methods often struggle to preserve a consistent and authentic appearance of clothing across extended video sequences. This challenge arises from the complexities of capturing dynamic human pose and maintaining target clothing characteristics. We leverage pre-existing video foundation models to introduce RealVVT, a photoRealistic Video Virtual Try-on framework tailored to bolster stability and realism within dynamic video contexts. Our methodology encompasses a Clothing & Temporal Consistency strategy, an Agnostic-guided Attention Focus Loss mechanism to ensure spatial consistency, and a Pose-guided Long Video VTO technique adept at handling extended video sequences. Extensive experiments across various datasets confirm that our approach outperforms existing state-of-the-art models in both single-image and video VTO tasks, offering a viable solution for practical applications in fashion e-commerce and virtual fitting environments.
format Preprint
id arxiv_https___arxiv_org_abs_2501_08682
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency
Li, Siqi
Jiang, Zhengkai
Zhou, Jiawei
Liu, Zhihong
Chi, Xiaowei
Wang, Haoqian
Computer Vision and Pattern Recognition
Graphics
68T99
Virtual try-on has emerged as a pivotal task at the intersection of computer vision and fashion, aimed at digitally simulating how clothing items fit on the human body. Despite notable progress in single-image virtual try-on (VTO), current methods often struggle to preserve a consistent and authentic appearance of clothing across extended video sequences. This challenge arises from the complexities of capturing dynamic human pose and maintaining target clothing characteristics. We leverage pre-existing video foundation models to introduce RealVVT, a photoRealistic Video Virtual Try-on framework tailored to bolster stability and realism within dynamic video contexts. Our methodology encompasses a Clothing & Temporal Consistency strategy, an Agnostic-guided Attention Focus Loss mechanism to ensure spatial consistency, and a Pose-guided Long Video VTO technique adept at handling extended video sequences. Extensive experiments across various datasets confirm that our approach outperforms existing state-of-the-art models in both single-image and video VTO tasks, offering a viable solution for practical applications in fashion e-commerce and virtual fitting environments.
title RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency
topic Computer Vision and Pattern Recognition
Graphics
68T99
url https://arxiv.org/abs/2501.08682