| Main Authors: | Li, Siqi; Jiang, Zhengkai; Zhou, Jiawei; Liu, Zhihong; Chi, Xiaowei; Wang, Haoqian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computer Vision and Pattern Recognition; Graphics; 68T99 |
| Online Access: | https://arxiv.org/abs/2501.08682 |
| Field | Value |
|---|---|
| author | Li, Siqi; Jiang, Zhengkai; Zhou, Jiawei; Liu, Zhihong; Chi, Xiaowei; Wang, Haoqian |
| contents | Virtual try-on has emerged as a pivotal task at the intersection of computer vision and fashion, aimed at digitally simulating how clothing items fit on the human body. Despite notable progress in single-image virtual try-on (VTO), current methods often struggle to preserve a consistent and authentic appearance of clothing across extended video sequences. This challenge arises from the difficulty of capturing dynamic human poses while maintaining the characteristics of the target clothing. We leverage pre-existing video foundation models to introduce RealVVT, a photoRealistic Video Virtual Try-on framework tailored to improve stability and realism in dynamic video contexts. Our methodology comprises a Clothing & Temporal Consistency strategy, an Agnostic-guided Attention Focus Loss mechanism to ensure spatial consistency, and a Pose-guided Long Video VTO technique for handling extended video sequences. Extensive experiments across various datasets confirm that our approach outperforms existing state-of-the-art models in both single-image and video VTO tasks, offering a viable solution for practical applications in fashion e-commerce and virtual fitting environments. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2501_08682 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency |
| topic | Computer Vision and Pattern Recognition; Graphics; 68T99 |
| url | https://arxiv.org/abs/2501.08682 |