Bibliographic Details
Main Authors: Hauri, Yannick; Lanzendörfer, Luca A.; Aczel, Till
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition; Machine Learning
Online Access: https://arxiv.org/abs/2510.00633
author Hauri, Yannick
Lanzendörfer, Luca A.
Aczel, Till
contents Fashion image generation has so far focused on narrow tasks such as virtual try-on, where garments appear in clean studio environments. In contrast, editorial fashion presents garments through dynamic poses, diverse locations, and carefully crafted visual narratives. We introduce the task of virtual fashion photo-shoot, which seeks to capture this richness by transforming standardized garment images into contextually grounded editorial imagery. To enable this new direction, we construct the first large-scale dataset of garment-lookbook pairs, bridging the gap between e-commerce and fashion media. Because such pairs are not readily available, we design an automated retrieval pipeline that aligns garments across domains, combining visual-language reasoning with object-level localization. We construct a dataset with three garment-lookbook pair accuracy levels: high quality (10,000 pairs), medium quality (50,000 pairs), and low quality (300,000 pairs). This dataset offers a foundation for models that move beyond catalog-style generation and toward fashion imagery that reflects creativity, atmosphere, and storytelling.
format Preprint
id arxiv_https___arxiv_org_abs_2510_00633
institution arXiv
publishDate 2025
record_format arxiv
title Virtual Fashion Photo-Shoots: Building a Large-Scale Garment-Lookbook Dataset
topic Computer Vision and Pattern Recognition
Machine Learning
url https://arxiv.org/abs/2510.00633