Staff View: Dress-ED: Instruction-Guided Editing for Virtual Try-On and Try-Off :: Library Catalog


Bibliographic Details
Main Authors:	Sanguigni, Fulvio; Lobba, Davide; Ren, Bin; Cornia, Marcella; Sebe, Nicu; Cucchiara, Rita
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.22607

_version_	1866917405811277824
author	Sanguigni, Fulvio; Lobba, Davide; Ren, Bin; Cornia, Marcella; Sebe, Nicu; Cucchiara, Rita
contents	Recent advances in Virtual Try-On (VTON) and Virtual Try-Off (VTOFF) have greatly improved photo-realistic fashion synthesis and garment reconstruction. However, existing datasets remain static, lacking instruction-driven editing for controllable and interactive fashion generation. In this work, we introduce the Dress Editing Dataset (Dress-ED), the first large-scale benchmark that unifies VTON, VTOFF, and text-guided garment editing within a single framework. Each sample in Dress-ED includes an in-shop garment image, the corresponding person image wearing the garment, their edited counterparts, and a natural-language instruction describing the desired modification. Built through a fully automated multimodal pipeline that integrates MLLM-based garment understanding, diffusion-based editing, and LLM-guided verification, Dress-ED comprises over 146k verified quadruplets spanning three garment categories and seven edit types, including both appearance (e.g., color, pattern, material) and structural (e.g., sleeve length, neckline) modifications. Based on this benchmark, we further propose a unified multimodal diffusion framework that jointly reasons over linguistic instructions and visual garment cues, serving as a strong baseline for instruction-driven VTON and VTOFF. Dataset and code will be made publicly available. Project page: https://furio1999.github.io/Dress-ED/
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_22607
institution	arXiv
publishDate	2026
record_format	arxiv
title	Dress-ED: Instruction-Guided Editing for Virtual Try-On and Try-Off
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2603.22607
