Bibliographic Details
Main Authors: Tan, Xianfeng; Li, Yuhan; Shang, Wenxiang; Wu, Yubo; Wang, Jian; Chen, Xuanhong; Zhang, Yi; Lin, Ran; Ni, Bingbing
Format: Preprint
Published: 2024
Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence; Graphics; Machine Learning
Online Access: https://arxiv.org/abs/2411.19528
author Tan, Xianfeng
Li, Yuhan
Shang, Wenxiang
Wu, Yubo
Wang, Jian
Chen, Xuanhong
Zhang, Yi
Lin, Ran
Ni, Bingbing
contents Standard clothing asset generation restores forward-facing, flat-lay garment images on a clear background by extracting clothing information from diverse real-world contexts. The task is challenging because the target structures follow a highly standardized sampling distribution and because clothing semantics are often missing in complex scenarios. Existing models have limited spatial perception and often exhibit structural hallucinations and texture distortion in this high-specification generative task. To address this issue, we propose a novel Retrieval-Augmented Generation (RAG) framework, termed RAGDiffusion, which enhances structure determinacy and mitigates hallucinations by assimilating knowledge from language models and external databases. RAGDiffusion consists of two processes: (1) retrieval-based structure aggregation, which employs contrastive learning and Structure Locally Linear Embedding (SLLE) to derive global structure and spatial landmarks, providing both soft and hard guidance to counteract structural ambiguities; and (2) omni-level faithful garment generation, which introduces coarse-to-fine texture alignment to ensure fidelity of pattern and detail components during diffusion. Extensive experiments on challenging real-world datasets demonstrate that RAGDiffusion synthesizes structurally faithful and texture-faithful clothing assets with significant performance improvements, representing a pioneering effort in high-specification faithful generation with RAG to confront intrinsic hallucinations and enhance fidelity.
format Preprint
id arxiv_https___arxiv_org_abs_2411_19528
institution arXiv
publishDate 2024
record_format arxiv
title RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation
topic Computer Vision and Pattern Recognition
Artificial Intelligence
Graphics
Machine Learning
url https://arxiv.org/abs/2411.19528
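
The abstract describes the retrieval-based structure aggregation step only at a high level. As a rough illustration, the sketch below retrieves the nearest database embeddings by cosine similarity and then computes classic locally linear embedding reconstruction weights (weights that sum to one and best rebuild the query from its neighbors), which are reused to blend the neighbors' landmarks. This is an assumption-laden stand-in, not the paper's SLLE: the function names (top_k, lle_weights, blend), the regularization constant, and the toy data layout are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, database, k):
    """Indices of the k database embeddings most similar to the query."""
    order = sorted(range(len(database)),
                   key=lambda i: cosine(query, database[i]), reverse=True)
    return order[:k]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def lle_weights(query, neighbors, reg=1e-3):
    """Weights w (summing to 1) minimizing ||query - sum_i w_i * neighbors[i]||^2.

    Standard LLE recipe: build the local Gram matrix of (query - neighbor)
    differences, regularize it (assumed constant), solve G w = 1, normalize.
    """
    diffs = [[q - v for q, v in zip(query, nb)] for nb in neighbors]
    k = len(neighbors)
    gram = [[sum(x * y for x, y in zip(diffs[i], diffs[j])) for j in range(k)]
            for i in range(k)]
    trace = sum(gram[i][i] for i in range(k))
    eps = reg * trace if trace > 0 else reg
    for i in range(k):
        gram[i][i] += eps
    w = solve(gram, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]

def blend(weights, items):
    """Weighted sum of equal-length vectors (e.g. flattened landmark coordinates)."""
    dim = len(items[0])
    return [sum(w * item[d] for w, item in zip(weights, items)) for d in range(dim)]

# Toy data (hypothetical): three database embeddings with one landmark vector each.
database = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
landmarks = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]
query = [0.5, 0.5]

idx = top_k(query, database, k=2)
weights = lle_weights(query, [database[i] for i in idx])
fused_landmarks = blend(weights, [landmarks[i] for i in idx])
```

In this toy setup the fused landmarks play the role of a hard structural prior, while the retrieved embeddings themselves could serve as soft guidance; how RAGDiffusion actually injects either signal into the diffusion model is not specified in this record.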