Staff View: Say Cheese! Detail-Preserving Portrait Collection Generation via Natural Language Edits :: Library Catalog

Bibliographic Details
Main Authors:	Sun, Zelong; Wu, Jiahui; Ba, Ying; Jing, Dong; Lu, Zhiwu
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.20511

_version_	1866911404341067776
author	Sun, Zelong; Wu, Jiahui; Ba, Ying; Jing, Dong; Lu, Zhiwu
author_facet	Sun, Zelong; Wu, Jiahui; Ba, Ying; Jing, Dong; Lu, Zhiwu
contents	As social media platforms proliferate, users increasingly demand intuitive ways to create diverse, high-quality portrait collections. In this work, we introduce Portrait Collection Generation (PCG), a novel task that generates coherent portrait collections by editing a reference portrait image through natural language instructions. This task poses two unique challenges to existing methods: (1) complex multi-attribute modifications such as pose, spatial layout, and camera viewpoint; and (2) high-fidelity detail preservation including identity, clothing, and accessories. To address these challenges, we propose CHEESE, the first large-scale PCG dataset containing 24K portrait collections and 573K samples with high-quality modification text annotations, constructed through a Large Vision-Language Model-based pipeline with inversion-based verification. We further propose SCheese, a framework that combines text-guided generation with hierarchical identity and detail preservation. SCheese employs an adaptive feature fusion mechanism to maintain identity consistency, and ConsistencyNet to inject fine-grained features for detail consistency. Comprehensive experiments validate the effectiveness of CHEESE in advancing PCG, with SCheese achieving state-of-the-art performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_20511
institution	arXiv
publishDate	2026
record_format	arxiv
title	Say Cheese! Detail-Preserving Portrait Collection Generation via Natural Language Edits
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2601.20511

Similar Items