Staff View: Referring Layer Decomposition :: Library Catalog

Bibliographic Details
Main Authors:	Chen, Fangyi; Shen, Yaojie; Xu, Lu; Yuan, Ye; Zhang, Shu; Niu, Yulei; Wen, Longyin
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.19358

_version_	1866911462463635456
author	Chen, Fangyi; Shen, Yaojie; Xu, Lu; Yuan, Ye; Zhang, Shu; Niu, Yulei; Wen, Longyin
author_facet	Chen, Fangyi; Shen, Yaojie; Xu, Lu; Yuan, Ye; Zhang, Shu; Niu, Yulei; Wen, Longyin
contents	Precise, object-aware control over visual content is essential for advanced image editing and compositional generation. Yet most existing approaches operate on entire images holistically, limiting the ability to isolate and manipulate individual scene elements. In contrast, layered representations, where scenes are explicitly separated into objects, environmental context, and visual effects, provide a more intuitive and structured framework for interpreting and editing visual content. To bridge this gap and enable both compositional understanding and controllable editing, we introduce the Referring Layer Decomposition (RLD) task, which predicts complete RGBA layers from a single RGB image, conditioned on flexible user prompts, such as spatial inputs (e.g., points, boxes, masks), natural language descriptions, or combinations thereof. At its core is RefLade, a large-scale dataset comprising 1.11M image-layer-prompt triplets produced by our scalable data engine, along with 100K manually curated, high-fidelity layers. Coupled with a perceptually grounded, human-preference-aligned automatic evaluation protocol, RefLade establishes RLD as a well-defined and benchmarkable research task. Building on this foundation, we present RefLayer, a simple baseline designed for prompt-conditioned layer decomposition, achieving high visual fidelity and semantic alignment. Extensive experiments show that our approach enables effective training, reliable evaluation, and high-quality image decomposition, while exhibiting strong zero-shot generalization capabilities.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_19358
institution	arXiv
publishDate	2026
record_format	arxiv
title	Referring Layer Decomposition
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.19358

Similar Items