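Staff View: A$^\text{T}$A: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting :: Library Catalog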

Bibliographic Details
Main Authors:	Tang, Yizhe; Sun, Zhimin; Du, Yuzhen; Yi, Ran; Lu, Guangben; Hu, Teng; Li, Luying; Ma, Lizhuang; Zou, Fangyuan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.01603

_version_	1866915223342940160
author	Tang, Yizhe; Sun, Zhimin; Du, Yuzhen; Yi, Ran; Lu, Guangben; Hu, Teng; Li, Luying; Ma, Lizhuang; Zou, Fangyuan
author_facet	Tang, Yizhe; Sun, Zhimin; Du, Yuzhen; Yi, Ran; Lu, Guangben; Hu, Teng; Li, Luying; Ma, Lizhuang; Zou, Fangyuan
contents	Image inpainting aims to fill the missing region of an image. Recently, there has been a surge of interest in foreground-conditioned background inpainting, a sub-task that fills the background of an image while the foreground subject and associated text prompt are provided. Existing background inpainting methods typically strictly preserve the subject's original position from the source image, resulting in inconsistencies between the subject and the generated background. To address this challenge, we propose a new task, the "Text-Guided Subject-Position Variable Background Inpainting", which aims to dynamically adjust the subject position to achieve a harmonious relationship between the subject and the inpainted background, and propose the Adaptive Transformation Agent (A$^\text{T}$A) for this task. Firstly, we design a PosAgent Block that adaptively predicts an appropriate displacement based on given features to achieve variable subject-position. Secondly, we design the Reverse Displacement Transform (RDT) module, which arranges multiple PosAgent blocks in a reverse structure, to transform hierarchical feature maps from deep to shallow based on semantic information. Thirdly, we equip A$^\text{T}$A with a Position Switch Embedding to control whether the subject's position in the generated image is adaptively predicted or fixed. Extensive comparative experiments validate the effectiveness of our A$^\text{T}$A approach, which not only demonstrates superior inpainting capabilities in subject-position variable inpainting, but also ensures good performance on subject-position fixed inpainting.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_01603
institution	arXiv
publishDate	2025
record_format	arxiv
title	A$^\text{T}$A: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2504.01603

Similar Items