Staff View: Interaction-Consistent Object Removal via MLLM-Based Reasoning :: Library Catalog

Bibliographic Details
Main Authors:	Huang, Ching-Kai; Lin, Wen-Chieh; Lee, Yan-Cen
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.01298

_version_	1866914299714207744
author	Huang, Ching-Kai; Lin, Wen-Chieh; Lee, Yan-Cen
author_facet	Huang, Ching-Kai; Lin, Wen-Chieh; Lee, Yan-Cen
contents	Image-based object removal often erases only the named target, leaving behind interaction evidence that renders the result semantically inconsistent. We formalize this problem as Interaction-Consistent Object Removal (ICOR), which requires removing not only the target object but also associated interaction elements, such as lighting-dependent effects, physically connected objects, target-produced elements, and contextually linked objects. To address this task, we propose Reasoning-Enhanced Object Removal with MLLM (REORM), a reasoning-enhanced object removal framework that leverages multimodal large language models to infer which elements must be jointly removed. REORM features a modular design that integrates MLLM-driven analysis, mask-guided removal, and a self-correction mechanism, along with a local-deployment variant that supports accurate editing under limited resources. To support evaluation, we introduce ICOREval, a benchmark consisting of instruction-driven removals with rich interaction dependencies. On ICOREval, REORM outperforms state-of-the-art image editing systems, demonstrating its effectiveness in producing interaction-consistent results.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_01298
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Interaction-Consistent Object Removal via MLLM-Based Reasoning Huang, Ching-Kai Lin, Wen-Chieh Lee, Yan-Cen Computer Vision and Pattern Recognition Image-based object removal often erases only the named target, leaving behind interaction evidence that renders the result semantically inconsistent. We formalize this problem as Interaction-Consistent Object Removal (ICOR), which requires removing not only the target object but also associated interaction elements, such as lighting-dependent effects, physically connected objects, target-produced elements, and contextually linked objects. To address this task, we propose Reasoning-Enhanced Object Removal with MLLM (REORM), a reasoning-enhanced object removal framework that leverages multimodal large language models to infer which elements must be jointly removed. REORM features a modular design that integrates MLLM-driven analysis, mask-guided removal, and a self-correction mechanism, along with a local-deployment variant that supports accurate editing under limited resources. To support evaluation, we introduce ICOREval, a benchmark consisting of instruction-driven removals with rich interaction dependencies. On ICOREval, REORM outperforms state-of-the-art image editing systems, demonstrating its effectiveness in producing interaction-consistent results.
title	Interaction-Consistent Object Removal via MLLM-Based Reasoning
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.01298

Similar Items