Bibliographic Details
Main Authors: Huang, Ching-Kai, Lin, Wen-Chieh, Lee, Yan-Cen
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2602.01298
author Huang, Ching-Kai
Lin, Wen-Chieh
Lee, Yan-Cen
contents Image-based object removal often erases only the named target, leaving behind interaction evidence that renders the result semantically inconsistent. We formalize this problem as Interaction-Consistent Object Removal (ICOR), which requires removing not only the target object but also associated interaction elements, such as lighting-dependent effects, physically connected objects, target-produced elements, and contextually linked objects. To address this task, we propose Reasoning-Enhanced Object Removal with MLLM (REORM), a reasoning-enhanced object removal framework that leverages multimodal large language models to infer which elements must be jointly removed. REORM features a modular design that integrates MLLM-driven analysis, mask-guided removal, and a self-correction mechanism, along with a local-deployment variant that supports accurate editing under limited resources. To support evaluation, we introduce ICOREval, a benchmark consisting of instruction-driven removals with rich interaction dependencies. On ICOREval, REORM outperforms state-of-the-art image editing systems, demonstrating its effectiveness in producing interaction-consistent results.
format Preprint
id arxiv_https___arxiv_org_abs_2602_01298
institution arXiv
publishDate 2026
record_format arxiv
title Interaction-Consistent Object Removal via MLLM-Based Reasoning
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2602.01298