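Staff View: Bridging the Micro--Macro Gap: Frequency-Aware Semantic Alignment for Image Manipulation Localization :: Library Catalog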

Bibliographic Details
Main Authors:	Liang, Xiaojie; Chen, Zhimin; Sheng, Ziqi; Lu, Wei
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.12341

_version_	1866908962117386240
author	Liang, Xiaojie; Chen, Zhimin; Sheng, Ziqi; Lu, Wei
author_facet	Liang, Xiaojie; Chen, Zhimin; Sheng, Ziqi; Lu, Wei
contents	As generative image editing advances, image manipulation localization (IML) must handle both traditional manipulations with conspicuous forensic artifacts and diffusion-generated edits that appear locally realistic. Existing methods typically rely on either low-level forensic cues or high-level semantics alone, leading to a fundamental micro--macro gap. To bridge this gap, we propose FASA, a unified framework for localizing both traditional and diffusion-generated manipulations. Specifically, we extract manipulation-sensitive frequency cues through an adaptive dual-band DCT module and learn manipulation-aware semantic priors via patch-level contrastive alignment on frozen CLIP representations. We then inject these priors into a hierarchical frequency pathway through a semantic-frequency side adapter for multi-scale feature interaction, and employ a prototype-guided, frequency-gated mask decoder to integrate semantic consistency with boundary-aware localization for tampered region prediction. Extensive experiments on OpenSDI and multiple traditional manipulation benchmarks demonstrate state-of-the-art localization performance, strong cross-generator and cross-dataset generalization, and robust performance under common image degradations.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_12341
institution	arXiv
publishDate	2026
record_format	arxiv
title	Bridging the Micro--Macro Gap: Frequency-Aware Semantic Alignment for Image Manipulation Localization
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2604.12341