Bibliographic Details
Main Authors: Liang, Xiaojie; Chen, Zhimin; Sheng, Ziqi; Lu, Wei
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2604.12341
author Liang, Xiaojie
Chen, Zhimin
Sheng, Ziqi
Lu, Wei
contents As generative image editing advances, image manipulation localization (IML) must handle both traditional manipulations with conspicuous forensic artifacts and diffusion-generated edits that appear locally realistic. Existing methods typically rely on either low-level forensic cues or high-level semantics alone, leading to a fundamental micro–macro gap. To bridge this gap, we propose FASA, a unified framework for localizing both traditional and diffusion-generated manipulations. Specifically, we extract manipulation-sensitive frequency cues through an adaptive dual-band DCT module and learn manipulation-aware semantic priors via patch-level contrastive alignment on frozen CLIP representations. We then inject these priors into a hierarchical frequency pathway through a semantic-frequency side adapter for multi-scale feature interaction, and employ a prototype-guided, frequency-gated mask decoder to integrate semantic consistency with boundary-aware localization for tampered region prediction. Extensive experiments on OpenSDI and multiple traditional manipulation benchmarks demonstrate state-of-the-art localization performance, strong cross-generator and cross-dataset generalization, and robust performance under common image degradations.
format Preprint
id arxiv_https___arxiv_org_abs_2604_12341
institution arXiv
publishDate 2026
record_format arxiv
title Bridging the Micro–Macro Gap: Frequency-Aware Semantic Alignment for Image Manipulation Localization
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2604.12341