Staff View: Spectral-Adaptive Modulation Networks for Visual Perception :: Library Catalog

Bibliographic Details
Main Authors:	Yun, Guhnoo; Yoo, Juhan; Kim, Kijung; Lee, Jeongho; Seo, Paul Hongsuck; Kim, Dong Hwan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.23947

_version_	1866909034372661248
author	Yun, Guhnoo; Yoo, Juhan; Kim, Kijung; Lee, Jeongho; Seo, Paul Hongsuck; Kim, Dong Hwan
author_facet	Yun, Guhnoo; Yoo, Juhan; Kim, Kijung; Lee, Jeongho; Seo, Paul Hongsuck; Kim, Dong Hwan
contents	Recent studies have shown that 2D convolution and self-attention exhibit distinct spectral behaviors, and optimizing their spectral properties can enhance vision model performance. However, theoretical analyses remain limited in explaining why 2D convolution is more effective in high-pass filtering than self-attention and why larger kernels favor shape bias, akin to self-attention. In this paper, we employ graph spectral analysis to theoretically simulate and compare the frequency responses of 2D convolution and self-attention within a unified framework. Our results corroborate previous empirical findings and reveal that node connectivity, modulated by window size, is a key factor in shaping spectral functions. Leveraging this insight, we introduce a spectral-adaptive modulation (SPAM) mixer, which processes visual features in a spectral-adaptive manner using multi-scale convolutional kernels and a spectral re-scaling mechanism to refine spectral components. Based on SPAM, we develop SPANetV2 as a novel vision backbone. Extensive experiments demonstrate that SPANetV2 outperforms state-of-the-art models across multiple vision tasks, including ImageNet-1K classification, COCO object detection, and ADE20K semantic segmentation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_23947
institution	arXiv
publishDate	2025
record_format	arxiv
title	Spectral-Adaptive Modulation Networks for Visual Perception
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2503.23947
