Staff View: Cross-Layer Feature Self-Attention Module for Multi-Scale Object Detection :: Library Catalog

Bibliographic Details
Main Authors:	Xie, Dingzhou; Lan, Rushi; Pang, Cheng; Ning, Enhao; Zeng, Jiahao; Zheng, Wei
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.14726

_version_	1866908597921775616
author	Xie, Dingzhou; Lan, Rushi; Pang, Cheng; Ning, Enhao; Zeng, Jiahao; Zheng, Wei
author_facet	Xie, Dingzhou; Lan, Rushi; Pang, Cheng; Ning, Enhao; Zeng, Jiahao; Zheng, Wei
contents	Recent object detection methods have made remarkable progress by leveraging attention mechanisms to improve feature discriminability. However, most existing approaches are confined to refining single-layer or fusing dual-layer features, overlooking the rich inter-layer dependencies across multi-scale representations. This limits their ability to capture comprehensive contextual information essential for detecting objects with large scale variations. In this paper, we propose a novel Cross-Layer Feature Self-Attention Module (CFSAM), which holistically models both local and global dependencies within multi-scale feature maps. CFSAM consists of three key components: a convolutional local feature extractor, a Transformer-based global modeling unit that efficiently captures cross-layer interactions, and a feature fusion mechanism to restore and enhance the original representations. When integrated into the SSD300 framework, CFSAM significantly boosts detection performance, achieving 78.6% mAP on PASCAL VOC (vs. 75.5% baseline) and 52.1% mAP on COCO (vs. 43.1% baseline), outperforming existing attention modules. Moreover, the module accelerates convergence during training without introducing substantial computational overhead. Our work highlights the importance of explicit cross-layer attention modeling in advancing multi-scale object detection.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_14726
institution	arXiv
publishDate	2025
record_format	arxiv
title	Cross-Layer Feature Self-Attention Module for Multi-Scale Object Detection
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2510.14726
