Bibliographic Details
Main Authors: Qi, Ruo, Dai, Linhui, Qin, Yusong, Yang, Chaolei, Li, Yanshan
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition; Machine Learning
Online Access: https://arxiv.org/abs/2601.12507
_version_ 1866912831725633536
author Qi, Ruo
Dai, Linhui
Qin, Yusong
Yang, Chaolei
Li, Yanshan
author_facet Qi, Ruo
Dai, Linhui
Qin, Yusong
Yang, Chaolei
Li, Yanshan
contents In remote sensing images, complex backgrounds, weak object signals, and small object scales make accurate detection particularly challenging, especially under low-quality imaging conditions. A common strategy is to apply single-image super-resolution (SR) before detection; however, such serial pipelines often suffer from misaligned optimization objectives, feature redundancy, and a lack of effective interaction between SR and detection. To address these issues, we propose a Saliency-Driven Multi-Task Collaborative Network (SDCoNet) that couples SR and detection through implicit feature sharing while preserving task specificity. SDCoNet employs a Swin Transformer-based shared encoder, in which hierarchical window-shifted self-attention supports cross-task feature collaboration and adaptively balances the trade-off between texture refinement and semantic representation. In addition, a multi-scale saliency prediction module produces importance scores to select key tokens, which focuses attention on weak object regions and suppresses both background clutter and adverse features introduced by multi-task coupling. Furthermore, a gradient routing strategy is introduced to mitigate optimization conflicts: it first stabilizes detection semantics and then routes SR gradients along a detection-oriented direction, guiding the SR branch to generate high-frequency details that are explicitly beneficial for detection. Experiments on public datasets, including NWPU VHR-10-Split, DOTAv1.5-Split, and HRSSD-Split, demonstrate that the proposed method, while maintaining competitive computational efficiency, significantly outperforms existing mainstream algorithms in small object detection on low-quality remote sensing images. Our code is available at https://github.com/qiruo-ya/SDCoNet.
format Preprint
id arxiv_https___arxiv_org_abs_2601_12507
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle SDCoNet: Saliency-Driven Multi-Task Collaborative Network for Remote Sensing Object Detection
Qi, Ruo
Dai, Linhui
Qin, Yusong
Yang, Chaolei
Li, Yanshan
Computer Vision and Pattern Recognition
Machine Learning
title SDCoNet: Saliency-Driven Multi-Task Collaborative Network for Remote Sensing Object Detection
topic Computer Vision and Pattern Recognition
Machine Learning
url https://arxiv.org/abs/2601.12507