Staff View: Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient Paradigm :: Library Catalog

Bibliographic Details
Main Authors:	Yu, Chuang; Zhao, Jinmiao; Liu, Yunpeng; Li, Yaokun; Shu, Xiujun; Feng, Yuanhao; Wang, Bo; Dai, Yimian; Yue, Xiangyu
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.05511

_version_	1866915656421605376
author	Yu, Chuang; Zhao, Jinmiao; Liu, Yunpeng; Li, Yaokun; Shu, Xiujun; Feng, Yuanhao; Wang, Bo; Dai, Yimian; Yue, Xiangyu
author_facet	Yu, Chuang; Zhao, Jinmiao; Liu, Yunpeng; Li, Yaokun; Shu, Xiujun; Feng, Yuanhao; Wang, Bo; Dai, Yimian; Yue, Xiangyu
contents	While large-scale visual foundation models (VFMs) exhibit strong generalization across diverse visual domains, their potential for single-frame infrared small target (SIRST) detection remains largely unexplored. To fill this gap, we systematically introduce the frozen representations from VFMs into the SIRST task for the first time and propose a Foundation-Driven Efficient Paradigm (FDEP), which can seamlessly adapt to existing encoder-decoder-based methods and significantly improve accuracy without additional inference overhead. Specifically, a Semantic Alignment Modulation Fusion (SAMF) module is designed to achieve dynamic alignment and deep fusion of the global semantic priors from VFMs with task-specific features. Meanwhile, to avoid the inference time burden introduced by VFMs, we propose a Collaborative Optimization-based Implicit Self-Distillation (CO-ISD) strategy, which enables implicit semantic transfer between the main and lightweight branches through parameter sharing and synchronized backpropagation. In addition, to unify the fragmented evaluation system, we construct a Holistic SIRST Evaluation (HSE) metric that performs multi-threshold integral evaluation at both pixel-level confidence and target-level robustness, providing a stable and comprehensive basis for fair model comparison. Extensive experiments demonstrate that the SIRST detection networks equipped with our FDEP framework achieve state-of-the-art (SOTA) performance on multiple public datasets. Our code is available at https://github.com/YuChuang1205/FDEP-Framework
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_05511
institution	arXiv
publishDate	2025
record_format	arxiv
title	Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient Paradigm
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2512.05511
