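Staff View: Learning Where to Embed: Noise-Aware Positional Embedding for Query Retrieval in Small-Object Detection :: Library Catalog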

Bibliographic Details
Main Authors:	Zeng, Yangchen; Yu, Zhenyu; Jiang, Dongming; Zhang, Wenbo; Hong, Yifan; Hu, Zhanhua; Luo, Jiao; Cui, Kangning
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.15065

_version_	1866913038366408704
author	Zeng, Yangchen; Yu, Zhenyu; Jiang, Dongming; Zhang, Wenbo; Hong, Yifan; Hu, Zhanhua; Luo, Jiao; Cui, Kangning
author_facet	Zeng, Yangchen; Yu, Zhenyu; Jiang, Dongming; Zhang, Wenbo; Hong, Yifan; Hu, Zhanhua; Luo, Jiao; Cui, Kangning
contents	Transformer-based detectors have advanced small-object detection, but they often remain inefficient and vulnerable to background-induced query noise, which motivates deep decoders to refine low-quality queries. We present HELP (Heatmap-guided Embedding Learning Paradigm), a noise-aware positional-semantic fusion framework that studies where to embed positional information by selectively preserving positional encodings in foreground-salient regions while suppressing background clutter. Within HELP, we introduce Heatmap-guided Positional Embedding (HPE) as the core embedding mechanism and visualize it with a heatbar for interpretable diagnosis and fine-tuning. HPE is integrated into both the encoder and decoder: it guides noise-suppressed feature encoding by injecting heatmap-aware positional encoding, and it enables high-quality query retrieval by filtering background-dominant embeddings via a gradient-based mask filter before decoding. To address feature sparsity in complex small targets, we integrate Linear-Snake Convolution to enrich retrieval-relevant representations. The gradient-based heatmap supervision is used during training only, incurring no additional gradient computation at inference. As a result, our design reduces decoder layers from eight to three and achieves a 59.4% parameter reduction (66.3M vs. 163M) while maintaining consistent accuracy gains under a reduced compute budget across benchmarks. Code Repository: https://github.com/yidimopozhibai/Noise-Suppressed-Query-Retrieval
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_15065
institution	arXiv
publishDate	2026
record_format	arxiv
title	Learning Where to Embed: Noise-Aware Positional Embedding for Query Retrieval in Small-Object Detection
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2604.15065

Similar Items