Bibliographic Details
Main Authors:	Kang, Ben; Zhao, Jie; Chen, Xin; Geng, Wanting; Zhang, Bin; Zhang, Lu; Wang, Dong; Lu, Huchuan
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.01412

_version_	1866911481009799168
author	Kang, Ben; Zhao, Jie; Chen, Xin; Geng, Wanting; Zhang, Bin; Zhang, Lu; Wang, Dong; Lu, Huchuan
author_facet	Kang, Ben; Zhao, Jie; Chen, Xin; Geng, Wanting; Zhang, Bin; Zhang, Lu; Wang, Dong; Lu, Huchuan
contents	With growing real-world demands, efficient tracking has received increasing attention. However, most existing methods are limited to RGB inputs and struggle in multi-modal scenarios. Moreover, current multi-modal tracking approaches typically use complex designs, making them too heavy and slow for resource-constrained deployment. To tackle these limitations, we propose UETrack, an efficient framework for single object tracking. UETrack demonstrates high practicality and versatility, efficiently handling multiple modalities including RGB, Depth, Thermal, Event, and Language, and addresses the gap in efficient multi-modal tracking. It introduces two key components: a Token-Pooling-based Mixture-of-Experts mechanism that enhances modeling capacity through feature aggregation and expert specialization, and a Target-aware Adaptive Distillation strategy that selectively performs distillation based on sample characteristics, reducing redundant supervision and improving performance. Extensive experiments on 12 benchmarks across 3 hardware platforms show that UETrack achieves a superior speed-accuracy trade-off compared to previous methods. For instance, UETrack-B achieves 69.2% AUC on LaSOT and runs at 163/56/60 FPS on GPU/CPU/AGX, demonstrating strong practicality and versatility. Code is available at https://github.com/kangben258/UETrack.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_01412
institution	arXiv
publishDate	2026
record_format	arxiv
title	UETrack: A Unified and Efficient Framework for Single Object Tracking
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2603.01412
