Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Wang, Qiming, Bai, Yongqiang, Song, Hongxing
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2403.18193
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866910441363472384
author	Wang, Qiming Bai, Yongqiang Song, Hongxing
author_facet	Wang, Qiming Bai, Yongqiang Song, Hongxing
contents	RGB-T tracking, a vital downstream task of object tracking, has made remarkable progress in recent years. Yet, it remains hindered by two major challenges: 1) the trade-off between performance and efficiency; 2) the scarcity of training data. To address the latter challenge, some recent methods employ prompts to fine-tune pre-trained RGB tracking models and leverage upstream knowledge in a parameter-efficient manner. However, these methods inadequately explore modality-independent patterns and disregard the dynamic reliability of different modalities in open scenarios. We propose M3PT, a novel RGB-T prompt tracking method that leverages middle fusion and multi-modal and multi-stage visual prompts to overcome these challenges. We pioneer the use of the adjustable middle fusion meta-framework for RGB-T tracking, which could help the tracker balance the performance with efficiency, to meet various demands of application. Furthermore, based on the meta-framework, we utilize multiple flexible prompt strategies to adapt the pre-trained model to comprehensive exploration of uni-modal patterns and improved modeling of fusion-modal features in diverse modality-priority scenarios, harnessing the potential of prompt learning in RGB-T tracking. Evaluating on 6 existing challenging benchmarks, our method surpasses previous state-of-the-art prompt fine-tuning methods while maintaining great competitiveness against excellent full-parameter fine-tuning methods, with only 0.34M fine-tuned parameters.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_18193
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Middle Fusion and Multi-Stage, Multi-Form Prompts for Robust RGB-T Tracking Wang, Qiming Bai, Yongqiang Song, Hongxing Computer Vision and Pattern Recognition RGB-T tracking, a vital downstream task of object tracking, has made remarkable progress in recent years. Yet, it remains hindered by two major challenges: 1) the trade-off between performance and efficiency; 2) the scarcity of training data. To address the latter challenge, some recent methods employ prompts to fine-tune pre-trained RGB tracking models and leverage upstream knowledge in a parameter-efficient manner. However, these methods inadequately explore modality-independent patterns and disregard the dynamic reliability of different modalities in open scenarios. We propose M3PT, a novel RGB-T prompt tracking method that leverages middle fusion and multi-modal and multi-stage visual prompts to overcome these challenges. We pioneer the use of the adjustable middle fusion meta-framework for RGB-T tracking, which could help the tracker balance the performance with efficiency, to meet various demands of application. Furthermore, based on the meta-framework, we utilize multiple flexible prompt strategies to adapt the pre-trained model to comprehensive exploration of uni-modal patterns and improved modeling of fusion-modal features in diverse modality-priority scenarios, harnessing the potential of prompt learning in RGB-T tracking. Evaluating on 6 existing challenging benchmarks, our method surpasses previous state-of-the-art prompt fine-tuning methods while maintaining great competitiveness against excellent full-parameter fine-tuning methods, with only 0.34M fine-tuned parameters.
title	Middle Fusion and Multi-Stage, Multi-Form Prompts for Robust RGB-T Tracking
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2403.18193

Similar Items