Staff View: OneTrack-M: A multitask approach to transformer-based MOT models :: Library Catalog

Bibliographic Details
Main Authors:	de Araujo, Luiz C. S.; Figueiredo, Carlos M. S.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Machine Learning; I.4.8
Online Access:	https://arxiv.org/abs/2502.04478

_version_	1866915142025871360
author	de Araujo, Luiz C. S.; Figueiredo, Carlos M. S.
author_facet	de Araujo, Luiz C. S.; Figueiredo, Carlos M. S.
contents	Multi-Object Tracking (MOT) is a critical problem in computer vision, essential for understanding how objects move and interact in videos. This field faces significant challenges such as occlusions and complex environmental dynamics, impacting model accuracy and efficiency. While traditional approaches have relied on Convolutional Neural Networks (CNNs), introducing transformers has brought substantial advancements. This work introduces OneTrack-M, a transformer-based MOT model designed to enhance tracking computational efficiency and accuracy. Our approach simplifies the typical transformer-based architecture by eliminating the need for a decoder model for object detection and tracking. Instead, the encoder alone serves as the backbone for temporal data interpretation, significantly reducing processing time and increasing inference speed. Additionally, we employ innovative data pre-processing and multitask training techniques to address occlusion and diverse objective challenges within a single set of weights. Experimental results demonstrate that OneTrack-M achieves at least 25% faster inference times compared to state-of-the-art models in the literature while maintaining or improving tracking accuracy metrics. These improvements highlight the potential of the proposed solution for real-time applications such as autonomous vehicles, surveillance systems, and robotics, where rapid responses are crucial for system effectiveness.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_04478
institution	arXiv
publishDate	2025
record_format	arxiv
title	OneTrack-M: A multitask approach to transformer-based MOT models
topic	Computer Vision and Pattern Recognition; Machine Learning; I.4.8
url	https://arxiv.org/abs/2502.04478

Similar Items