Staff View: MLS-Track: Multilevel Semantic Interaction in RMOT

Bibliographic Details
Main Authors:	Ma, Zeliang; Yang, Song; Cui, Zhe; Zhao, Zhicheng; Su, Fei; Liu, Delong; Wang, Jingyu
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.12031

_version_	1866913320236220416
author	Ma, Zeliang; Yang, Song; Cui, Zhe; Zhao, Zhicheng; Su, Fei; Liu, Delong; Wang, Jingyu
author_facet	Ma, Zeliang; Yang, Song; Cui, Zhe; Zhao, Zhicheng; Su, Fei; Liu, Delong; Wang, Jingyu
contents	The new trend in multi-object tracking is to track objects of interest using natural language. However, the scarcity of paired prompt-instance data hinders its progress. To address this challenge, we propose a high-quality yet low-cost data generation method based on Unreal Engine 5 and construct a brand-new benchmark dataset, named Refer-UE-City, which primarily includes scenes from intersection surveillance videos, detailing the appearance and actions of people and vehicles. Specifically, it provides 14 videos with a total of 714 expressions, and is comparable in scale to the Refer-KITTI dataset. Additionally, we propose a multi-level semantic-guided multi-object framework called MLS-Track, where the interaction between the model and text is enhanced layer by layer through the introduction of a Semantic Guidance Module (SGM) and a Semantic Correlation Branch (SCB). Extensive experiments on the Refer-UE-City and Refer-KITTI datasets demonstrate the effectiveness of our proposed framework, which achieves state-of-the-art performance. Code and datasets will be made available.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_12031
institution	arXiv
publishDate	2024
record_format	arxiv
title	MLS-Track: Multilevel Semantic Interaction in RMOT
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2404.12031
