Staff View: DINO-CoDT: Multi-class Collaborative Detection and Tracking with Vision Foundation Models :: Library Catalog

Bibliographic Details
Main Authors:	He, Xunjie; Lee, Christina Dao Wen; Wang, Meiling; Yuan, Chengran; Huang, Zefan; Yue, Yufeng; Ang Jr, Marcelo H.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.07375

_version_	1866909642932617216
author	He, Xunjie; Lee, Christina Dao Wen; Wang, Meiling; Yuan, Chengran; Huang, Zefan; Yue, Yufeng; Ang Jr, Marcelo H.
author_facet	He, Xunjie; Lee, Christina Dao Wen; Wang, Meiling; Yuan, Chengran; Huang, Zefan; Yue, Yufeng; Ang Jr, Marcelo H.
contents	Collaborative perception plays a crucial role in enhancing environmental understanding by expanding the perceptual range and improving robustness against sensor failures, which primarily involves collaborative 3D detection and tracking tasks. The former focuses on object recognition in individual frames, while the latter captures continuous instance tracklets over time. However, existing works in both areas predominantly focus on the vehicle superclass, lacking effective solutions for both multi-class collaborative detection and tracking. This limitation hinders their applicability in real-world scenarios, which involve diverse object classes with varying appearances and motion patterns. To overcome these limitations, we propose a multi-class collaborative detection and tracking framework tailored for diverse road users. We first present a detector with a global spatial attention fusion (GSAF) module, enhancing multi-scale feature learning for objects of varying sizes. Next, we introduce a tracklet RE-IDentification (REID) module that leverages visual semantics with a vision foundation model to effectively reduce ID SWitch (IDSW) errors, in cases of erroneous mismatches involving small objects like pedestrians. We further design a velocity-based adaptive tracklet management (VATM) module that adjusts the tracking interval dynamically based on object motion. Extensive experiments on the V2X-Real and OPV2V datasets show that our approach significantly outperforms existing state-of-the-art methods in both detection and tracking accuracy.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_07375
institution	arXiv
publishDate	2025
record_format	arxiv
title	DINO-CoDT: Multi-class Collaborative Detection and Tracking with Vision Foundation Models
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2506.07375

Similar Items