Bibliographic Details
Main Authors: Yan, An, Cao, Leilei, Lu, Feng, Hong, Ran, Jiang, Youhai, Zhu, Fengjie
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2509.14901
_version_ 1866915501322534912
author Yan, An
Cao, Leilei
Lu, Feng
Hong, Ran
Jiang, Youhai
Zhu, Fengjie
contents Complex Video Object Segmentation (VOS) requires accurately segmenting objects across frames despite small and visually similar targets, frequent occlusions, rapid motion, and complex interactions. In this report, we present our solution for the LSVOS 2025 VOS Track, built on the SAM2 framework. During training we adopt a pseudo-labeling strategy: a trained SAM2 checkpoint is deployed within the SAM2Long framework to generate pseudo labels for the MOSE test set, which are then combined with existing data for further training. For inference, SAM2Long produces our primary segmentation results, while an open-source SeC model runs in parallel to produce complementary predictions. A cascaded decision mechanism dynamically integrates the outputs of both models, exploiting the temporal stability of SAM2Long and the concept-level robustness of SeC. With pseudo-label training and cascaded multi-model inference, our approach achieves a J&F score of 0.8616 on the MOSE test set (+1.4 points over our SAM2Long baseline), securing 2nd place in the LSVOS 2025 VOS Track and demonstrating strong robustness and accuracy in long, complex video segmentation scenarios.
format Preprint
id arxiv_https___arxiv_org_abs_2509_14901
institution arXiv
publishDate 2025
record_format arxiv
title Pseudo-Label Enhanced Cascaded Framework: 2nd Technical Report for LSVOS 2025 VOS Track
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2509.14901
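
The abstract describes a cascaded decision mechanism that combines a primary model (SAM2Long) with a complementary one (SeC) but does not state the decision rule. The following is only a minimal sketch of one plausible cascade, assuming each model yields a binary mask and a per-frame confidence; the names `FramePrediction`, `cascade_frame`, `cascade_video`, and the `min_score` threshold are hypothetical and are not taken from the report or from either model's API.

```python
from dataclasses import dataclass
from typing import List

Mask = List[List[int]]  # binary mask: 1 = object pixel, 0 = background


@dataclass
class FramePrediction:
    """One model's output for one frame (hypothetical container)."""
    mask: Mask
    score: float  # assumed per-frame confidence in [0, 1]


def mask_area(mask: Mask) -> int:
    """Count foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)


def cascade_frame(primary: FramePrediction,
                  secondary: FramePrediction,
                  min_score: float = 0.5) -> Mask:
    """Keep the primary mask unless it looks unreliable.

    Assumed rule: the primary prediction is treated as unreliable when its
    mask is empty or its confidence is below `min_score`. In that case the
    secondary mask is used, provided it is non-empty.
    """
    primary_unreliable = (
        mask_area(primary.mask) == 0 or primary.score < min_score
    )
    if primary_unreliable and mask_area(secondary.mask) > 0:
        return secondary.mask
    return primary.mask


def cascade_video(primary_preds: List[FramePrediction],
                  secondary_preds: List[FramePrediction],
                  min_score: float = 0.5) -> List[Mask]:
    """Apply the per-frame cascade to two aligned prediction sequences."""
    if len(primary_preds) != len(secondary_preds):
        raise ValueError("both models must predict the same number of frames")
    return [
        cascade_frame(p, s, min_score)
        for p, s in zip(primary_preds, secondary_preds)
    ]
```

Under these assumptions the primary model's output is left untouched whenever it is confident, and the secondary model only steps in on frames where the primary prediction is empty or has low confidence. The actual criteria used in the report may differ.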