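Staff View: Memory Matching is not Enough: Jointly Improving Memory Matching and Decoding for Video Object Segmentation :: Library Catalog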

Bibliographic Details
Main Authors:	Zheng, Jintu; Liang, Yun; Zhang, Yuqing; Su, Wanchao
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Image and Video Processing
Online Access:	https://arxiv.org/abs/2409.14343

_version_	1866916406011887616
author	Zheng, Jintu; Liang, Yun; Zhang, Yuqing; Su, Wanchao
author_facet	Zheng, Jintu; Liang, Yun; Zhang, Yuqing; Su, Wanchao
contents	Memory-based video object segmentation methods model multiple objects over long temporal-spatial spans by establishing a memory bank, and they achieve remarkable performance. However, they struggle to overcome false matching and are prone to losing critical information, resulting in confusion among different objects. In this paper, we propose an effective approach that jointly improves the matching and decoding stages to alleviate the false matching issue. For the memory matching stage, we present a cost-aware mechanism that suppresses slight errors in short-term memory, and a shunted cross-scale matching for long-term memory that establishes a wide-field matching space for objects of various scales. For the readout decoding stage, we implement a compensatory mechanism that aims to recover essential information missed at the matching stage. Our approach achieves outstanding performance on several popular benchmarks (i.e., DAVIS 2016&2017 Val (92.4%&88.1%), and DAVIS 2017 Test (83.9%)), and achieves 84.8%&84.6% on YouTubeVOS 2018&2019 Val.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_14343
institution	arXiv
publishDate	2024
record_format	arxiv
title	Memory Matching is not Enough: Jointly Improving Memory Matching and Decoding for Video Object Segmentation
topic	Computer Vision and Pattern Recognition; Image and Video Processing
url	https://arxiv.org/abs/2409.14343

Similar Items