Bibliographic details
Main authors: Zhang, Zhuorui; Pallarès-López, Roger; Namburi, Praneeth; Anthony, Brian W.
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online access: https://arxiv.org/abs/2603.06471
_version_ 1866918391623712768
author Zhang, Zhuorui
Pallarès-López, Roger
Namburi, Praneeth
Anthony, Brian W.
contents Acquiring per-frame video annotations remains a primary bottleneck for deploying computer vision in specialized domains such as medical imaging, where expert labeling is slow and costly. Label propagation offers a natural solution, yet existing approaches face fundamental limitations. Video trackers and segmentation models can propagate labels within a single sequence but require per-video initialization and cannot generalize across videos. Classic correspondence pipelines operate on detector-chosen keypoints and struggle in low-texture scenes, while dense feature matching and one-shot segmentation methods enable cross-video propagation but lack spatiotemporal smoothness and unified support for both point and mask annotations. We present Match4Annotate, a lightweight framework for both intra-video and inter-video propagation of point and mask annotations. Our method fits a SIREN-based implicit neural representation to DINOv3 features at test time, producing a continuous, high-resolution spatiotemporal feature field, and learns a smooth implicit deformation field between frame pairs to guide correspondence matching. We evaluate on three challenging clinical ultrasound datasets. Match4Annotate achieves state-of-the-art inter-video propagation, outperforming feature matching and one-shot segmentation baselines, while remaining competitive with specialized trackers for intra-video propagation. Our results show that lightweight, test-time-optimized feature matching pipelines have the potential to offer an efficient and accessible solution for scalable annotation workflows.
format Preprint
id arxiv_https___arxiv_org_abs_2603_06471
institution arXiv
publishDate 2026
record_format arxiv
title Match4Annotate: Propagating Sparse Video Annotations via Implicit Neural Feature Matching
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2603.06471