Staff View: CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding :: Library Catalog

Bibliographic Details
Main Authors:	Liu, Yunze; Chen, Changxi; Wang, Zifan; Yi, Li
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2401.09057

_version_	1866911766696427520
author	Liu, Yunze; Chen, Changxi; Wang, Zifan; Yi, Li
author_facet	Liu, Yunze; Chen, Changxi; Wang, Zifan; Yi, Li
contents	This paper introduces a novel approach named CrossVideo, which aims to enhance self-supervised cross-modal contrastive learning in the field of point cloud video understanding. Traditional supervised learning methods encounter limitations due to data scarcity and challenges in label acquisition. To address these issues, we propose a self-supervised learning method that leverages the cross-modal relationship between point cloud videos and image videos to acquire meaningful feature representations. Intra-modal and cross-modal contrastive learning techniques are employed to facilitate effective comprehension of point cloud video. We also propose a multi-level contrastive approach for both modalities. Through extensive experiments, we demonstrate that our method significantly surpasses previous state-of-the-art approaches, and we conduct comprehensive ablation studies to validate the effectiveness of our proposed designs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_09057
institution	arXiv
publishDate	2024
record_format	arxiv
title	CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2401.09057

Similar Items