Staff View: Align then Adapt: Rethinking Parameter-Efficient Transfer Learning in 4D Perception :: Library Catalog

Bibliographic Details
Main Authors:	Sun, Yiding; Zhu, Jihua; Cheng, Haozhe; Lu, Chaoyi; Yang, Zhichuan; Chen, Lin; Wang, Yaonan
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.23069

_version_	1866911628846432256
author	Sun, Yiding; Zhu, Jihua; Cheng, Haozhe; Lu, Chaoyi; Yang, Zhichuan; Chen, Lin; Wang, Yaonan
author_facet	Sun, Yiding; Zhu, Jihua; Cheng, Haozhe; Lu, Chaoyi; Yang, Zhichuan; Chen, Lin; Wang, Yaonan
contents	Point cloud video understanding is critical for robotics as it accurately encodes motion and scene interaction. We recognize that 4D datasets are far scarcer than 3D ones, which hampers the scalability of self-supervised 4D models. A promising alternative is to transfer 3D pre-trained models to 4D perception tasks. However, rigorous empirical analysis reveals two critical limitations that impede transfer capability: overfitting and the modality gap. To overcome these challenges, we develop a novel "Align then Adapt" (PointATA) paradigm that decomposes parameter-efficient transfer learning into two sequential stages. Optimal-transport theory is employed to quantify the distributional discrepancy between 3D and 4D datasets, enabling our proposed point align embedder to be trained in Stage 1 to alleviate the underlying modality gap. To mitigate overfitting, an efficient point-video adapter and a spatial-context encoder are integrated into the frozen 3D backbone to enhance temporal modeling capacity in Stage 2. Notably, with the above engineering-oriented designs, PointATA enables a pre-trained 3D model without temporal knowledge to reason about dynamic video content at a smaller parameter cost compared to previous work. Extensive experiments show that PointATA can match or even outperform strong full fine-tuning models, whilst enjoying the advantage of parameter efficiency, e.g. 97.21% accuracy on 3D action recognition, +8.7% on 4D action segmentation, and 84.06% on 4D semantic segmentation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_23069
institution	arXiv
publishDate	2026
record_format	arxiv
title	Align then Adapt: Rethinking Parameter-Efficient Transfer Learning in 4D Perception
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.23069
