Bibliographic Details
Main Authors: Ke, Bingxin, Zhou, Qunjie, Huang, Jiahui, Ren, Xuanchi, Shen, Tianchang, Schindler, Konrad, Leal-Taixé, Laura, Huang, Shengyu
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2602.14751
author Ke, Bingxin
Zhou, Qunjie
Huang, Jiahui
Ren, Xuanchi
Shen, Tianchang
Schindler, Konrad
Leal-Taixé, Laura
Huang, Shengyu
contents We introduce CAPA, a parameter-efficient test-time optimization framework that adapts pre-trained 3D foundation models (FMs) for depth completion using sparse geometric cues. Unlike prior methods that train task-specific encoders for auxiliary inputs, which often overfit and generalize poorly, CAPA freezes the FM backbone. Instead, it updates only a minimal set of parameters using Parameter-Efficient Fine-Tuning (e.g., LoRA or VPT), guided by gradients calculated directly from the sparse observations available at inference time. This approach effectively grounds the foundation model's geometric prior in the scene-specific measurements, correcting distortions and misplaced structures. For videos, CAPA introduces sequence-level parameter sharing, jointly adapting all frames to exploit temporal correlations, improve robustness, and enforce multi-frame consistency. CAPA is model-agnostic, compatible with any ViT-based FM, and achieves state-of-the-art results across diverse condition patterns on both indoor and outdoor datasets. Project page: research.nvidia.com/labs/dvl/projects/capa.
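The adaptation loop described in the abstract can be illustrated with a toy sketch in Python/NumPy. It is not the authors' code: the linear stand-in for a frozen backbone (`W0`, `v`), the hypothetical helpers `predict_depth` and `adapt_sequence`, the mean-squared-error loss on measured pixels, plain gradient descent, and all sizes are assumptions made for illustration. Under those assumptions it shows the three ingredients the abstract names: the backbone weights stay frozen, only a LoRA-style low-rank update `B @ A` is trained from gradients of the error at pixels that carry sparse depth, and for a video a single adapter is shared by all frames by pooling their sparse pixels into one loss.

```python
import numpy as np


def predict_depth(X, W0, v, A, B):
    """Hypothetical toy depth head: frozen weight W0 plus a LoRA update B @ A.

    X: (N, d) per-pixel features; W0: (k, d) frozen; v: (k,) frozen read-out;
    A: (r, d) and B: (k, r) are the only trainable parameters.
    """
    W = W0 + B @ A                      # LoRA: frozen weight + low-rank delta
    return X @ W.T @ v                  # (N,) depth per pixel


def adapt_sequence(frames, W0, v, rank=2, lr=0.05, steps=1000, seed=0):
    """Toy test-time adaptation with one adapter shared by every frame.

    frames: list of (X, sparse_depth, mask); only sparse_depth[mask] is used.
    Returns the adapted factors A, B and the sparse loss before each step.
    """
    rng = np.random.default_rng(seed)
    k, d = W0.shape
    A = rng.normal(0.0, 1.0 / np.sqrt(d), size=(rank, d))
    B = np.zeros((k, rank))             # B = 0: adapted model starts equal to the frozen one
    # Sequence-level sharing: pool the sparse pixels of all frames into one loss.
    Xs = np.concatenate([X[m] for X, _, m in frames])
    ys = np.concatenate([y[m] for _, y, m in frames])
    losses = []
    for _ in range(steps):
        err = predict_depth(Xs, W0, v, A, B) - ys
        losses.append(float(np.mean(err ** 2)))
        g = 2.0 * err / err.size        # dLoss/dPrediction for the mean squared error
        Xg = Xs.T @ g                   # (d,)
        grad_A = np.outer(B.T @ v, Xg)  # (r, d); W0 and v are never updated
        grad_B = np.outer(v, A @ Xg)    # (k, r)
        A -= lr * grad_A
        B -= lr * grad_B
    return A, B, losses


# Toy scene: a frozen "prior" plus a hypothetical low-rank distortion to correct.
rng = np.random.default_rng(42)
d, k, N, T = 8, 4, 400, 3
W0 = rng.normal(size=(k, d)) / np.sqrt(d)
v = np.ones(k) / np.sqrt(k)
q = rng.normal(size=d)
q /= np.linalg.norm(q)
W_true = W0 + 0.8 * np.outer(v, q)

frames, dense_gt = [], []
for _ in range(T):
    X = rng.normal(size=(N, d))
    gt = X @ W_true.T @ v               # dense ground truth (used only for evaluation)
    mask = rng.random(N) < 0.05         # about 5% of pixels carry a measurement
    frames.append((X, np.where(mask, gt, np.nan), mask))
    dense_gt.append(gt)

W0_before = W0.copy()
A, B, losses = adapt_sequence(frames, W0, v)


def dense_mse(A, B):
    """Mean squared error over all pixels of all frames (toy evaluation)."""
    return float(np.mean([np.mean((predict_depth(X, W0, v, A, B) - g) ** 2)
                          for (X, _, _), g in zip(frames, dense_gt)]))


dense_before = dense_mse(np.zeros_like(A), np.zeros_like(B))
dense_after = dense_mse(A, B)
print(f"sparse MSE {losses[0]:.4f} -> {losses[-1]:.6f}")
print(f"dense  MSE {dense_before:.4f} -> {dense_after:.6f}")
```

Initializing `B` to zero, as is conventional for LoRA, makes the adapted model start out identical to the frozen one, so every change is driven by the sparse measurements. In this toy the distortion is, by construction, something the adapter can represent, which is why fitting about 5% of the pixels also lowers the dense error; nothing here shows that the same holds for a real ViT-based FM.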
format Preprint
id arxiv_https___arxiv_org_abs_2602_14751
institution arXiv
publishDate 2026
record_format arxiv
title Depth Completion as Parameter-Efficient Test-Time Adaptation
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2602.14751