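| Main Authors: | Ke, Bingxin; Zhou, Qunjie; Huang, Jiahui; Ren, Xuanchi; Shen, Tianchang; Schindler, Konrad; Leal-Taixé, Laura; Huang, Shengyu |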
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2602.14751 |
| _version_ | 1866910023808974848 |
|---|---|
| author | Ke, Bingxin; Zhou, Qunjie; Huang, Jiahui; Ren, Xuanchi; Shen, Tianchang; Schindler, Konrad; Leal-Taixé, Laura; Huang, Shengyu |
| author_facet | Ke, Bingxin; Zhou, Qunjie; Huang, Jiahui; Ren, Xuanchi; Shen, Tianchang; Schindler, Konrad; Leal-Taixé, Laura; Huang, Shengyu |
| contents | We introduce CAPA, a parameter-efficient test-time optimization framework that adapts pre-trained 3D foundation models (FMs) for depth completion using sparse geometric cues. Unlike prior methods that train task-specific encoders for auxiliary inputs, which often overfit and generalize poorly, CAPA freezes the FM backbone and updates only a minimal set of parameters via Parameter-Efficient Fine-Tuning (e.g., LoRA or VPT), guided by gradients computed directly from the sparse observations available at inference time. This grounds the foundation model's geometric prior in the scene-specific measurements, correcting distortions and misplaced structures. For videos, CAPA introduces sequence-level parameter sharing, jointly adapting all frames to exploit temporal correlations, improve robustness, and enforce multi-frame consistency. CAPA is model-agnostic, compatible with any ViT-based FM, and achieves state-of-the-art results across diverse condition patterns on both indoor and outdoor datasets. Project page: research.nvidia.com/labs/dvl/projects/capa. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2602_14751 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Depth Completion as Parameter-Efficient Test-Time Adaptation |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2602.14751 |
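
The abstract describes freezing the backbone and updating only a small set of low-rank parameters from sparse depth measurements at inference time. The toy sketch below illustrates that general idea and is not CAPA's implementation: a tiny linear map stands in for a ViT-based foundation model, the update is a rank-1 LoRA-style correction, and every name and number (`W`, `v`, `feats`, `sparse_depth`, the learning rate, the constant initialization of `a`) is an assumption chosen for illustration.

```python
# Toy sketch only: a tiny linear "backbone" stands in for a ViT-based FM.
# All weights, features, and depth values are made up for illustration.

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def predict(W, v, b, a, x):
    """Depth of one pixel: v . ((W + b a^T) x); W and v stay frozen."""
    base = dot(v, [dot(row, x) for row in W])   # frozen backbone output
    return base + dot(v, b) * dot(a, x)          # rank-1 LoRA-style correction

def sparse_loss(W, v, b, a, feats, sparse_depth):
    """Mean squared error over pixels that carry a depth measurement."""
    errs = [predict(W, v, b, a, feats[i]) - d for i, d in sparse_depth.items()]
    return sum(e * e for e in errs) / len(errs)

def adapt(W, v, feats, sparse_depth, steps=500, lr=0.05):
    """Gradient descent on the low-rank factors a and b only."""
    k = len(v)
    a = [0.1] * k   # small constant init (assumption; LoRA usually uses random)
    b = [0.0] * k   # zero init, so adaptation starts at the frozen prediction
    n = len(sparse_depth)
    for _ in range(steps):
        grad_a, grad_b = [0.0] * k, [0.0] * k
        vb = dot(v, b)
        for i, d in sparse_depth.items():
            x = feats[i]
            ax = dot(a, x)
            err = predict(W, v, b, a, x) - d
            for j in range(k):
                grad_b[j] += 2.0 * err * v[j] * ax / n
                grad_a[j] += 2.0 * err * vb * x[j] / n
        a = [p - lr * g for p, g in zip(a, grad_a)]
        b = [p - lr * g for p, g in zip(b, grad_b)]
    return a, b

# Frozen "backbone" weights (4x4) and frozen readout vector.
W = [[0.5, 0.1, 0.0, 0.0],
     [0.0, 0.5, 0.1, 0.0],
     [0.0, 0.0, 0.5, 0.1],
     [0.1, 0.0, 0.0, 0.5]]
v = [1.0, 1.0, 1.0, 1.0]

# One feature vector per pixel (six pixels in this toy image).
feats = [[1.0, 0.0, 0.5, 0.2], [0.5, 1.0, 0.0, 0.3], [0.0, 0.5, 1.0, 0.1],
         [0.2, 0.3, 0.4, 1.0], [0.9, 0.1, 0.3, 0.5], [0.4, 0.8, 0.6, 0.0]]

# Sparse measurements: pixel index -> observed depth (three of six pixels).
sparse_depth = {0: 1.52, 2: 0.96, 4: 1.53}

zeros = [0.0] * len(v)
loss_before = sparse_loss(W, v, zeros, zeros, feats, sparse_depth)
a, b = adapt(W, v, feats, sparse_depth)
loss_after = sparse_loss(W, v, b, a, feats, sparse_depth)

print("trainable parameters:", len(a) + len(b), "of", sum(len(r) for r in W))
print("sparse loss before:", loss_before)
print("sparse loss after: ", loss_after)
```

Only the two factors `a` and `b` are updated (8 values, against 16 in the frozen matrix), and after adaptation the dense prediction for every pixel comes from the same corrected model. In a setting closer to the one described, the factors would sit inside the attention or MLP layers of the backbone and the loss would be evaluated wherever sparse depth is available.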