Bibliographic Details
Main Authors: Miao, Xingyu, Zhao, Weiguang, Lu, Tao, Xu, Linning, Yu, Mulin, Long, Yang, Pang, Jiangmiao, Dong, Junting
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2602.04439
author Miao, Xingyu
Zhao, Weiguang
Lu, Tao
Xu, Linning
Yu, Mulin
Long, Yang
Pang, Jiangmiao
Dong, Junting
contents Feed-forward multi-frame 3D reconstruction models often degrade on videos with object motion. A single global reference frame becomes ambiguous under multiple motions, while local pointmaps rely heavily on estimated relative poses and can drift, causing cross-frame misalignment and duplicated structures. We propose TrajVG, a reconstruction framework that makes cross-frame 3D correspondence an explicit prediction by estimating camera-coordinate 3D trajectories. We couple sparse trajectories, per-frame local pointmaps, and relative camera poses through geometric consistency objectives: (i) bidirectional trajectory-pointmap consistency with controlled gradient flow, and (ii) a pose consistency objective driven by static track anchors that suppresses gradients from dynamic regions. To scale training to in-the-wild videos, where 3D trajectory labels are scarce, we reformulate the same coupling constraints as self-supervised objectives that require only pseudo 2D tracks, enabling unified training with mixed supervision. Extensive experiments on 3D tracking, pose estimation, pointmap reconstruction, and video depth show that TrajVG outperforms current feed-forward baselines.
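As a rough, non-authoritative illustration of the coupling constraints named in the abstract, the NumPy sketch below computes three toy residuals: trajectory-pointmap agreement within one frame, pose consistency restricted to static track anchors, and a self-supervised reprojection residual against pseudo 2D tracks. The function names, the nearest-pixel lookup, the plain L2 distances, the pinhole projection with intrinsics K, and the boolean static mask are all assumptions made for illustration. This is not the authors' implementation, and the paper's controlled gradient flow and loss weighting are not reproduced.

```python
import numpy as np

# Hypothetical helpers sketching TrajVG-style consistency residuals (illustrative only).

def sample_pointmap(pointmap, uv):
    """Nearest-pixel lookup of 3D points at 2D track locations.
    pointmap: (H, W, 3) camera-coordinate points; uv: (N, 2) pixel coords as (x, y)."""
    h, w, _ = pointmap.shape
    x = np.clip(np.rint(uv[:, 0]).astype(int), 0, w - 1)
    y = np.clip(np.rint(uv[:, 1]).astype(int), 0, h - 1)
    return pointmap[y, x]  # (N, 3)

def traj_pointmap_loss(traj_xyz, pointmap, uv):
    """Mean L2 distance between trajectory points and pointmap points at the same pixels (one frame)."""
    diff = traj_xyz - sample_pointmap(pointmap, uv)
    return float(np.mean(np.linalg.norm(diff, axis=-1)))

def static_pose_loss(traj_i, traj_j, R_ij, t_ij, static_mask):
    """Map frame-i track points into frame j with the relative pose (R_ij, t_ij)
    and average the error over static tracks only; dynamic tracks are ignored."""
    pred_j = traj_i @ R_ij.T + t_ij
    err = np.linalg.norm(pred_j - traj_j, axis=-1)
    if not static_mask.any():
        return 0.0
    return float(err[static_mask].mean())

def reprojection_loss(traj_xyz, K, uv_pseudo):
    """Self-supervised variant (assumed form): project camera-coordinate points
    with pinhole intrinsics K and compare against pseudo 2D track positions."""
    proj = traj_xyz @ K.T
    uv = proj[:, :2] / proj[:, 2:3]
    return float(np.mean(np.linalg.norm(uv - uv_pseudo, axis=-1)))
```

The static mask is the sketch's stand-in for "static track anchors": restricting the pose residual to static tracks means points on moving objects contribute nothing to the relative-pose term, which is one simple way to keep dynamic regions from corrupting pose estimates.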
format Preprint
id arxiv_https___arxiv_org_abs_2602_04439
institution arXiv
publishDate 2026
record_format arxiv
title TrajVG: 3D Trajectory-Coupled Visual Geometry Learning
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2602.04439