Staff View: TrajVG: 3D Trajectory-Coupled Visual Geometry Learning

Bibliographic Details
Main Authors:	Miao, Xingyu; Zhao, Weiguang; Lu, Tao; Xu, Linning; Yu, Mulin; Long, Yang; Pang, Jiangmiao; Dong, Junting
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.04439

_version_	1866910012003057664
author	Miao, Xingyu; Zhao, Weiguang; Lu, Tao; Xu, Linning; Yu, Mulin; Long, Yang; Pang, Jiangmiao; Dong, Junting
author_facet	Miao, Xingyu; Zhao, Weiguang; Lu, Tao; Xu, Linning; Yu, Mulin; Long, Yang; Pang, Jiangmiao; Dong, Junting
contents	Feed-forward multi-frame 3D reconstruction models often degrade on videos with object motion. A global reference becomes ambiguous under multiple motions, while the local pointmap relies heavily on estimated relative poses and can drift, causing cross-frame misalignment and duplicated structures. We propose TrajVG, a reconstruction framework that makes cross-frame 3D correspondence an explicit prediction by estimating camera-coordinate 3D trajectories. We couple sparse trajectories, per-frame local pointmaps, and relative camera poses with geometric consistency objectives: (i) bidirectional trajectory-pointmap consistency with controlled gradient flow, and (ii) a pose consistency objective driven by static track anchors that suppresses gradients from dynamic regions. To scale training to in-the-wild videos where 3D trajectory labels are scarce, we reformulate the same coupling constraints into self-supervised objectives using only pseudo 2D tracks, enabling unified training with mixed supervision. Extensive experiments across 3D tracking, pose estimation, pointmap reconstruction, and video depth show that TrajVG outperforms current feed-forward baselines.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_04439
institution	arXiv
publishDate	2026
record_format	arxiv
title	TrajVG: 3D Trajectory-Coupled Visual Geometry Learning
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2602.04439

Similar Items