Staff View: SCE-SLAM: Scale-Consistent Monocular SLAM via Scene Coordinate Embeddings

Bibliographic Details
Main Authors:	Wu, Yuchen; Li, Jiahe; Yu, Xiaohan; Yu, Lina; Zheng, Jin; Bai, Xiao
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.09665

_version_	1866915730050514944
author	Wu, Yuchen; Li, Jiahe; Yu, Xiaohan; Yu, Lina; Zheng, Jin; Bai, Xiao
author_facet	Wu, Yuchen; Li, Jiahe; Yu, Xiaohan; Yu, Lina; Zheng, Jin; Bai, Xiao
contents	Monocular visual SLAM enables 3D reconstruction from internet video and autonomous navigation on resource-constrained platforms, yet it suffers from scale drift, i.e., the gradual divergence of the estimated scale over long sequences. Existing frame-to-frame methods achieve real-time performance through local optimization but accumulate scale drift because independent windows share no global constraints. To address this, we propose SCE-SLAM, an end-to-end SLAM system that maintains scale consistency through scene coordinate embeddings, which are learned patch-level representations encoding 3D geometric relationships under a canonical scale reference. The framework consists of two key modules: geometry-guided aggregation, which leverages 3D spatial proximity to propagate scale information from historical observations through geometry-modulated attention, and scene coordinate bundle adjustment, which anchors current estimates to the reference scale through explicit 3D coordinate constraints decoded from the scene coordinate embeddings. Experiments on KITTI, Waymo, and vKITTI demonstrate substantial improvements: our method reduces absolute trajectory error by 8.36 m on KITTI compared to the best prior approach, while maintaining 36 FPS and achieving scale consistency across large-scale scenes.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_09665
institution	arXiv
publishDate	2026
record_format	arxiv
title	SCE-SLAM: Scale-Consistent Monocular SLAM via Scene Coordinate Embeddings
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2601.09665

Similar Items