Bibliographic Details
Main Authors: Guo, Shuang; Febryanto, Filbert; Sun, Lei; Gallego, Guillermo
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition; Robotics
Online Access: https://arxiv.org/abs/2603.14528
_version_ 1866910053894717440
author Guo, Shuang
Febryanto, Filbert
Sun, Lei
Gallego, Guillermo
contents In recent years, 3D visual foundation models pioneered by pointmap-based approaches such as DUSt3R have attracted considerable interest, achieving impressive accuracy and strong generalization across diverse scenes. However, these methods can recover scene geometry only at the discrete time instants when images are captured, leaving the scene evolution during the blind time between consecutive frames largely unexplored. We introduce Interp3R, which is, to the best of our knowledge, the first method that enhances pointmap-based models to estimate depth and camera poses at arbitrary time instants. Interp3R leverages asynchronous event data to interpolate pointmaps produced by frame-based models, enabling temporally continuous geometric representations. Depth and camera poses are then jointly recovered by aligning the interpolated pointmaps, together with those predicted by the underlying frame-based models, into a consistent spatial framework. We train Interp3R exclusively on a synthetic dataset, yet demonstrate strong generalization across a wide range of synthetic and real-world benchmarks. Extensive experiments show that Interp3R outperforms, by a considerable margin, state-of-the-art baselines that follow a two-stage pipeline of 2D video frame interpolation followed by 3D geometry estimation.
format Preprint
id arxiv_https___arxiv_org_abs_2603_14528
institution arXiv
publishDate 2026
record_format arxiv
title Interp3R: Continuous-time 3D Geometry Estimation with Frames and Events
topic Computer Vision and Pattern Recognition
Robotics
url https://arxiv.org/abs/2603.14528