Bibliographic Details
Main Authors: Harley, Adam W., You, Yang, Sun, Xinglong, Zheng, Yang, Raghuraman, Nikhil, Gu, Yunqi, Liang, Sheldon, Chu, Wen-Hsuan, Dave, Achal, Tokmakov, Pavel, You, Suya, Ambrus, Rares, Fragkiadaki, Katerina, Guibas, Leonidas J.
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2506.07310
author Harley, Adam W.
You, Yang
Sun, Xinglong
Zheng, Yang
Raghuraman, Nikhil
Gu, Yunqi
Liang, Sheldon
Chu, Wen-Hsuan
Dave, Achal
Tokmakov, Pavel
You, Suya
Ambrus, Rares
Fragkiadaki, Katerina
Guibas, Leonidas J.
contents We introduce AllTracker: a model that estimates long-range point tracks by way of estimating the flow field between a query frame and every other frame of a video. Unlike existing point tracking methods, our approach delivers high-resolution and dense (all-pixel) correspondence fields, which can be visualized as flow maps. Unlike existing optical flow methods, our approach corresponds one frame to hundreds of subsequent frames, rather than just the next frame. We develop a new architecture for this task, blending techniques from existing work in optical flow and point tracking: the model performs iterative inference on low-resolution grids of correspondence estimates, propagating information spatially via 2D convolution layers, and propagating information temporally via pixel-aligned attention layers. The model is fast and parameter-efficient (16 million parameters), and delivers state-of-the-art point tracking accuracy at high resolution (i.e., tracking 768x1024 pixels, on a 40 GB GPU). A benefit of our design is that we can train jointly on optical flow datasets and point tracking datasets, and we find that doing so is crucial for top performance. We provide an extensive ablation study on our architecture details and training recipe, making it clear which details matter most. Our code and model weights are available at https://alltracker.github.io
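The abstract's central idea, that long-range point tracks can be read off from flow fields between a query frame and every other frame, can be illustrated with a minimal sketch. This is not the AllTracker code or API: the function name `tracks_from_flows`, the helper `constant_flow`, and the nested-list layout `flows[t][y][x] = (dx, dy)` are assumptions made purely for illustration, and the flow values are toy data rather than model output.

```python
from typing import List, Tuple

# Hypothetical layout: flow[y][x] = (dx, dy), the displacement that carries
# query-frame pixel (x, y) to its location in some other frame.
Flow = List[List[Tuple[float, float]]]


def tracks_from_flows(
    flows: List[Flow], queries: List[Tuple[int, int]]
) -> List[List[Tuple[float, float]]]:
    """Return, for each query pixel (x, y), its position in every frame.

    flows[t] is the flow field from the query frame to frame t, so the
    track of a pixel is simply its query position plus the flow sampled
    at that position, for each t.
    """
    tracks = []
    for x, y in queries:
        track = []
        for flow in flows:
            dx, dy = flow[y][x]
            track.append((x + dx, y + dy))
        tracks.append(track)
    return tracks


def constant_flow(height: int, width: int, dx: float, dy: float) -> Flow:
    """Toy flow field in which every pixel moves by the same (dx, dy)."""
    return [[(dx, dy) for _ in range(width)] for _ in range(height)]


# Toy "video" of 3 frames at 2x3 resolution: the scene shifts right and down.
flows = [
    constant_flow(2, 3, 0.0, 0.0),  # query frame to itself: zero motion
    constant_flow(2, 3, 1.0, 0.5),
    constant_flow(2, 3, 2.0, 1.0),
]
tracks = tracks_from_flows(flows, [(0, 0), (2, 1)])
print(tracks[0])
print(tracks[1])
```

Because every pixel of the query frame has a flow vector to every other frame, any set of query pixels yields tracks by lookup alone, which is what makes the correspondence dense in the sense the abstract describes.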
format Preprint
id arxiv_https___arxiv_org_abs_2506_07310
institution arXiv
publishDate 2025
record_format arxiv
title AllTracker: Efficient Dense Point Tracking at High Resolution
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2506.07310