| Main Authors: | Koppula, Skanda, Rocco, Ignacio, Yang, Yi, Heyward, Joe, Carreira, João, Zisserman, Andrew, Brostow, Gabriel, Doersch, Carl |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.05921 |
Similar Items
BootsTAP: Bootstrapped Training for Tracking-Any-Point
by: Doersch, Carl, et al.
Published: (2024)
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
by: Zholus, Artem, et al.
Published: (2025)
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
by: Heyward, Joseph, et al.
Published: (2024)
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
by: Papalampidi, Pinelopi, et al.
Published: (2023)
Learning from One Continuous Video Stream
by: Carreira, João, et al.
Published: (2023)
TAPVid-360: Tracking Any Point in 360 from Narrow Field of View Video
by: Hudson, Finlay G. C., et al.
Published: (2025)
TAPNext++: What's Next for Tracking Any Point (TAP)?
by: Jung, Sebastian, et al.
Published: (2026)
Efficiently Reconstructing Dynamic Scenes One D4RT at a Time
by: Zhang, Chuhan, et al.
Published: (2025)
Perception Test 2025: Challenge Summary and a Unified VQA Extension
by: Heyward, Joseph, et al.
Published: (2026)
Scaling 4D Representations
by: Carreira, João, et al.
Published: (2024)
Learning from Streaming Video with Orthogonal Gradients
by: Han, Tengda, et al.
Published: (2025)
Forecasting Motion in the Wild
by: Thakkar, Neerja, et al.
Published: (2026)
A Mixed Diet Makes DINO An Omnivorous Vision Encoder
by: Kabra, Rishabh, et al.
Published: (2026)
SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications
by: Hasson, Yana, et al.
Published: (2025)
3D-Aware Instance Segmentation and Tracking in Egocentric Videos
by: Bhalgat, Yash, et al.
Published: (2024)
Recurrent Video Masked Autoencoders
by: Zoran, Daniel, et al.
Published: (2025)
TAPIP3D: Tracking Any Point in Persistent 3D Geometry
by: Zhang, Bowei, et al.
Published: (2025)
TrackAny3D: Transferring Pretrained 3D Models for Category-unified 3D Point Cloud Tracking
by: Wang, Mengmeng, et al.
Published: (2025)
GroundUp: Rapid Sketch-Based 3D City Massing
by: Unlu, Gizem Esra, et al.
Published: (2024)
3D Spine Shape Estimation from Single 2D DXA
by: Bourigault, Emmanuelle, et al.
Published: (2024)
Tracking Any Point Methods for Markerless 3D Tissue Tracking in Endoscopic Stereo Images
by: Reuter, Konrad, et al.
Published: (2025)
Direct Motion Models for Assessing Generated Videos
by: Allen, Kelsey, et al.
Published: (2025)
Memory Consolidation Enables Long-Context Video Understanding
by: Balažević, Ivana, et al.
Published: (2024)
GMOS: Grounding Moving Object Segmentation in 3D Space and Time
by: Xie, Junyu, et al.
Published: (2026)
A General Protocol to Probe Large Vision Models for 3D Physical Understanding
by: Zhan, Guanqi, et al.
Published: (2023)
Recurrence-based Vanishing Point Detection
by: Bharadwaj, Skanda, et al.
Published: (2024)
Online Segment Any 3D Thing as Instance Tracking
by: Wang, Hanshi, et al.
Published: (2025)
SpatialTracker: Tracking Any 2D Pixels in 3D Space
by: Xiao, Yuxi, et al.
Published: (2024)
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
by: Abdelreheem, Ahmed, et al.
Published: (2025)
Self-Supervised Any-Point Tracking by Contrastive Random Walks
by: Shrivastava, Ayush, et al.
Published: (2024)
AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings
by: Watson, Jamie, et al.
Published: (2024)
LabelAny3D: Label Any Object 3D in the Wild
by: Yao, Jin, et al.
Published: (2026)
Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking
by: Nguyen, Phuc, et al.
Published: (2024)
Segment Any 3D Gaussians
by: Cen, Jiazhong, et al.
Published: (2023)
Moving Off-the-Grid: Scene-Grounded Video Representations
by: van Steenkiste, Sjoerd, et al.
Published: (2024)
SAI3D: Segment Any Instance in 3D Scenes
by: Yin, Yingda, et al.
Published: (2023)
Register Any Point: Scaling 3D Point Cloud Registration by Flow Matching
by: Pan, Yue, et al.
Published: (2025)
SAMPart3D: Segment Any Part in 3D Objects
by: Yang, Yunhan, et al.
Published: (2024)
Multi-View 3D Point Tracking
by: Rajič, Frano, et al.
Published: (2025)
BuildAnyPoint: 3D Building Structured Abstraction from Diverse Point Clouds
by: Hua, Tongyan, et al.
Published: (2026)