Bibliographic Details
Main Authors: Palanisamy, Senthil; Anand, Abhishek; Rathore, Satpal Singh; Patnaik, Pratyush; Khatana, Shubhanshu; Janweja, Ekaksh
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2605.05945
Table of Contents:
  • Vision-language-action (VLA) models have driven demand for large-scale egocentric datasets, yet the hardware and infrastructure needed to collect long-horizon data remain inaccessible. Episodes in existing datasets are typically only a few minutes long, too short to capture the long-horizon temporal dependencies that complex robotic task execution requires. We present MobileEgo Anywhere, a framework for collecting hour-plus egocentric trajectories on commodity mobile hardware. It uses modern smartphone sensors for long-term pose tracking, avoiding the hardware barriers of traditional robotics data collection. We release three components: (1) STERA, an open-source video-processing pipeline that converts raw mobile captures into standardized, training-ready formats for VLA and foundation-model research; (2) a free mobile app that lets any user record egocentric activity; and (3) a 200-hour dataset of diverse, long-form egocentric data with persistent state tracking across 584 sessions. We further show that this data is a usable training signal: mid-training a VLA on it lowers held-out action-prediction error.