Bibliographic Details
Main Authors: Wang, Yihao; Miao, Yang; Zhao, Wenshuai; Yang, Wenyan; Wang, Zihan; Pajarinen, Joni; Van Gool, Luc; Paudel, Danda Pani; Kannala, Juho; Wang, Xi; Solin, Arno
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2603.25539
_version_ 1866908915730481152
author Wang, Yihao
Miao, Yang
Zhao, Wenshuai
Yang, Wenyan
Wang, Zihan
Pajarinen, Joni
Van Gool, Luc
Paudel, Danda Pani
Kannala, Juho
Wang, Xi
Solin, Arno
contents Articulation perception aims to recover the motion and structure of articulated objects (e.g., drawers and cupboards) and is fundamental to 3D scene understanding in robotics, simulation, and animation. Existing learning-based methods rely heavily on supervised training with high-quality 3D data and manual annotations, which limits their scalability and diversity. To address this limitation, we propose PAWS, a method that directly extracts object articulations from hand-object interactions in large-scale in-the-wild egocentric videos. We evaluate our method on public datasets, including HD-EPIC and Arti4D, achieving significant improvements over baselines. We further demonstrate that the extracted articulations benefit downstream tasks, including fine-tuning 3D articulation prediction models and enabling robot manipulation. See the project website at https://aaltoml.github.io/PAWS/.
format Preprint
id arxiv_https___arxiv_org_abs_2603_25539
institution arXiv
publishDate 2026
record_format arxiv
title PAWS: Perception of Articulation in the Wild at Scale from Egocentric Videos
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2603.25539