| Main Authors: | Peirone, Simone Alberto, Pistilli, Francesca, Alliegro, Antonio, Tommasi, Tatiana, Averta, Giuseppe |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.24690 |
Similar Items
HiERO: understanding the hierarchy of human behavior enhances reasoning on egocentric videos
by: Peirone, Simone Alberto, et al.
Published: (2025)
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives
by: Peirone, Simone Alberto, et al.
Published: (2025)
A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives
by: Peirone, Simone Alberto, et al.
Published: (2024)
FORESCENE: FOREcasting human activity via latent SCENE graphs diffusion
by: Alliegro, Antonio, et al.
Published: (2025)
HiERO-StepG @ Ego4D Step Grounding Challenge: hierarchical activity understanding enables zero-shot step grounding
by: Zenotto, Andrea, et al.
Published: (2026)
Egocentric zone-aware action recognition across environments
by: Peirone, Simone Alberto, et al.
Published: (2024)
Domain Generalization using Action Sequences for Egocentric Action Recognition
by: Nasirimajd, Amirshayan, et al.
Published: (2025)
PEM: Prototype-based Efficient MaskFormer for Image Segmentation
by: Cavagnero, Niccolò, et al.
Published: (2024)
Transient Fault Tolerant Semantic Segmentation for Autonomous Driving
by: Iurada, Leonardo, et al.
Published: (2024)
AMEGO: Active Memory from long EGOcentric videos
by: Goletto, Gabriele, et al.
Published: (2024)
Cross-Domain Transfer Learning with CoRTe: Consistent and Reliable Transfer from Black-Box to Lightweight Segmentation Model
by: Cuttano, Claudia, et al.
Published: (2024)
A Modern Take on Visual Relationship Reasoning for Grasp Planning
by: Rabino, Paolo, et al.
Published: (2024)
MaskPlanner: Learning-Based Object-Centric Motion Generation from 3D Point Clouds
by: Tiboni, Gabriele, et al.
Published: (2025)
The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences
by: Long, Bria, et al.
Published: (2024)
Efficient Odd-One-Out Anomaly Detection
by: Chito, Silvio, et al.
Published: (2025)
SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation
by: Cuttano, Claudia, et al.
Published: (2025)
What does CLIP know about peeling a banana?
by: Cuttano, Claudia, et al.
Published: (2024)
A Second-Order Perspective on Pruning at Initialization and Knowledge Transfer
by: Iurada, Leonardo, et al.
Published: (2025)
Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning
by: Iurada, Leonardo, et al.
Published: (2024)
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
by: Cuttano, Claudia, et al.
Published: (2024)
The revenge of BiSeNet: Efficient Multi-Task Image Segmentation
by: Rosi, Gabriele, et al.
Published: (2024)
Efficient Model Editing with Task-Localized Sparse Fine-tuning
by: Iurada, Leonardo, et al.
Published: (2025)
Fixed External Cameras as Common Prior Maps for Active 3D Scene Graph Generation
by: Modi, Giorgia, et al.
Published: (2026)
RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots
by: Modi, Giorgia, et al.
Published: (2026)
MultiGraspNet: A Multitask 3D Vision Model for Multi-gripper Robotic Grasping
by: Ortuno-Chanelo, Stephany, et al.
Published: (2026)
MobileEgo Anywhere: Open Infrastructure for long horizon egocentric data on commodity hardware
by: Palanisamy, Senthil, et al.
Published: (2026)
Your ViT is Secretly an Image Segmentation Model
by: Kerssies, Tommie, et al.
Published: (2025)
AI-driven visual monitoring of industrial assembly tasks
by: Nardon, Mattia, et al.
Published: (2025)
HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation
by: Birlo, Manuel, et al.
Published: (2024)
Open and reusable deep learning for pathology with WSInfer and QuPath
by: Kaczmarzyk, Jakub R., et al.
Published: (2023)
A generalizable foundation model for intraoperative understanding across surgical procedures
by: Park, Kanggil, et al.
Published: (2026)
An Outlook into the Future of Egocentric Vision
by: Plizzari, Chiara, et al.
Published: (2023)
ViSTa Dataset: Do vision-language models understand sequential tasks?
by: Wybitul, Evžen, et al.
Published: (2024)
DLM-VMTL: A Double Layer Mapper for heterogeneous data video Multi-task prompt learning
by: Bo, Zeyi, et al.
Published: (2024)
Intuitive physics understanding emerges from self-supervised pretraining on natural videos
by: Garrido, Quentin, et al.
Published: (2025)
Did you just see that? Arbitrary view synthesis for egocentric replay of operating room workflows from ambient sensors
by: Zhang, Han, et al.
Published: (2025)
Do generative video models understand physical principles?
by: Motamed, Saman, et al.
Published: (2025)
Multi-step manipulation task and motion planning guided by video demonstration
by: Zorina, Kateryna, et al.
Published: (2025)
GazeBehavior Annotation Toolkit (GBAT): AI-powered toolkit for automatic annotation of egocentric eye-tracking and video data of child-caregiver interaction
by: Baig, Iba, et al.
Published: (2026)
Benchmarking transferability of SSL pretraining to same and different modality segmentation tasks
by: Jiang, Jue, et al.
Published: (2026)