:: Library Catalog

Bibliographic Details
Main Authors:	Peirone, Simone Alberto; Pistilli, Francesca; Alliegro, Antonio; Tommasi, Tatiana; Averta, Giuseppe
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.24690

Similar Items

HiERO: understanding the hierarchy of human behavior enhances reasoning on egocentric videos
by: Peirone, Simone Alberto, et al.
Published: (2025)

Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives
by: Peirone, Simone Alberto, et al.
Published: (2025)

A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives
by: Peirone, Simone Alberto, et al.
Published: (2024)

FORESCENE: FOREcasting human activity via latent SCENE graphs diffusion
by: Alliegro, Antonio, et al.
Published: (2025)

HiERO-StepG @ Ego4D Step Grounding Challenge: hierarchical activity understanding enables zero-shot step grounding
by: Zenotto, Andrea, et al.
Published: (2026)

Egocentric zone-aware action recognition across environments
by: Peirone, Simone Alberto, et al.
Published: (2024)

Domain Generalization using Action Sequences for Egocentric Action Recognition
by: Nasirimajd, Amirshayan, et al.
Published: (2025)

PEM: Prototype-based Efficient MaskFormer for Image Segmentation
by: Cavagnero, Niccolò, et al.
Published: (2024)

Transient Fault Tolerant Semantic Segmentation for Autonomous Driving
by: Iurada, Leonardo, et al.
Published: (2024)

AMEGO: Active Memory from long EGOcentric videos
by: Goletto, Gabriele, et al.
Published: (2024)

Cross-Domain Transfer Learning with CoRTe: Consistent and Reliable Transfer from Black-Box to Lightweight Segmentation Model
by: Cuttano, Claudia, et al.
Published: (2024)

A Modern Take on Visual Relationship Reasoning for Grasp Planning
by: Rabino, Paolo, et al.
Published: (2024)

MaskPlanner: Learning-Based Object-Centric Motion Generation from 3D Point Clouds
by: Tiboni, Gabriele, et al.
Published: (2025)

The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences
by: Long, Bria, et al.
Published: (2024)

Efficient Odd-One-Out Anomaly Detection
by: Chito, Silvio, et al.
Published: (2025)

SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation
by: Cuttano, Claudia, et al.
Published: (2025)

What does CLIP know about peeling a banana?
by: Cuttano, Claudia, et al.
Published: (2024)

A Second-Order Perspective on Pruning at Initialization and Knowledge Transfer
by: Iurada, Leonardo, et al.
Published: (2025)

Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning
by: Iurada, Leonardo, et al.
Published: (2024)

SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
by: Cuttano, Claudia, et al.
Published: (2024)

The revenge of BiSeNet: Efficient Multi-Task Image Segmentation
by: Rosi, Gabriele, et al.
Published: (2024)

Efficient Model Editing with Task-Localized Sparse Fine-tuning
by: Iurada, Leonardo, et al.
Published: (2025)

Fixed External Cameras as Common Prior Maps for Active 3D Scene Graph Generation
by: Modi, Giorgia, et al.
Published: (2026)

RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots
by: Modi, Giorgia, et al.
Published: (2026)

MultiGraspNet: A Multitask 3D Vision Model for Multi-gripper Robotic Grasping
by: Ortuno-Chanelo, Stephany, et al.
Published: (2026)

MobileEgo Anywhere: Open Infrastructure for long horizon egocentric data on commodity hardware
by: Palanisamy, Senthil, et al.
Published: (2026)

Your ViT is Secretly an Image Segmentation Model
by: Kerssies, Tommie, et al.
Published: (2025)

AI-driven visual monitoring of industrial assembly tasks
by: Nardon, Mattia, et al.
Published: (2025)

HUP-3D: A 3D multi-view synthetic dataset for assisted-egocentric hand-ultrasound pose estimation
by: Birlo, Manuel, et al.
Published: (2024)

Open and reusable deep learning for pathology with WSInfer and QuPath
by: Kaczmarzyk, Jakub R., et al.
Published: (2023)

A generalizable foundation model for intraoperative understanding across surgical procedures
by: Park, Kanggil, et al.
Published: (2026)

An Outlook into the Future of Egocentric Vision
by: Plizzari, Chiara, et al.
Published: (2023)

ViSTa Dataset: Do vision-language models understand sequential tasks?
by: Wybitul, Evžen, et al.
Published: (2024)

DLM-VMTL:A Double Layer Mapper for heterogeneous data video Multi-task prompt learning
by: Bo, Zeyi, et al.
Published: (2024)

Intuitive physics understanding emerges from self-supervised pretraining on natural videos
by: Garrido, Quentin, et al.
Published: (2025)

Did you just see that? Arbitrary view synthesis for egocentric replay of operating room workflows from ambient sensors
by: Zhang, Han, et al.
Published: (2025)

Do generative video models understand physical principles?
by: Motamed, Saman, et al.
Published: (2025)

Multi-step manipulation task and motion planning guided by video demonstration
by: Zorina, Kateryna, et al.
Published: (2025)

GazeBehavior Annotation Toolkit (GBAT): AI-powered toolkit for automatic annotation of egocentric eye-tracking and video data of child-caregiver interaction
by: Baig, Iba, et al.
Published: (2026)

Benchmarking transferability of SSL pretraining to same and different modality segmentation tasks
by: Jiang, Jue, et al.
Published: (2026)