:: Library Catalog

Bibliographic Details
Main Authors:	Peirone, Simone Alberto; Pistilli, Francesca; Alliegro, Antonio; Averta, Giuseppe
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2403.03037

Similar Items

Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives
by: Peirone, Simone Alberto, et al.
Published: (2025)

Learning reusable concepts across different egocentric video understanding tasks
by: Peirone, Simone Alberto, et al.
Published: (2025)

HiERO: understanding the hierarchy of human behavior enhances reasoning on egocentric videos
by: Peirone, Simone Alberto, et al.
Published: (2025)

FORESCENE: FOREcasting human activity via latent SCENE graphs diffusion
by: Alliegro, Antonio, et al.
Published: (2025)

HiERO-StepG @ Ego4D Step Grounding Challenge: hierarchical activity understanding enables zero-shot step grounding
by: Zenotto, Andrea, et al.
Published: (2026)

Egocentric zone-aware action recognition across environments
by: Peirone, Simone Alberto, et al.
Published: (2024)

Domain Generalization using Action Sequences for Egocentric Action Recognition
by: Nasirimajd, Amirshayan, et al.
Published: (2025)

Minerva-Ego: Spatiotemporal Hints for Egocentric Video Understanding
by: Nagrani, Arsha, et al.
Published: (2026)

Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
by: Wu, Tz-Ying, et al.
Published: (2024)

PEM: Prototype-based Efficient MaskFormer for Image Segmentation
by: Cavagnero, Niccolò, et al.
Published: (2024)

FEEL (Force-Enhanced Egocentric Learning): A Dataset for Physical Action Understanding
by: Dessalene, Eadom, et al.
Published: (2026)

LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
by: Wang, Ying, et al.
Published: (2023)

Ego4OOD: Rethinking Egocentric Video Domain Generalization via Covariate Shift Scoring
by: Vaseqi, Zahra, et al.
Published: (2026)

Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos
by: Yang, Yanlai, et al.
Published: (2025)

EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video
by: Hoque, Ryan, et al.
Published: (2025)

Gradient Similarity Surgery in Multi-Task Deep Learning
by: Borsani, Thomas, et al.
Published: (2025)

$\infty$-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation
by: Santos, Saul, et al.
Published: (2025)

3D-Aware Instance Segmentation and Tracking in Egocentric Videos
by: Bhalgat, Yash, et al.
Published: (2024)

Being-H0.7: A Latent World-Action Model from Egocentric Videos
by: Luo, Hao, et al.
Published: (2026)

Understanding-Enhanced Model Collaboration for Long-Tailed Egocentric Mistake Detection
by: Han, Boyu, et al.
Published: (2026)

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
by: Dang, Ronghao, et al.
Published: (2025)

MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA
by: Ye, Hanrong, et al.
Published: (2024)

What to Do Next? Memorizing skills from Egocentric Instructional Video
by: Bi, Jing, et al.
Published: (2025)

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
by: Liang, Zhixuan, et al.
Published: (2023)

Whole-Body Conditioned Egocentric Video Prediction
by: Bai, Yutong, et al.
Published: (2025)

Understanding Domain Generalization: A Noise Robustness Perspective
by: Qiao, Rui, et al.
Published: (2024)

EgoMAGIC- An Egocentric Video Field Medicine Dataset for Training Perception Algorithms
by: VanVoorst, Brian, et al.
Published: (2026)

Online Video Understanding: OVBench and VideoChat-Online
by: Huang, Zhenpeng, et al.
Published: (2024)

The revenge of BiSeNet: Efficient Multi-Task Image Segmentation
by: Rosi, Gabriele, et al.
Published: (2024)

Video Parallel Scaling: Aggregating Diverse Frame Subsets for VideoLLMs
by: Chung, Hyungjin, et al.
Published: (2025)

Understanding Task Transfer in Vision-Language Models
by: Sachdeva, Bhuvan, et al.
Published: (2025)

PEDESTRIAN: An Egocentric Vision Dataset for Obstacle Detection on Pavements
by: Thoma, Marios, et al.
Published: (2025)

EgoSurgery-HTS: A Dataset for Egocentric Hand-Tool Segmentation in Open Surgery Videos
by: Darjana, Nathan, et al.
Published: (2025)

EasyVideoR1: Easier RL for Video Understanding
by: Qin, Chuanyu, et al.
Published: (2026)

Towards Sparse Video Understanding and Reasoning
by: Xu, Chenwei, et al.
Published: (2026)

Agentic Very Long Video Understanding
by: Rege, Aniket, et al.
Published: (2026)

EgoCogNav: Cognition-aware Human Egocentric Navigation
by: Qiu, Zhiwen, et al.
Published: (2025)

Benchmarking Egocentric Multimodal Goal Inference for Assistive Wearable Agents
by: Veerabadran, Vijay, et al.
Published: (2025)

EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos
by: Fujii, Ryo, et al.
Published: (2024)

Advancing Egocentric Video Question Answering with Multimodal Large Language Models
by: Patel, Alkesh, et al.
Published: (2025)