:: Library Catalog

Bibliographic Details
Main Authors:	Lando, Giuseppe; Forte, Rosario; Farinella, Giovanni Maria; Furnari, Antonino
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.16450

Similar Items

Exploring Multimodal LMMs for Online Episodic Memory Question Answering on the Edge
by: Lando, Giuseppe, et al.
Published: (2026)

EGOSTREAM: A Diagnostic Benchmark for Streaming Episodic Memory in Egocentric Vision
by: Forte, Rosario, et al.
Published: (2026)

Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory
by: Manigrasso, Zaira, et al.
Published: (2024)

Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
by: Seminara, Luigi, et al.
Published: (2024)

Exploiting Multimodal Synthetic Data for Egocentric Human-Object Interaction Detection in an Industrial Scenario
by: Leonardi, Rosario, et al.
Published: (2023)

Mamba-OTR: a Mamba-based Solution for Online Take and Release Detection from Untrimmed Egocentric Video
by: Catinello, Alessandro Sebastiano, et al.
Published: (2025)

Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos
by: Seminara, Luigi, et al.
Published: (2025)

Calisthenics Skills Temporal Video Segmentation
by: Finocchiaro, Antonio, et al.
Published: (2025)

Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation
by: Finocchiaro, Antonio, et al.
Published: (2025)

StillFast: An End-to-End Approach for Short-Term Object Interaction Anticipation
by: Ragusa, Francesco, et al.
Published: (2023)

Leveraging Synthetic Data for Enhancing Egocentric Hand-Object Interaction Detection
by: Leonardi, Rosario, et al.
Published: (2026)

Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?
by: Leonardi, Rosario, et al.
Published: (2023)

Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities
by: Mazzamuto, Michele, et al.
Published: (2024)

Semantically Guided Action Anticipation
by: Diko, Anxhelo, et al.
Published: (2024)

Ego-EXTRA: video-language Egocentric Dataset for EXpert-TRAinee assistance
by: Ragusa, Francesco, et al.
Published: (2025)

Ego-METAS: Egocentric online Multimodal Energy-efficient Temporal Action Segmentation benchmark
by: Santos-Villafranca, Maria, et al.
Published: (2026)

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains
by: Finocchiaro, Antonio, et al.
Published: (2025)

ProSkill: Segment-Level Skill Assessment in Procedural Videos
by: Mazzamuto, Michele, et al.
Published: (2026)

Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs
by: Quattrocchi, Camillo, et al.
Published: (2023)

AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
by: Mur-Labadia, Lorenzo, et al.
Published: (2024)

Learning Egocentric In-Hand Object Segmentation through Weak Supervision from Human Narrations
by: Messina, Nicola, et al.
Published: (2025)

Integrating Affordances and Attention models for Short-Term Object Interaction Anticipation
by: Mur-Labadia, Lorenzo, et al.
Published: (2026)

EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs
by: Rodin, Ivan, et al.
Published: (2025)

An Outlook into the Future of Egocentric Vision
by: Plizzari, Chiara, et al.
Published: (2023)

TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos
by: Plini, Leonardo, et al.
Published: (2024)

ENIGMA-360: An Ego-Exo Dataset for Human Behavior Understanding in Industrial Scenarios
by: Ragusa, Francesco, et al.
Published: (2026)

SignIT: A Comprehensive Dataset and Multimodal Analysis for Italian Sign Language Recognition
by: Micieli, Alessia, et al.
Published: (2025)

PREGO: online mistake detection in PRocedural EGOcentric videos
by: Flaborea, Alessandro, et al.
Published: (2024)

RECIPE: Procedural Planning via Grounding in Instructional Video
by: Seminara, Luigi, et al.
Published: (2026)

ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos
by: Seminara, Luigi, et al.
Published: (2026)

GlovEgo-HOI: Bridging the Synthetic-to-Real Gap for Industrial Egocentric Human-Object Interaction Detection
by: Spoto, Alfio, et al.
Published: (2026)

Advancing Egocentric Video Question Answering with Multimodal Large Language Models
by: Patel, Alkesh, et al.
Published: (2025)

Leveraging Gaze and Set-of-Mark in VLLMs for Human-Object Interaction Anticipation from Egocentric Videos
by: Materia, Daniele, et al.
Published: (2026)

MapIQ: Evaluating Multimodal Large Language Models for Map Question Answering
by: Srivastava, Varun, et al.
Published: (2025)

OS-MAP: How Far Can Computer-Using Agents Go in Breadth and Depth?
by: Chen, Xuetian, et al.
Published: (2025)

SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models
by: Taguchi, Shun, et al.
Published: (2025)

Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning
by: Zeng, Xingchen, et al.
Published: (2024)

Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks
by: Lee, Jusung, et al.
Published: (2024)

Porting Large Language Models to Mobile Devices for Question Answering
by: Fassold, Hannes
Published: (2024)

ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering
by: Wu, Yifan, et al.
Published: (2024)