Bibliographic Details
Main Authors: Mehta, Naval Kishore; Arvind; Prasad, Shyam Sunder; Saurav, Sumeet; Singh, Sanjay
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2501.05108
author Mehta, Naval Kishore
Arvind
Prasad, Shyam Sunder
Saurav, Sumeet
Singh, Sanjay
contents Monitoring complex assembly processes is critical for maintaining productivity and ensuring compliance with assembly standards. However, variability in human actions and subjective task preferences complicate accurate task anticipation and guidance. To address these challenges, we introduce the Multi-Modal Transformer Fusion and Recurrent Units (MMTF-RU) Network for egocentric activity anticipation, which uses multimodal fusion to improve prediction accuracy. Integrated with the Operator Action Monitoring Unit (OAMU), the system provides proactive operator guidance and helps prevent deviations in the assembly process. OAMU employs two strategies: (1) Top-5 MMTF-RU predictions, combined with a reference graph and an action dictionary, for next-step recommendations; and (2) Top-1 MMTF-RU predictions, integrated with a reference graph, for detecting sequence deviations and predicting anomaly scores via an entropy-informed confidence mechanism. We also introduce Time-Weighted Sequence Accuracy (TWSA) to evaluate operator efficiency and ensure timely task completion. Our approach is validated on the industrial Meccano dataset and the large-scale EPIC-Kitchens-55 dataset, demonstrating its effectiveness in dynamic environments.
format Preprint
id arxiv_https___arxiv_org_abs_2501_05108
institution arXiv
publishDate 2025
record_format arxiv
title Optimizing Multitask Industrial Processes with Predictive Action Guidance
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2501.05108
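The second OAMU strategy named in the abstract (Top-1 prediction checked against a reference graph, with an entropy-informed confidence) can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the function names, the dictionary-of-sets graph layout, the use of normalized Shannon entropy as the confidence signal, and the rule that a deviation is scored in proportion to the model's confidence. It is not the authors' implementation.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a probability vector, scaled to [0, 1]."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def check_step(prev_action, probs, labels, reference_graph):
    """Return (top1_label, is_deviation, anomaly_score) for one predicted step.

    reference_graph is a hypothetical layout: it maps each action to the
    set of actions allowed to follow it.
    """
    # Top-1 prediction: label with the highest probability.
    top1 = labels[max(range(len(probs)), key=probs.__getitem__)]
    allowed = reference_graph.get(prev_action, set())
    is_deviation = top1 not in allowed
    # Assumed confidence: low entropy means a confident prediction.
    confidence = 1.0 - normalized_entropy(probs)
    # Assumed scoring rule: a deviation counts more when the model is confident.
    anomaly_score = confidence if is_deviation else 0.0
    return top1, is_deviation, anomaly_score


# Hypothetical action labels and reference graph for a toy assembly.
labels = ["pick_screw", "tighten", "pick_bar"]
graph = {"pick_screw": {"tighten"}}

on_track = check_step("pick_screw", [0.05, 0.90, 0.05], labels, graph)
off_track = check_step("pick_screw", [0.05, 0.05, 0.90], labels, graph)
```

In this sketch a step that follows the reference graph always scores 0, while an off-graph step scores higher the more peaked the predicted distribution is; a real system would calibrate this against labelled deviations.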