Staff View :: Library Catalog


Bibliographic Details
Main Authors:	Kumar, Sathish; Damodaran, Swaroop; Kuruba, Naveen Kumar; Jha, Sumit; Ramanathan, Arvind
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Robotics
Online Access:	https://arxiv.org/abs/2504.03423

_version_	1866912309581971456
author	Kumar, Sathish; Damodaran, Swaroop; Kuruba, Naveen Kumar; Jha, Sumit; Ramanathan, Arvind
contents	This paper presents a novel deep learning framework for robotic arm manipulation that integrates multimodal inputs using a late-fusion strategy. Unlike traditional end-to-end or reinforcement learning approaches, our method processes image sequences with pre-trained models and robot state data with machine learning algorithms, fusing their outputs to predict continuous action values for control. Evaluated on BridgeData V2 and Kuka datasets, the best configuration (VGG16 + Random Forest) achieved MSEs of 0.0021 and 0.0028, respectively, demonstrating strong predictive performance and robustness. The framework supports modularity, interpretability, and real-time decision-making, aligning with the goals of adaptive, human-in-the-loop cyber-physical systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_03423
institution	arXiv
publishDate	2025
record_format	arxiv
title	DML-RAM: Deep Multimodal Learning Framework for Robotic Arm Manipulation using Pre-trained Models
topic	Machine Learning; Robotics
url	https://arxiv.org/abs/2504.03423
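The abstract describes a late-fusion strategy: an image branch (for example, a pre-trained VGG16) and a robot-state branch (for example, a Random Forest) each predict the continuous action values independently, and their outputs are then fused and scored by mean squared error (MSE). The sketch below illustrates that idea under a stated assumption: the fusion rule here is a simple weighted average of the two branches' predictions, which is a hypothetical choice, since this record does not say how DML-RAM actually combines them. The function names `late_fuse` and `mse` are illustrative only, and the demo numbers are toy values, not data from the paper.

```python
from __future__ import annotations

from typing import Sequence


def late_fuse(image_pred: Sequence[float], state_pred: Sequence[float],
              w_image: float = 0.5) -> list[float]:
    """Fuse per-sample predictions from two independently trained branches.

    Hypothetical fusion rule: a convex combination (weighted average) of the
    image-branch and robot-state-branch outputs. The actual DML-RAM fusion
    rule is not stated in this record.
    """
    if len(image_pred) != len(state_pred):
        raise ValueError("branches must predict the same number of values")
    if not 0.0 <= w_image <= 1.0:
        raise ValueError("w_image must lie in [0, 1]")
    w_state = 1.0 - w_image
    return [w_image * a + w_state * b for a, b in zip(image_pred, state_pred)]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error, the metric reported in the abstract."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy action targets and branch outputs (illustrative, not from the paper).
    y_true = [0.10, -0.20, 0.05]
    image_pred = [0.12, -0.18, 0.00]   # stand-in for the image branch
    state_pred = [0.08, -0.22, 0.10]   # stand-in for the robot-state branch
    fused = late_fuse(image_pred, state_pred)
    print("image MSE:", mse(y_true, image_pred))
    print("state MSE:", mse(y_true, state_pred))
    print("fused MSE:", mse(y_true, fused))
```

Because each branch is trained separately and only their outputs are combined, either branch can be swapped (for instance, a different pre-trained CNN) without retraining the other, which is one way to read the modularity claim in the abstract. In this toy case the two branches err in opposite directions, so averaging cancels most of the error; real branches will not always be that complementary.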

Similar Items