Bibliographic Details
Main Authors: Luo, Haozheng, Qin, Ruiyang, Xu, Chenwei, Ye, Guo, Luo, Zening
Format: Preprint
Published: 2020
Subjects:
Online Access: https://arxiv.org/abs/2012.00822
author Luo, Haozheng
Qin, Ruiyang
Xu, Chenwei
Ye, Guo
Luo, Zening
contents In this paper, we introduce a robotic agent designed to analyze external environments and answer participants' questions. The agent's primary focus is to assist individuals through language-based interactions within video-based scenes. Our proposed method integrates video recognition technology and natural language processing models within the robotic agent. We investigate the crucial factors affecting human-robot interaction by examining pertinent issues arising between participants and robot agents. Our experimental findings reveal a positive relationship between trust and interaction efficiency. Furthermore, our model demonstrates a 2% to 3% performance improvement over other benchmark methods.
format Preprint
id arxiv_https___arxiv_org_abs_2012_00822
institution arXiv
publishDate 2020
record_format arxiv
title Open-Ended Multi-Modal Relational Reasoning for Video Question Answering
topic Artificial Intelligence
Human-Computer Interaction
Robotics
url https://arxiv.org/abs/2012.00822