Bibliographic Details
Main Authors: Gong, Xicheng, Li, Qiwei, Xu, Peiran, Mu, Yadong
Format: Preprint
Published: 2026
Subjects: Robotics
Online Access: https://arxiv.org/abs/2605.25813
author Gong, Xicheng
Li, Qiwei
Xu, Peiran
Mu, Yadong
contents Embodied Question Answering (EQA) connects perception, reasoning, and interaction within embodied environments. However, existing datasets and benchmarks remain fragmented, each focusing on a limited subset of reasoning skills such as spatial understanding or procedural reasoning, without offering a unified large-scale framework for comprehensive evaluation. We present EQA-Decision, a large-scale embodied QA dataset that systematically covers four complementary dimensions of embodied reasoning: static scene construction, spatial understanding, task dynamics reasoning, and instant decision. The dataset contains over four million question-answer pairs with hierarchical annotations across diverse embodied scenarios. In addition, we develop RoboDecision, a strong baseline model aligned with the EQA-Decision Benchmark, providing a unified framework that jointly evaluates perception, reasoning, and action-level decision-making in embodied environments. Results demonstrate that EQA-Decision effectively benchmarks and enhances vision-language model (VLM) capabilities in spatial and interaction reasoning, providing a solid foundation for advancing embodied intelligence research.
format Preprint
id arxiv_https___arxiv_org_abs_2605_25813
institution arXiv
publishDate 2026
record_format arxiv
title Extending Embodied Question Answering from Perception to Decision
topic Robotics
url https://arxiv.org/abs/2605.25813