| Main Authors: | Lin, Mingxian, Huang, Wei, Li, Yitang, Jiang, Chengjie, Wu, Kui, Zhong, Fangwei, Qian, Shengju, Wang, Xin, Qi, Xiaojuan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.10548 |
Similar Items
FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models
by: Zhang, Zhikai, et al.
Published: (2024)
Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL
by: Zhong, Fangwei, et al.
Published: (2024)
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models
by: Du, Mengfei, et al.
Published: (2024)
Hierarchical Instruction-aware Embodied Visual Tracking
by: Wu, Kui, et al.
Published: (2025)
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
by: Zhang, Wenqi, et al.
Published: (2025)
TrackVLA++: Unleashing Reasoning and Memory Capabilities in VLA Models for Embodied Visual Tracking
by: Liu, Jiahang, et al.
Published: (2025)
UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI
by: Zhong, Fangwei, et al.
Published: (2024)
VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models
by: Wu, Kui, et al.
Published: (2025)
RescueBench: Can Embodied Agents Save Lives in the Wild?
by: Wu, Kui, et al.
Published: (2026)
TrackVLA: Embodied Visual Tracking in the Wild
by: Wang, Shaoan, et al.
Published: (2025)
Reinforced Context Order Recovery for Adaptive Reasoning and Planning
by: Ma, Long, et al.
Published: (2025)
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
by: Li, Yifan, et al.
Published: (2025)
EnerVerse-AC: Envisioning Embodied Environments with Action Condition
by: Jiang, Yuxin, et al.
Published: (2025)
CoLT: Reasoning with Chain of Latent Tool Calls
by: Zhu, Fangwei, et al.
Published: (2026)
CoMet: Metaphor-Driven Covert Communication for Multi-Agent Language Games
by: Xu, Shuhang, et al.
Published: (2025)
Simulated Adoption: Decoupling Magnitude and Direction in LLM In-Context Conflict Resolution
by: Zhang, Long, et al.
Published: (2026)
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
by: Luo, Yang, et al.
Published: (2025)
See, Remember, Explore: A Benchmark and Baselines for Streaming Spatial Reasoning
by: Wei, Yuxi, et al.
Published: (2026)
EmbBERT: Attention Under 2 MB Memory
by: Bravin, Riccardo, et al.
Published: (2025)
EmbGen: Teaching with Reassembled Corpora
by: Lenin, Arun K, et al.
Published: (2026)
Make Interaction Situated: Designing User Acceptable Interaction for Situated Visualization in Public Environments
by: Zhu, Qian, et al.
Published: (2024)
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
by: Zhang, Yuechen, et al.
Published: (2023)
Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models
by: Ling, Yiran, et al.
Published: (2026)
GraphPilot: GUI Task Automation with One-Step LLM Reasoning Powered by Knowledge Graph
by: Yu, Mingxian, et al.
Published: (2026)
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models
by: Zhou, Shengchao, et al.
Published: (2025)
Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning
by: White, Isadora, et al.
Published: (2025)
Pruning Minimal Reasoning Graphs for Efficient Retrieval-Augmented Generation
by: Wang, Ning, et al.
Published: (2026)
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning
by: Sun, Qi, et al.
Published: (2024)
MindChat: Enhancing BCI Spelling with Large Language Models in Realistic Scenarios
by: Wang, Jiaheng, et al.
Published: (2025)
World Action Models: The Next Frontier in Embodied AI
by: Wang, Siyin, et al.
Published: (2026)
DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action
by: Fang, Zhen, et al.
Published: (2025)
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
by: Yang, Ganlin, et al.
Published: (2025)
Text-Animator: Controllable Visual Text Video Generation
by: Liu, Lin, et al.
Published: (2024)
MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments
by: Liu, Yang, et al.
Published: (2024)
CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning
by: Li, Kailing, et al.
Published: (2025)
LLM Enhanced Action Recognition via Hierarchical Global-Local Skeleton-Language Model
by: Wang, Ruosi, et al.
Published: (2026)
TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation
by: Li, Jingyao, et al.
Published: (2023)
CoDiEmb: A Collaborative yet Distinct Framework for Unified Representation Learning in Information Retrieval and Semantic Textual Similarity
by: Zhang, Bowen, et al.
Published: (2025)
What Can Student-AI Dialogues Tell Us About Students' Self-Regulated Learning? An exploratory framework
by: Zhang, Long, et al.
Published: (2025)
Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning
by: Ganai, Milan, et al.
Published: (2026)