:: Library Catalog

Bibliographic Details
Main Authors:	Lin, Mingxian; Huang, Wei; Li, Yitang; Jiang, Chengjie; Wu, Kui; Zhong, Fangwei; Qian, Shengju; Wang, Xin; Qi, Xiaojuan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2507.10548

Similar Items

FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models
by: Zhang, Zhikai, et al.
Published: (2024)

Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL
by: Zhong, Fangwei, et al.
Published: (2024)

EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models
by: Du, Mengfei, et al.
Published: (2024)

Hierarchical Instruction-aware Embodied Visual Tracking
by: Wu, Kui, et al.
Published: (2025)

Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
by: Zhang, Wenqi, et al.
Published: (2025)

TrackVLA++: Unleashing Reasoning and Memory Capabilities in VLA Models for Embodied Visual Tracking
by: Liu, Jiahang, et al.
Published: (2025)

UnrealZoo: Enriching Photo-realistic Virtual Worlds for Embodied AI
by: Zhong, Fangwei, et al.
Published: (2024)

VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models
by: Wu, Kui, et al.
Published: (2025)

RescueBench: Can Embodied Agents Save Lives in the Wild?
by: Wu, Kui, et al.
Published: (2026)

TrackVLA: Embodied Visual Tracking in the Wild
by: Wang, Shaoan, et al.
Published: (2025)

Reinforced Context Order Recovery for Adaptive Reasoning and Planning
by: Ma, Long, et al.
Published: (2025)

ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
by: Li, Yifan, et al.
Published: (2025)

EnerVerse-AC: Envisioning Embodied Environments with Action Condition
by: Jiang, Yuxin, et al.
Published: (2025)

CoLT: Reasoning with Chain of Latent Tool Calls
by: Zhu, Fangwei, et al.
Published: (2026)

CoMet: Metaphor-Driven Covert Communication for Multi-Agent Language Games
by: Xu, Shuhang, et al.
Published: (2025)

Simulated Adoption: Decoupling Magnitude and Direction in LLM In-Context Conflict Resolution
by: Zhang, Long, et al.
Published: (2026)

V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
by: Luo, Yang, et al.
Published: (2025)

See, Remember, Explore: A Benchmark and Baselines for Streaming Spatial Reasoning
by: Wei, Yuxi, et al.
Published: (2026)

EmbBERT: Attention Under 2 MB Memory
by: Bravin, Riccardo, et al.
Published: (2025)

EmbGen: Teaching with Reassembled Corpora
by: Lenin, Arun K, et al.
Published: (2026)

Make Interaction Situated: Designing User Acceptable Interaction for Situated Visualization in Public Environments
by: Zhu, Qian, et al.
Published: (2024)

Prompt Highlighter: Interactive Control for Multi-Modal LLMs
by: Zhang, Yuechen, et al.
Published: (2023)

Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models
by: Ling, Yiran, et al.
Published: (2026)

GraphPilot: GUI Task Automation with One-Step LLM Reasoning Powered by Knowledge Graph
by: Yu, Mingxian, et al.
Published: (2026)

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models
by: Zhou, Shengchao, et al.
Published: (2025)

Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning
by: White, Isadora, et al.
Published: (2025)

Pruning Minimal Reasoning Graphs for Efficient Retrieval-Augmented Generation
by: Wang, Ning, et al.
Published: (2026)

Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning
by: Sun, Qi, et al.
Published: (2024)

MindChat: Enhancing BCI Spelling with Large Language Models in Realistic Scenarios
by: Wang, Jiaheng, et al.
Published: (2025)

World Action Models: The Next Frontier in Embodied AI
by: Wang, Siyin, et al.
Published: (2026)

DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action
by: Fang, Zhen, et al.
Published: (2025)

Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
by: Yang, Ganlin, et al.
Published: (2025)

Text-Animator: Controllable Visual Text Video Generation
by: Liu, Lin, et al.
Published: (2024)

MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments
by: Liu, Yang, et al.
Published: (2024)

CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning
by: Li, Kailing, et al.
Published: (2025)

LLM Enhanced Action Recognition via Hierarchical Global-Local Skeleton-Language Model
by: Wang, Ruosi, et al.
Published: (2026)

TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation
by: Li, Jingyao, et al.
Published: (2023)

CoDiEmb: A Collaborative yet Distinct Framework for Unified Representation Learning in Information Retrieval and Semantic Textual Similarity
by: Zhang, Bowen, et al.
Published: (2025)

What Can Student-AI Dialogues Tell Us About Students' Self-Regulated Learning? An exploratory framework
by: Zhang, Long, et al.
Published: (2025)

Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning
by: Ganai, Milan, et al.
Published: (2026)