:: Library Catalog


Bibliographic Details
Main Authors:	Wang, Tianyu, Lin, Haitao, Yu, Junqiu, Fu, Yanwei
Format:	Preprint
Published:	2024
Subjects:	Robotics; Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2408.07975

Similar Items

You Only Estimate Once: Unified, One-stage, Real-Time Category-level Articulated Object 6D Pose Estimation for Robotic Grasping
by: Huang, Jingshun, et al.
Published: (2025)

SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images
by: Yu, Junqiu, et al.
Published: (2024)

RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation
by: Liu, Fanfan, et al.
Published: (2024)

VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
by: Zhang, Shiduo, et al.
Published: (2024)

CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image
by: Huang, Jingshun, et al.
Published: (2025)

RoboOmni: Proactive Robot Manipulation in Omni-modal Context
by: Wang, Siyin, et al.
Published: (2025)

IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation
by: Lian, Shijie, et al.
Published: (2026)

ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning
by: Yang, Yandan, et al.
Published: (2026)

CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
by: Li, Qixiu, et al.
Published: (2024)

Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation
by: Werby, Abdelrhman, et al.
Published: (2024)

DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes
by: Wang, Zhaowei, et al.
Published: (2024)

Probing Collision Grounding in Vision-Language Models for Safe Human-Robot Collaboration
by: Wang, Jun, et al.
Published: (2026)

GenSim: Generating Robotic Simulation Tasks via Large Language Models
by: Wang, Lirui, et al.
Published: (2023)

RoboTrustBench: Benchmarking the Trustworthiness of Video World Models for Robotic Manipulation
by: Li, Huiqiong, et al.
Published: (2026)

How Can Large Language Models Enable Better Socially Assistive Human-Robot Interaction: A Brief Survey
by: Shi, Zhonghao, et al.
Published: (2024)

Virtual Community: An Open World for Humans, Robots, and Society
by: Zhou, Qinhong, et al.
Published: (2025)

A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding
by: Liu, Zhenyang, et al.
Published: (2025)

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
by: Huang, Haifeng, et al.
Published: (2025)

Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling
by: Yashima, Daichi, et al.
Published: (2024)

Physically Grounded Vision-Language Models for Robotic Manipulation
by: Gao, Jensen, et al.
Published: (2023)

LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion
by: Zhang, Jinyu, et al.
Published: (2024)

ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models
by: Darabi, Nastaran, et al.
Published: (2026)

Signs of Language: Embodied Sign Language Fingerspelling Acquisition from Demonstrations for Human-Robot Interaction
by: Tavella, Federico, et al.
Published: (2022)

EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models
by: Bai, Yu, et al.
Published: (2026)

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
by: Zhou, Jiaming, et al.
Published: (2024)

Semantic-Drive: Democratizing Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus
by: Guillen-Perez, Antonio
Published: (2025)

SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation
by: Zhi, Wang, et al.
Published: (2025)

A Superalignment Framework in Autonomous Driving with Large Language Models
by: Kong, Xiangrui, et al.
Published: (2024)

Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
by: Chen, Yi, et al.
Published: (2024)

Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents
by: Arjmand, Mehdi, et al.
Published: (2024)

Universal Pose Pretraining for Generalizable Vision-Language-Action Policies
by: Lin, Haitao, et al.
Published: (2026)

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models
by: Duan, Jiafei, et al.
Published: (2024)

Agreeing to Interact in Human-Robot Interaction using Large Language Models and Vision Language Models
by: Sasabuchi, Kazuhiro, et al.
Published: (2025)

PSALM-V: Automating Symbolic Planning in Interactive Visual Environments with Large Language Models
by: Zhu, Wang Bill, et al.
Published: (2025)

HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction
by: Shi, Zhonghao, et al.
Published: (2025)

RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
by: Song, Chan Hee, et al.
Published: (2024)

Leveraging Large Language Models in Human-Robot Interaction: A Critical Analysis of Potential and Pitfalls
by: Atuhurra, Jesse
Published: (2024)

EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
by: Chen, Yi, et al.
Published: (2023)

REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation
by: Yuan, Puzhen, et al.
Published: (2025)

From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation
by: Liu, Yibin, et al.
Published: (2026)