| Main Authors: | Wang, Tianyu, Lin, Haitao, Yu, Junqiu, Fu, Yanwei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2408.07975 |
Similar Items
You Only Estimate Once: Unified, One-stage, Real-Time Category-level Articulated Object 6D Pose Estimation for Robotic Grasping
by: Huang, Jingshun, et al.
Published: (2025)
SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images
by: Yu, Junqiu, et al.
Published: (2024)
RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation
by: Liu, Fanfan, et al.
Published: (2024)
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
by: Zhang, Shiduo, et al.
Published: (2024)
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image
by: Huang, Jingshun, et al.
Published: (2025)
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
by: Wang, Siyin, et al.
Published: (2025)
IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation
by: Lian, Shijie, et al.
Published: (2026)
ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning
by: Yang, Yandan, et al.
Published: (2026)
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
by: Li, Qixiu, et al.
Published: (2024)
Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation
by: Werby, Abdelrhman, et al.
Published: (2024)
DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes
by: Wang, Zhaowei, et al.
Published: (2024)
Probing Collision Grounding in Vision-Language Models for Safe Human-Robot Collaboration
by: Wang, Jun, et al.
Published: (2026)
GenSim: Generating Robotic Simulation Tasks via Large Language Models
by: Wang, Lirui, et al.
Published: (2023)
RoboTrustBench: Benchmarking the Trustworthiness of Video World Models for Robotic Manipulation
by: Li, Huiqiong, et al.
Published: (2026)
How Can Large Language Models Enable Better Socially Assistive Human-Robot Interaction: A Brief Survey
by: Shi, Zhonghao, et al.
Published: (2024)
Virtual Community: An Open World for Humans, Robots, and Society
by: Zhou, Qinhong, et al.
Published: (2025)
A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding
by: Liu, Zhenyang, et al.
Published: (2025)
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
by: Huang, Haifeng, et al.
Published: (2025)
Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling
by: Yashima, Daichi, et al.
Published: (2024)
Physically Grounded Vision-Language Models for Robotic Manipulation
by: Gao, Jensen, et al.
Published: (2023)
LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion
by: Zhang, Jinyu, et al.
Published: (2024)
ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models
by: Darabi, Nastaran, et al.
Published: (2026)
Signs of Language: Embodied Sign Language Fingerspelling Acquisition from Demonstrations for Human-Robot Interaction
by: Tavella, Federico, et al.
Published: (2022)
EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models
by: Bai, Yu, et al.
Published: (2026)
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
by: Zhou, Jiaming, et al.
Published: (2024)
Semantic-Drive: Democratizing Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus
by: Guillen-Perez, Antonio
Published: (2025)
SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation
by: zhi, Wang, et al.
Published: (2025)
A Superalignment Framework in Autonomous Driving with Large Language Models
by: Kong, Xiangrui, et al.
Published: (2024)
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
by: Chen, Yi, et al.
Published: (2024)
Empathic Grounding: Explorations using Multimodal Interaction and Large Language Models with Conversational Agents
by: Arjmand, Mehdi, et al.
Published: (2024)
Universal Pose Pretraining for Generalizable Vision-Language-Action Policies
by: Lin, Haitao, et al.
Published: (2026)
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models
by: Duan, Jiafei, et al.
Published: (2024)
Agreeing to Interact in Human-Robot Interaction using Large Language Models and Vision Language Models
by: Sasabuchi, Kazuhiro, et al.
Published: (2025)
PSALM-V: Automating Symbolic Planning in Interactive Visual Environments with Large Language Models
by: Zhu, Wang Bill, et al.
Published: (2025)
HRIBench: Benchmarking Vision-Language Models for Real-Time Human Perception in Human-Robot Interaction
by: Shi, Zhonghao, et al.
Published: (2025)
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
by: Song, Chan Hee, et al.
Published: (2024)
Leveraging Large Language Models in Human-Robot Interaction: A Critical Analysis of Potential and Pitfalls
by: Atuhurra, Jesse
Published: (2024)
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
by: Chen, Yi, et al.
Published: (2023)
REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation
by: Yuan, Puzhen, et al.
Published: (2025)
From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation
by: Liu, Yibin, et al.
Published: (2026)