Bibliographic Details
Main Authors: Truongcao, Keith, Nhu, Christopher, An, Zijian, Nguyen, Phong, Cai, Siwei, Zhou, Lifeng
Format: Preprint
Published: 2026
Subjects: Robotics
Online Access: https://arxiv.org/abs/2606.00966
_version_ 1866914620617261056
author Truongcao, Keith
Nhu, Christopher
An, Zijian
Nguyen, Phong
Cai, Siwei
Zhou, Lifeng
contents Vision-Language-Action (VLA) models still suffer from slow inference and have difficulty making fine-grained motion adjustments, which limits their adoption in industry. The Real-Time Action Chunking (RTAC) algorithm has been proposed to address these bottlenecks, but bridging the gap between the algorithm as given in pseudocode and a stable real-world deployment on a low-cost robotic arm remains a challenge. In this work, we present a complete system-level implementation of RTAC tailored to a low-cost robotic manipulation system. We go beyond the original high-level pseudocode by optimizing the threading of the policy inference and control pipeline, which reduces end-to-end latency and improves responsiveness without modifying the underlying policy. We evaluate the system on tasks that involve manipulating agricultural produce, specifically garlic bulbs and walnuts. Experimental results show that our custom threading implementation significantly improves control stability and speed compared with the base implementation of RTAC.
format Preprint
id arxiv_https___arxiv_org_abs_2606_00966
institution arXiv
publishDate 2026
record_format arxiv
title Threading Optimization for Vision-Language-Action Model Inference in Low-Cost Smart Agricultural Manipulation
topic Robotics
url https://arxiv.org/abs/2606.00966