Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Yang, Libing, Li, Yang, Chen, Long
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition Artificial Intelligence Robotics
Online Access:	https://arxiv.org/abs/2405.04549
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866910438023757824
author	Yang, Libing Li, Yang Chen, Long
author_facet	Yang, Libing Li, Yang Chen, Long
contents	Vision-based robotic cloth unfolding has made great progress recently. However, prior works predominantly rely on value learning and have not fully explored policy-based techniques. Recently, the success of reinforcement learning on the large language model has shown that the policy gradient algorithm can enhance policy with huge action space. In this paper, we introduce ClothPPO, a framework that employs a policy gradient algorithm based on actor-critic architecture to enhance a pre-trained model with huge 10^6 action spaces aligned with observation in the task of unfolding clothes. To this end, we redefine the cloth manipulation problem as a partially observable Markov decision process. A supervised pre-training stage is employed to train a baseline model of our policy. In the second stage, the Proximal Policy Optimization (PPO) is utilized to guide the supervised model within the observation-aligned action space. By optimizing and updating the strategy, our proposed method increases the garment's surface area for cloth unfolding under the soft-body manipulation task. Experimental results show that our proposed framework can further improve the unfolding performance of other state-of-the-art methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_04549
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	ClothPPO: A Proximal Policy Optimization Enhancing Framework for Robotic Cloth Manipulation with Observation-Aligned Action Spaces Yang, Libing Li, Yang Chen, Long Computer Vision and Pattern Recognition Artificial Intelligence Robotics Vision-based robotic cloth unfolding has made great progress recently. However, prior works predominantly rely on value learning and have not fully explored policy-based techniques. Recently, the success of reinforcement learning on the large language model has shown that the policy gradient algorithm can enhance policy with huge action space. In this paper, we introduce ClothPPO, a framework that employs a policy gradient algorithm based on actor-critic architecture to enhance a pre-trained model with huge 10^6 action spaces aligned with observation in the task of unfolding clothes. To this end, we redefine the cloth manipulation problem as a partially observable Markov decision process. A supervised pre-training stage is employed to train a baseline model of our policy. In the second stage, the Proximal Policy Optimization (PPO) is utilized to guide the supervised model within the observation-aligned action space. By optimizing and updating the strategy, our proposed method increases the garment's surface area for cloth unfolding under the soft-body manipulation task. Experimental results show that our proposed framework can further improve the unfolding performance of other state-of-the-art methods.
title	ClothPPO: A Proximal Policy Optimization Enhancing Framework for Robotic Cloth Manipulation with Observation-Aligned Action Spaces
topic	Computer Vision and Pattern Recognition Artificial Intelligence Robotics
url	https://arxiv.org/abs/2405.04549

Similar Items