Staff View: L1 Sample Flow for Efficient Visuomotor Learning :: Library Catalog

Bibliographic Details
Main Authors:	Song, Weixi; Chen, Zhetao; Xu, Tao; Zeng, Xianchao; Zhou, Xinyu; Yang, Lixin; Wang, Donglin; Lu, Cewu; Li, Yong-Lu
Format:	Preprint
Published:	2025
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2511.17898

_version_	1866909917227515904
author	Song, Weixi; Chen, Zhetao; Xu, Tao; Zeng, Xianchao; Zhou, Xinyu; Yang, Lixin; Wang, Donglin; Lu, Cewu; Li, Yong-Lu
author_facet	Song, Weixi; Chen, Zhetao; Xu, Tao; Zeng, Xianchao; Zhou, Xinyu; Yang, Lixin; Wang, Donglin; Lu, Cewu; Li, Yong-Lu
contents	Denoising-based models, such as diffusion and flow matching, have been a critical component of robotic manipulation for their strong distribution-fitting and scaling capacity. Concurrently, several works have demonstrated that simple learning objectives, such as L1 regression, can achieve performance comparable to denoising-based methods on certain tasks, while offering faster convergence and inference. In this paper, we focus on how to combine the advantages of these two paradigms: retaining the ability of denoising models to capture multi-modal distributions and avoid mode collapse while achieving the efficiency of the L1 regression objective. To achieve this vision, we reformulate the original v-prediction flow matching and transform it into sample-prediction with the L1 training objective. We empirically show that the multi-modality can be expressed via a single ODE step. Thus, we propose L1 Flow, a two-step sampling schedule that generates a suboptimal action sequence via a single integration step and then reconstructs the precise action sequence through a single prediction. The proposed method largely retains the advantages of flow matching while reducing the iterative neural function evaluations to merely two and mitigating the potential performance degradation associated with direct sample regression. We evaluate our method with varying baselines and benchmarks, including 8 tasks in MimicGen, 5 tasks in RoboMimic & PushT Bench, and one task in the real-world scenario. The results show the advantages of the proposed method with regard to training efficiency, inference speed, and overall performance. Project website: https://song-wx.github.io/l1flow.github.io/
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_17898
institution	arXiv
publishDate	2025
record_format	arxiv
title	L1 Sample Flow for Efficient Visuomotor Learning
topic	Robotics
url	https://arxiv.org/abs/2511.17898
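
Note on the abstract: the two-step schedule it describes (one integration step, then one direct prediction) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a linear noise-to-data path x_t = (1 - t) * noise + t * action, flattened action chunks of shape (batch, dim), and a midpoint time t_mid = 0.5; the names predict_sample, l1_sample_loss, two_step_sample, and t_mid are hypothetical and do not come from the record.

```python
import numpy as np


def l1_sample_loss(predict_sample, actions, rng):
    """L1 loss for a model that predicts the clean sample from (x_t, t).

    Assumption: linear interpolation path between Gaussian noise and data.
    """
    noise = rng.standard_normal(actions.shape)
    t = rng.uniform(0.0, 1.0, size=(actions.shape[0], 1))
    x_t = (1.0 - t) * noise + t * actions
    return float(np.mean(np.abs(predict_sample(x_t, t) - actions)))


def two_step_sample(predict_sample, noise, t_mid=0.5):
    """Generate actions with exactly two model evaluations.

    Step 1: one Euler step from t = 0 to t_mid, using the velocity implied
            by the predicted sample (hypothetical choice of t_mid).
    Step 2: one direct sample prediction at t_mid.
    """
    t0 = np.zeros((noise.shape[0], 1))
    x1_hat = predict_sample(noise, t0)      # model evaluation 1
    velocity = x1_hat - noise               # (x1_hat - x_t) / (1 - t) at t = 0
    x_mid = noise + t_mid * velocity        # single integration step
    t1 = np.full((noise.shape[0], 1), t_mid)
    return predict_sample(x_mid, t1)        # model evaluation 2
```

Here predict_sample stands in for any callable that maps a noisy action batch and a time batch to a predicted clean action batch; in practice it would also take the visual observation as conditioning, which the sketch omits.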

Similar Items