Bibliographic Details
Main Authors: TRI LBM Team, Barreiros, Jose, Beaulieu, Andrew, Bhat, Aditya, Cory, Rick, Cousineau, Eric, Dai, Hongkai, Fang, Ching-Hsin, Hashimoto, Kunimatsu, Irshad, Muhammad Zubair, Itkina, Masha, Kuppuswamy, Naveen, Lee, Kuan-Hui, Liu, Katherine, McConachie, Dale, McMahon, Ian, Nishimura, Haruki, Phillips-Grafflin, Calder, Richter, Charles, Shah, Paarth, Srinivasan, Krishnan, Wulfe, Blake, Xu, Chen, Zhang, Mengchao, Alspach, Alex, Angeles, Maya, Arora, Kushal, Guizilini, Vitor Campagnolo, Castro, Alejandro, Chen, Dian, Chu, Ting-Sheng, Creasey, Sam, Curtis, Sean, Denitto, Richard, Dixon, Emma, Dusel, Eric, Ferreira, Matthew, Goncalves, Aimee, Gould, Grant, Guoy, Damrong, Gupta, Swati, Han, Xuchen, Hatch, Kyle, Hathaway, Brendan, Henry, Allison, Hochsztein, Hillel, Horgan, Phoebe, Iwase, Shun, Jackson, Donovon, Karamcheti, Siddharth, Keh, Sedrick, Masterjohn, Joseph, Mercat, Jean, Miller, Patrick, Mitiguy, Paul, Nguyen, Tony, Nimmer, Jeremy, Noguchi, Yuki, Ong, Reko, Onol, Aykut, Pfannenstiehl, Owen, Poyner, Richard, Rocha, Leticia Priebe Mendes, Richardson, Gordon, Rodriguez, Christopher, Seale, Derick, Sherman, Michael, Smith-Jones, Mariah, Tago, David, Tokmakov, Pavel, Tran, Matthew, Van Hoorick, Basile, Vasiljevic, Igor, Zakharov, Sergey, Zolotas, Mark, Ambrus, Rares, Fetzer-Borelli, Kerri, Burchfiel, Benjamin, Kress-Gazit, Hadas, Feng, Siyuan, Ford, Stacie, Tedrake, Russ
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2507.05331
Summary:
  • Robot manipulation has seen tremendous progress in recent years, with imitation learning policies enabling successful performance of dexterous and hard-to-model tasks. Concurrently, scaling data and model size has led to the development of capable language and vision foundation models, motivating large-scale efforts to create general-purpose robot foundation models. While these models have garnered significant enthusiasm and investment, meaningful evaluation of real-world performance remains a challenge, both limiting the pace of development and inhibiting a nuanced understanding of current capabilities. In this paper, we rigorously evaluate multitask robot manipulation policies, referred to as Large Behavior Models (LBMs), by extending the Diffusion Policy paradigm across a corpus of simulated and real-world robot data. We propose and validate an evaluation pipeline to analyze the capabilities of these models with statistical confidence. We compare against single-task baselines through blind, randomized trials in a controlled setting, using both simulation and real-world experiments. We find that multitask pretraining makes the policies more successful and robust, and enables teaching complex new tasks more quickly, using a fraction of the data when compared to single-task baselines. Moreover, performance predictably increases as pretraining scale and diversity grow. Project page: https://toyotaresearchinstitute.github.io/lbm1/