Staff View: Data-Centric Visual Development for Self-Driving Labs :: Library Catalog

Bibliographic Details
Main Authors:	Liu, Anbang; Hu, Guanzhong; Wang, Jiayi; Guo, Ping; Liu, Han
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Robotics
Online Access:	https://arxiv.org/abs/2512.02018

_version_	1866918226130108416
author	Liu, Anbang; Hu, Guanzhong; Wang, Jiayi; Guo, Ping; Liu, Han
author_facet	Liu, Anbang; Hu, Guanzhong; Wang, Jiayi; Guo, Ping; Liu, Han
contents	Self-driving laboratories (SDLs) offer a promising path toward reducing the labor-intensive, time-consuming, and often irreproducible workflows in the biological sciences. Their stringent precision requirements, however, demand highly robust models whose training relies on large amounts of annotated data. Such data, especially negative samples, are difficult to obtain in routine practice. In this work, we focus on pipetting, the most critical and precision-sensitive action in SDLs. To overcome the scarcity of training data, we build a hybrid pipeline that fuses real and virtual data generation. The real track adopts a human-in-the-loop scheme that couples automated acquisition with selective human verification to maximize accuracy with minimal effort. The virtual track augments the real data using reference-conditioned, prompt-guided image generation, and the generated images are further screened and validated for reliability. Together, these two tracks yield a class-balanced dataset that enables robust bubble detection training. On a held-out real test set, a model trained entirely on automatically acquired real images reaches 99.6% accuracy, and mixing real and generated data during training sustains 99.4% accuracy while reducing collection and review load. Our approach offers a scalable and cost-effective strategy for supplying visual feedback data to SDL workflows and provides a practical solution to data scarcity in rare event detection and broader vision tasks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_02018
institution	arXiv
publishDate	2025
record_format	arxiv
title	Data-Centric Visual Development for Self-Driving Labs
topic	Computer Vision and Pattern Recognition; Robotics
url	https://arxiv.org/abs/2512.02018

Similar Items