Bibliographic Details
Main Authors: Liu, Anbang; Hu, Guanzhong; Wang, Jiayi; Guo, Ping; Liu, Han
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition; Robotics
Online Access: https://arxiv.org/abs/2512.02018
contents Self-driving laboratories (SDLs) offer a promising path toward reducing the labor-intensive, time-consuming, and often irreproducible workflows of the biological sciences. Their stringent precision requirements, however, demand highly robust models, and training such models relies on large amounts of annotated data. Such data is difficult to obtain in routine practice, and negative samples are especially scarce. In this work, we focus on pipetting, the most critical and precision-sensitive action in SDLs. To overcome the scarcity of training data, we build a hybrid pipeline that combines real and virtual data generation. The real track adopts a human-in-the-loop scheme that couples automated acquisition with selective human verification to maximize accuracy with minimal effort. The virtual track augments the real data with reference-conditioned, prompt-guided image generation, and the generated images are further screened and validated for reliability. Together, the two tracks yield a class-balanced dataset that enables robust training of a bubble detection model. On a held-out real test set, a model trained entirely on automatically acquired real images reaches 99.6% accuracy, and mixing real and generated data during training sustains 99.4% accuracy while reducing the collection and review load. Our approach offers a scalable and cost-effective strategy for supplying visual feedback data to SDL workflows and provides a practical solution to data scarcity in rare-event detection and broader vision tasks.
format Preprint
id arxiv_https___arxiv_org_abs_2512_02018
institution arXiv
publishDate 2025
record_format arxiv
title Data-Centric Visual Development for Self-Driving Labs
topic Computer Vision and Pattern Recognition
Robotics