Bibliographic Details
Main Authors: Liu, Zeyi; Liu, Shuang; Min, Jihai; Zhang, Zhaoheng; Cen, Jun; Han, Pengyu; Hu, Songqiao; Meng, Zihan; He, Xiao; Zhou, Donghua
Format: Preprint
Published: 2026
Subjects: Robotics; Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2601.21173
author Liu, Zeyi
Liu, Shuang
Min, Jihai
Zhang, Zhaoheng
Cen, Jun
Han, Pengyu
Hu, Songqiao
Meng, Zihan
He, Xiao
Zhou, Donghua
contents With the rapid development of industrial intelligence and unmanned inspection, reliable perception and safety assessment for AI systems in complex and dynamic industrial sites have become a key bottleneck for deploying predictive maintenance and autonomous inspection. Most public datasets remain limited by simulated data sources, single-modality sensing, or the absence of fine-grained object-level annotations, which prevents robust scene understanding and multimodal safety reasoning for industrial foundation models. To address these limitations, InspecSafe-V1 is released as the first multimodal benchmark dataset for industrial inspection safety assessment collected from routine operations of real inspection robots in real-world environments. InspecSafe-V1 covers five representative industrial scenarios: tunnels, power facilities, sintering equipment, oil and gas petrochemical plants, and coal conveyor trestles. The dataset is constructed from 41 wheeled and rail-mounted inspection robots operating at 2,239 valid inspection sites, yielding 5,013 inspection instances. For each instance, pixel-level segmentation annotations are provided for key objects in visible-spectrum images, together with a semantic scene description and a corresponding safety level label defined according to practical inspection tasks. Seven synchronized sensing modalities (infrared video, audio, depth point clouds, radar point clouds, gas measurements, temperature, and humidity) are also included to support multimodal anomaly recognition, cross-modal fusion, and comprehensive safety assessment in industrial environments.
format Preprint
id arxiv_https___arxiv_org_abs_2601_21173
institution arXiv
publishDate 2026
record_format arxiv
title InspecSafe-V1: A Multimodal Benchmark for Safety Assessment in Industrial Inspection Scenarios
topic Robotics
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2601.21173