Saved in:
Bibliographic Details
Main Author: Ghosh, Sanjukta
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2510.03376
Tags: Add Tag
No Tags, Be the first to tag this record!
Table of Contents:
  • Industrial diagrams such as piping and instrumentation diagrams (P&IDs) are essential for the design, operation, and maintenance of industrial plants. Converting these diagrams into digital form is an important step toward building digital twins and enabling intelligent industrial automation. A central challenge in this digitalization process is accurate object detection. Although recent advances have significantly improved object detection algorithms, there remains a lack of methods to automatically evaluate the quality of their outputs. This paper addresses this gap by introducing a framework that employs Visual Language Models (VLMs) to assess object detection results and guide their refinement. The approach exploits the multimodal capabilities of VLMs to identify missing or inconsistent detections, thereby enabling automated quality assessment and improving overall detection performance on complex industrial diagrams.