Bibliographic Details
Main Authors: Tan, Taoliang; Ma, Chengwei; Tian, Zhen; Lin, Zhao; Li, Dongdong; Shi, Si
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition; Human-Computer Interaction; Machine Learning
Online Access: https://arxiv.org/abs/2601.14261
_version_ 1866908778656432128
author Tan, Taoliang
Ma, Chengwei
Tian, Zhen
Lin, Zhao
Li, Dongdong
Shi, Si
contents The intelligent review of power grid engineering design drawings is crucial for power system safety. However, current automated systems struggle with ultra-high-resolution drawings because of high computational demands, information loss, and the lack of holistic semantic understanding needed to identify design errors. This paper proposes a novel three-stage framework for intelligent power grid drawing review, driven by pre-trained Multimodal Large Language Models (MLLMs) through advanced prompt engineering. Mimicking the human expert review process, the first stage uses an MLLM for global semantic understanding to propose domain-specific semantic regions from a low-resolution overview. The second stage then performs high-resolution, fine-grained recognition within these proposed regions, acquiring detailed information with associated confidence scores. In the final stage, a comprehensive decision-making module integrates these confidence-aware results to diagnose design errors and provide a reliability assessment. Preliminary results on real-world power grid drawings demonstrate that the approach significantly enhances the MLLM's ability to grasp macroscopic semantic information and pinpoint design errors, showing improved defect discovery accuracy and greater reliability in review judgments compared with traditional passive MLLM inference. This research offers a novel, prompt-driven paradigm for intelligent and reliable power grid drawing review.
format Preprint
id arxiv_https___arxiv_org_abs_2601_14261
institution arXiv
publishDate 2025
record_format arxiv
title Intelligent Power Grid Design Review via Active Perception-Enabled Multimodal Large Language Models
topic Computer Vision and Pattern Recognition
Human-Computer Interaction
Machine Learning
url https://arxiv.org/abs/2601.14261
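The three-stage flow described in the contents field (region proposal from a low-resolution overview, high-resolution recognition with confidence scores, confidence-aware decision) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Region`, `Finding`, `scale_box`, and `review` are hypothetical, the MLLM calls are represented by plain callables supplied by the caller, and both the confidence threshold and the use of mean confidence as a reliability score are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Region:
    label: str  # domain-specific region name proposed in stage 1
    box: Box    # location in full-resolution coordinates


@dataclass
class Finding:
    region: Region
    text: str          # detail recognised inside the region
    confidence: float  # assumed to lie in [0, 1]
    is_defect: bool


def scale_box(box: Box, factor: float) -> Box:
    """Map a box from overview coordinates to full-resolution coordinates."""
    return tuple(round(v * factor) for v in box)


def review(
    propose: Callable[[], List[Tuple[str, Box]]],
    recognize: Callable[[Region], Finding],
    overview_scale: float,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Tuple[List[Finding], float]:
    # Stage 1: the (stand-in) MLLM proposes labelled regions on the overview;
    # each box is mapped back to full resolution.
    regions = [Region(label, scale_box(box, overview_scale))
               for label, box in propose()]

    # Stage 2: fine-grained recognition on each region, with a confidence score.
    findings = [recognize(region) for region in regions]

    # Stage 3: keep defects whose confidence clears the threshold and report
    # the mean confidence as a simple, assumed reliability score.
    defects = [f for f in findings if f.is_defect and f.confidence >= threshold]
    reliability = (sum(f.confidence for f in findings) / len(findings)
                   if findings else 0.0)
    return defects, reliability
```

In practice `propose` and `recognize` would wrap prompts to whichever MLLM is in use; keeping them as injected callables lets the control flow be exercised with stubs and without any model access.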