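Staff View: Intelligent Power Grid Design Review via Active Perception-Enabled Multimodal Large Language Models :: Library Catalog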

Bibliographic Details
Main Authors:	Tan, Taoliang; Ma, Chengwei; Tian, Zhen; Lin, Zhao; Li, Dongdong; Shi, Si
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Human-Computer Interaction; Machine Learning
Online Access:	https://arxiv.org/abs/2601.14261

_version_	1866908778656432128
author	Tan, Taoliang; Ma, Chengwei; Tian, Zhen; Lin, Zhao; Li, Dongdong; Shi, Si
author_facet	Tan, Taoliang; Ma, Chengwei; Tian, Zhen; Lin, Zhao; Li, Dongdong; Shi, Si
contents	The intelligent review of power grid engineering design drawings is crucial for power system safety. However, current automated systems struggle with ultra-high-resolution drawings due to high computational demands, information loss, and a lack of holistic semantic understanding for design error identification. This paper proposes a novel three-stage framework for intelligent power grid drawing review, driven by pre-trained Multimodal Large Language Models (MLLMs) through advanced prompt engineering. Mimicking the human expert review process, the first stage leverages an MLLM for global semantic understanding to intelligently propose domain-specific semantic regions from a low-resolution overview. The second stage then performs high-resolution, fine-grained recognition within these proposed regions, acquiring detailed information with associated confidence scores. In the final stage, a comprehensive decision-making module integrates these confidence-aware results to accurately diagnose design errors and provide a reliability assessment. Preliminary results on real-world power grid drawings demonstrate that our approach significantly enhances the MLLM's ability to grasp macroscopic semantic information and pinpoint design errors, showing improved defect discovery accuracy and greater reliability in review judgments compared to traditional passive MLLM inference. This research offers a novel, prompt-driven paradigm for intelligent and reliable power grid drawing review.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_14261
institution	arXiv
publishDate	2025
record_format	arxiv
title	Intelligent Power Grid Design Review via Active Perception-Enabled Multimodal Large Language Models
topic	Computer Vision and Pattern Recognition; Human-Computer Interaction; Machine Learning
url	https://arxiv.org/abs/2601.14261

Similar Items