Bibliographic Details
Main Authors: Zhang, Shengdong, Zhang, Xiaoqin, Ren, Wenqi, Shen, Linlin, Wan, Shaohua, Zhang, Jun, Jiang, Yujing M
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition; Machine Learning
Online Access: https://arxiv.org/abs/2501.15099
author Zhang, Shengdong
Zhang, Xiaoqin
Ren, Wenqi
Shen, Linlin
Wan, Shaohua
Zhang, Jun
Jiang, Yujing M
contents Ensuring a stable power supply in rural areas relies heavily on effective inspection of power equipment, particularly transmission lines (TLs). However, detecting TLs from aerial imagery is challenging because of misalignments between visible-light (RGB) and infrared (IR) images, as well as mismatches between high- and low-level features in convolutional networks. To address these limitations, we propose a novel Hierarchical Multi-Modal Enhancement Network (HMMEN) that integrates RGB and IR data for robust and accurate TL detection. Our method introduces two key components: (1) a Mutual Multi-Modal Enhanced Block (MMEB), which fuses and enhances hierarchical RGB and IR feature maps in a coarse-to-fine manner, and (2) a Feature Alignment Block (FAB), which corrects misalignments between decoder outputs and IR feature maps by leveraging deformable convolutions. We employ MobileNet-based encoders for both RGB and IR inputs to accommodate edge-computing constraints and reduce computational overhead. Experimental results under diverse weather and lighting conditions (fog, night, snow, and daytime) demonstrate the superiority and robustness of our approach compared to state-of-the-art methods, with fewer false positives, better boundary delineation, and better overall detection performance. This framework thus shows promise for practical large-scale power line inspection with unmanned aerial vehicles.
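The abstract states only that the Feature Alignment Block (FAB) leverages deformable convolutions. As a hedged illustration of that generic mechanism (not the authors' FAB), the pure-Python sketch below samples the input at a regular 3x3 kernel grid displaced by per-pixel offsets, using bilinear interpolation. It assumes a single channel, stride 1, zero padding, and offsets supplied by hand; in a trained network the offsets would instead be predicted from features. The names `bilinear`, `deform_conv3x3`, and `uniform_offsets` are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
# offsets[i][j] holds 9 (dy, dx) pairs, one per 3x3 kernel tap, for output pixel (i, j)
Offsets = List[List[List[Tuple[float, float]]]]


def bilinear(img: Grid, y: float, x: float) -> float:
    """Sample img at fractional (y, x); positions outside the image read as 0 (zero padding)."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Bilinear weight falls off linearly with distance to each neighbor
                total += (1.0 - abs(y - yy)) * (1.0 - abs(x - xx)) * img[yy][xx]
    return total


def deform_conv3x3(img: Grid, kernel: Grid, offsets: Offsets) -> Grid:
    """Hypothetical single-channel deformable 3x3 convolution (stride 1, same-size output).

    With all offsets zero this reduces to an ordinary zero-padded 3x3 cross-correlation.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            k = 0
            for ky in range(-1, 2):
                for kx in range(-1, 2):
                    dy, dx = offsets[i][j][k]
                    # Regular grid position plus a per-tap offset, read with bilinear sampling
                    acc += kernel[ky + 1][kx + 1] * bilinear(img, i + ky + dy, j + kx + dx)
                    k += 1
            out[i][j] = acc
    return out


def uniform_offsets(h: int, w: int, dy: float, dx: float) -> Offsets:
    """Hypothetical helper: the same (dy, dx) offset for every tap of every output pixel."""
    return [[[(dy, dx)] * 9 for _ in range(w)] for _ in range(h)]


if __name__ == "__main__":
    img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    # An offset of one pixel to the right shifts the sampled content left by one column
    print(deform_conv3x3(img, identity, uniform_offsets(3, 3, 0.0, 1.0)))
    # → [[2.0, 3.0, 0.0], [5.0, 6.0, 0.0], [8.0, 9.0, 0.0]]
```

Because the offsets may be fractional (for example a half-pixel shift), the sampling grid can move by sub-pixel amounts. This is what lets a deformable layer compensate for small spatial misalignments between two feature maps, such as the RGB/IR misalignment the abstract describes.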
format Preprint
institution arXiv
publishDate 2025
title Bringing RGB and IR Together: Hierarchical Multi-Modal Enhancement for Robust Transmission Line Detection
topic Computer Vision and Pattern Recognition
Machine Learning
url https://arxiv.org/abs/2501.15099