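Staff View: PReD: An LLM-based Foundation Multimodal Model for Electromagnetic Perception, Recognition, and Decision :: Library Catalog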

Bibliographic Details
Main Authors:	Han, Zehua; Xiao, Jing; Duan, Yiqi; Xiang, Mengyu; Ji, Yuheng; Zheng, Xiaolong; Zhang, Chenghanyu; She, Zhendong; Shen, Junyu; Tan, Dingwei; Sun, Shichu; Cong, Zhou; Liu, Mingxuan; Wang, Fengxiang; Sun, Jinping; Sun, Yangang
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.28183

_version_	1866908927546884096
author	Han, Zehua; Xiao, Jing; Duan, Yiqi; Xiang, Mengyu; Ji, Yuheng; Zheng, Xiaolong; Zhang, Chenghanyu; She, Zhendong; Shen, Junyu; Tan, Dingwei; Sun, Shichu; Cong, Zhou; Liu, Mingxuan; Wang, Fengxiang; Sun, Jinping; Sun, Yangang
author_facet	Han, Zehua; Xiao, Jing; Duan, Yiqi; Xiang, Mengyu; Ji, Yuheng; Zheng, Xiaolong; Zhang, Chenghanyu; She, Zhendong; Shen, Junyu; Tan, Dingwei; Sun, Shichu; Cong, Zhou; Liu, Mingxuan; Wang, Fengxiang; Sun, Jinping; Sun, Yangang
contents	Multimodal Large Language Models have demonstrated powerful cross-modal understanding and reasoning capabilities in general domains. However, in the electromagnetic (EM) domain, they still face challenges such as data scarcity and insufficient integration of domain knowledge. This paper proposes PReD, the first foundation model for the EM domain that covers the intelligent closed loop of "perception, recognition, decision-making." We constructed a high-quality multitask EM dataset, PReD-1.3M, and an evaluation benchmark, PReD-Bench. The dataset encompasses multi-perspective representations such as raw time-domain waveforms, frequency-domain spectrograms, and constellation diagrams, covering typical features of communication and radar signals. It supports a range of core tasks, including signal detection, modulation recognition, parameter estimation, protocol recognition, radio frequency fingerprint recognition, and anti-jamming decision-making. PReD adopts a multi-stage training strategy that unifies multiple tasks for EM signals. It achieves closed-loop optimization from end-to-end signal understanding to language-driven reasoning and decision-making, significantly enhancing EM domain expertise while maintaining general multimodal capabilities. Experimental results show that PReD achieves state-of-the-art performance on PReD-Bench, which is constructed from both open-source and self-collected signal datasets. These results collectively validate the feasibility and potential of vision-aligned foundation models in advancing the understanding of and reasoning about EM signals.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_28183
institution	arXiv
publishDate	2026
record_format	arxiv
title	PReD: An LLM-based Foundation Multimodal Model for Electromagnetic Perception, Recognition, and Decision
topic	Artificial Intelligence
url	https://arxiv.org/abs/2603.28183

Similar Items