Staff View: AMBER: An Adaptive Multimodal Mask Transformer for Beam Prediction with Missing Modalities :: Library Catalog

Bibliographic Details
Main Authors:	Wen, Chenyiming; Shi, Binpu; Li, Min; Zhao, Ming-Min; Zhao, Min-Jian; Wang, Jiangzhou
Format:	Preprint
Published:	2025
Subjects:	Information Theory
Online Access:	https://arxiv.org/abs/2512.11331

_version_	1866914422218293248
author	Wen, Chenyiming; Shi, Binpu; Li, Min; Zhao, Ming-Min; Zhao, Min-Jian; Wang, Jiangzhou
author_facet	Wen, Chenyiming; Shi, Binpu; Li, Min; Zhao, Ming-Min; Zhao, Min-Jian; Wang, Jiangzhou
contents	With the widespread adoption of millimeter-wave (mmWave) massive multi-input-multi-output (MIMO) in vehicular networks, accurate beam prediction and alignment have become critical for high-speed data transmission and reliable access. While traditional beam prediction approaches primarily rely on in-band beam training, recent advances have started to explore multimodal sensing to extract environmental semantics for enhanced prediction. However, the performance of existing multimodal fusion methods degrades significantly in real-world settings because they are vulnerable to missing data caused by sensor blockage, poor lighting, or GPS dropouts. To address this challenge, we propose AMBER (Adaptive multimodal Mask transformer for BEam pRediction), a novel end-to-end framework that processes temporal sequences of image, LiDAR, radar, and GPS data, while adaptively handling arbitrary missing-modality cases. AMBER introduces learnable modality tokens and a missing-modality-aware mask to prevent cross-modal noise propagation, along with a learnable fusion token and multihead attention to achieve robust modality-specific information distillation and feature-level fusion. Furthermore, a class-former-aided modality alignment (CMA) module and temporal-aware positional embedding are incorporated to preserve temporal coherence and ensure semantic alignment across modalities, facilitating the learning of modality-invariant and temporally consistent representations for beam prediction. Extensive experiments on the real-world DeepSense6G dataset demonstrate that AMBER significantly outperforms existing multimodal learning baselines. In particular, it maintains high beam prediction accuracy and robustness even under severe missing-modality scenarios, validating its effectiveness and practical applicability.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_11331
institution	arXiv
publishDate	2025
record_format	arxiv
title	AMBER: An Adaptive Multimodal Mask Transformer for Beam Prediction with Missing Modalities
topic	Information Theory
url	https://arxiv.org/abs/2512.11331

Similar Items