Staff View: SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation :: Library Catalog

Bibliographic Details
Main Authors:	Ma, Zhiming; Xiao, Xiayang; Dong, Sihao; Wang, Peidong; Wang, HaiPeng; Pan, Qingyun
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2502.08168

_version_	1866929740229640192
author	Ma, Zhiming; Xiao, Xiayang; Dong, Sihao; Wang, Peidong; Wang, HaiPeng; Pan, Qingyun
author_facet	Ma, Zhiming; Xiao, Xiayang; Dong, Sihao; Wang, Peidong; Wang, HaiPeng; Pan, Qingyun
contents	As a powerful all-weather Earth observation tool, synthetic aperture radar (SAR) remote sensing enables critical military reconnaissance, maritime surveillance, and infrastructure monitoring. Although vision-language models (VLMs) have made remarkable progress in natural language processing and image understanding, their applications remain limited in professional domains due to insufficient domain expertise. This paper proposes the first large-scale multimodal dialogue dataset for SAR images, named SARChat-2M, which contains approximately 2 million high-quality image-text pairs and encompasses diverse scenarios with detailed target annotations. The dataset supports several key tasks, such as visual understanding and object detection, and also serves as a vision-language benchmark for the SAR domain, enabling and evaluating VLMs' capabilities in SAR image interpretation; this provides a paradigmatic framework for constructing multimodal datasets across various remote sensing vertical domains. Experiments on 16 mainstream VLMs verify the effectiveness of the dataset. The project will be released at https://github.com/JimmyMa99/SARChat.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_08168
institution	arXiv
publishDate	2025
record_format	arxiv
title	SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation
topic	Computation and Language
url	https://arxiv.org/abs/2502.08168

Similar Items