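Staff View: WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models :: Library Catalog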

Bibliographic Details
Main Authors:	Yu, Yongan; Hu, Qingchen; Du, Xianda; Wang, Jiayin; Mo, Fengran; Sieber, Renee
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.20249

_version_	1866909874386894848
author	Yu, Yongan; Hu, Qingchen; Du, Xianda; Wang, Jiayin; Mo, Fengran; Sieber, Renee
author_facet	Yu, Yongan; Hu, Qingchen; Du, Xianda; Wang, Jiayin; Mo, Fengran; Sieber, Renee
contents	Climate change adaptation requires the understanding of disruptive weather impacts on society, where large language models (LLMs) might be applicable. However, their effectiveness is under-explored due to the difficulty of high-quality corpus collection and the lack of available benchmarks. The climate-related events stored in regional newspapers record how communities adapted and recovered from disasters. However, the processing of the original corpus is non-trivial. In this study, we first develop a disruptive weather impact dataset with a four-stage well-crafted construction pipeline. Then, we propose WXImpactBench, the first benchmark for evaluating the capacity of LLMs on disruptive weather impacts. The benchmark involves two evaluation tasks, multi-label classification and ranking-based question answering. Extensive experiments on evaluating a set of LLMs provide first-hand analysis of the challenges in developing disruptive weather impact understanding and climate change adaptation systems. The constructed dataset and the code for the evaluation framework are available to help society protect against vulnerabilities from disasters.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_20249
institution	arXiv
publishDate	2025
record_format	arxiv
title	WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2505.20249
