Bibliographic Details
Main Authors: de Brito, Mariana Madruga; Madureira, Brielen; Carvalho, Taís Maria Nunes; Delforge, Damien; Jézéquel, Aglaé; Kurfalı, Murathan; Li, Ni; Messori, Gabriele; Nivre, Joakim; Pernici, Barbara; Speybroeck, Niko; Terzi, Stefano; Thiery, Wim; Valkenborg, Bram; Wang, Jingxian; Zahra, Shorouq; Zscheischler, Jakob; Sodoge, Jan
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2605.20793
author de Brito, Mariana Madruga
Madureira, Brielen
Carvalho, Taís Maria Nunes
Delforge, Damien
Jézéquel, Aglaé
Kurfalı, Murathan
Li, Ni
Messori, Gabriele
Nivre, Joakim
Pernici, Barbara
Speybroeck, Niko
Terzi, Stefano
Thiery, Wim
Valkenborg, Bram
Wang, Jingxian
Zahra, Shorouq
Zscheischler, Jakob
Sodoge, Jan
contents Recent advances in natural language processing (NLP) and large language models (LLMs) have enabled the systematic use of large-scale textual data from news, social media, and reports to create datasets with socio-economic impacts of climate hazards such as floods, droughts, storms, and multi-hazard events. As the field of text-as-data for impact assessment expands, so does its methodological complexity. Yet research remains fragmented, with no clear guidelines for defining what constitutes an impact, handling temporal and spatial biases, and selecting appropriate modeling and post-processing strategies. This lack of coherence limits transparency and comparability across studies. Here, we address this gap by synthesising common practices, describing key challenges specific to the use of text-as-data methods for analyzing socio-economic impact data, and proposing recommendations to address them. By providing guidance on best practices, we aim to support the construction of robust text-derived socio-economic impact datasets that can more accurately inform disaster risk management and attribution studies.
format Preprint
id arxiv_https___arxiv_org_abs_2605_20793
institution arXiv
publishDate 2026
record_format arxiv
title Assessing socio-economic climate impacts from text data
topic Computation and Language
url https://arxiv.org/abs/2605.20793