Staff View: Unlocking the Potential: Benchmarking Large Language Models in Water Engineering and Research :: Library Catalog


Bibliographic Details
Main Authors:	Xu, Boyan; Wen, Liang; Li, Zihao; Yang, Yuxing; Wu, Guanlan; Tang, Xiongpeng; Li, Yu; Wu, Zihao; Su, Qingxian; Shi, Xueqing; Yang, Yue; Tong, Rui; Ng, How Yong
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2407.21045

_version_	1866911973340348416
author	Xu, Boyan; Wen, Liang; Li, Zihao; Yang, Yuxing; Wu, Guanlan; Tang, Xiongpeng; Li, Yu; Wu, Zihao; Su, Qingxian; Shi, Xueqing; Yang, Yue; Tong, Rui; Ng, How Yong
author_facet	Xu, Boyan; Wen, Liang; Li, Zihao; Yang, Yuxing; Wu, Guanlan; Tang, Xiongpeng; Li, Yu; Wu, Zihao; Su, Qingxian; Shi, Xueqing; Yang, Yue; Tong, Rui; Ng, How Yong
contents	Recent advancements in Large Language Models (LLMs) have sparked interest in their potential applications across various fields. This paper embarked on a pivotal inquiry: Can existing LLMs effectively serve as "water expert models" for water engineering and research tasks? This study was the first to evaluate LLMs' contributions across various water engineering and research tasks by establishing a domain-specific benchmark suite, namely, WaterER. Herein, we prepared 983 tasks related to water engineering and research, categorized into "wastewater treatment", "environmental restoration", "drinking water treatment and distribution", "sanitation", "anaerobic digestion" and "contaminants assessment". We evaluated the performance of seven LLMs (i.e., GPT-4, GPT-3.5, Gemini, GLM-4, ERNIE, QWEN and Llama3) on these tasks. We highlighted the strengths of GPT-4 in handling diverse and complex tasks of water engineering and water research, the specialized capabilities of Gemini in academic contexts, Llama3's strongest capacity to answer Chinese water engineering questions and the competitive performance of Chinese-oriented models like GLM-4, ERNIE and QWEN in some water engineering tasks. More specifically, current LLMs excelled particularly in generating precise research gaps for papers on "contaminants and related water quality monitoring and assessment". Additionally, they were more adept at creating appropriate titles for research papers on "treatment processes for wastewaters", "environmental restoration", and "drinking water treatment". Overall, this study pioneered evaluating LLMs in water engineering and research by introducing the WaterER benchmark to assess the trustworthiness of their predictions. This standardized evaluation framework would also drive future advancements in LLM technology by using targeted datasets, propelling these models towards becoming true "water experts".
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_21045
institution	arXiv
publishDate	2024
record_format	arxiv
title	Unlocking the Potential: Benchmarking Large Language Models in Water Engineering and Research
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2407.21045

Similar Items