Bibliographic Details
Main Authors: Xu, Boyan, Wen, Liang, Li, Zihao, Yang, Yuxing, Wu, Guanlan, Tang, Xiongpeng, Li, Yu, Wu, Zihao, Su, Qingxian, Shi, Xueqing, Yang, Yue, Tong, Rui, Ng, How Yong
Format: Preprint
Published: 2024
Subjects: Computation and Language, Artificial Intelligence
Online Access: https://arxiv.org/abs/2407.21045
author Xu, Boyan
Wen, Liang
Li, Zihao
Yang, Yuxing
Wu, Guanlan
Tang, Xiongpeng
Li, Yu
Wu, Zihao
Su, Qingxian
Shi, Xueqing
Yang, Yue
Tong, Rui
Ng, How Yong
contents Recent advancements in Large Language Models (LLMs) have sparked interest in their potential applications across various fields. This paper embarked on a pivotal inquiry: Can existing LLMs effectively serve as "water expert models" for water engineering and research tasks? This study was the first to evaluate LLMs' contributions across various water engineering and research tasks by establishing a domain-specific benchmark suite, namely, WaterER. Herein, we prepared 983 tasks related to water engineering and research, categorized into "wastewater treatment", "environmental restoration", "drinking water treatment and distribution", "sanitation", "anaerobic digestion" and "contaminants assessment". We evaluated the performance of seven LLMs (i.e., GPT-4, GPT-3.5, Gemini, GLM-4, ERNIE, QWEN and Llama3) on these tasks. We highlighted the strengths of GPT-4 in handling diverse and complex water engineering and water research tasks, the specialized capabilities of Gemini in academic contexts, Llama3's strongest capability in answering Chinese water engineering questions, and the competitive performance of Chinese-oriented models such as GLM-4, ERNIE and QWEN in some water engineering tasks. More specifically, current LLMs particularly excelled at generating precise research gaps for papers on "contaminants and related water quality monitoring and assessment". Additionally, they were more adept at creating appropriate titles for research papers on "treatment processes for wastewaters", "environmental restoration", and "drinking water treatment". Overall, this study pioneered the evaluation of LLMs in water engineering and research by introducing the WaterER benchmark to assess the trustworthiness of their predictions. This standardized evaluation framework would also drive future advancements in LLM technology through targeted datasets, propelling these models toward becoming true "water experts".
format Preprint
id arxiv_https___arxiv_org_abs_2407_21045
institution arXiv
publishDate 2024
record_format arxiv
title Unlocking the Potential: Benchmarking Large Language Models in Water Engineering and Research
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2407.21045