Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Guo, Jing; Li, Nan; Xu, Ming
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Information Retrieval
Online Access:	https://arxiv.org/abs/2501.06277

_version_	1866916563943161856
author	Guo, Jing; Li, Nan; Xu, Ming
author_facet	Guo, Jing; Li, Nan; Xu, Ming
contents	Generative AI holds significant potential for ecological and environmental applications such as monitoring, data analysis, education, and policy support. However, its effectiveness is limited by the lack of a unified evaluation framework. To address this, we present the Environmental Large Language model Evaluation (ELLE) question-answer (QA) dataset, the first benchmark designed to assess large language models and their applications in ecological and environmental sciences. The ELLE dataset includes 1,130 question-answer pairs across 16 environmental topics, categorized by domain, difficulty, and type. This comprehensive dataset standardizes performance assessments in these fields, enabling consistent and objective comparisons of generative AI performance. By providing a dedicated evaluation tool, the ELLE dataset promotes the development and application of generative AI technologies for sustainable environmental outcomes. The dataset and code are available at https://elle.ceeai.net/ and https://github.com/CEEAI/elle.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_06277
institution	arXiv
publishDate	2025
record_format	arxiv
title	Environmental large language model Evaluation (ELLE) dataset: A Benchmark for Evaluating Generative AI applications in Eco-environment Domain
topic	Computation and Language; Information Retrieval
url	https://arxiv.org/abs/2501.06277

Similar Items