Staff View: TransportationGames: Benchmarking Transportation Knowledge of (Multimodal) Large Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Xue; Shi, Xiangyu; Lou, Xinyue; Qi, Rui; Chen, Yufeng; Xu, Jinan; Han, Wenjuan
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2401.04471

_version_	1866911751782531072
author	Zhang, Xue; Shi, Xiangyu; Lou, Xinyue; Qi, Rui; Chen, Yufeng; Xu, Jinan; Han, Wenjuan
author_facet	Zhang, Xue; Shi, Xiangyu; Lou, Xinyue; Qi, Rui; Chen, Yufeng; Xu, Jinan; Han, Wenjuan
contents	Large language models (LLMs) and multimodal large language models (MLLMs) have shown excellent general capabilities, even exhibiting adaptability in many professional domains such as law, economics, transportation, and medicine. Currently, many domain-specific benchmarks have been proposed to verify the performance of (M)LLMs in specific fields. Among various domains, transportation plays a crucial role in modern society as it impacts the economy, the environment, and the quality of life for billions of people. However, it is unclear how much traffic knowledge (M)LLMs possess and whether they can reliably perform transportation-related tasks. To address this gap, we propose TransportationGames, a carefully designed and thorough evaluation benchmark for assessing (M)LLMs in the transportation domain. By comprehensively considering the applications in real-world scenarios and referring to the first three levels in Bloom's Taxonomy, we test the performance of various (M)LLMs in memorizing, understanding, and applying transportation knowledge by the selected tasks. The experimental results show that although some models perform well in some tasks, there is still much room for improvement overall. We hope the release of TransportationGames can serve as a foundation for future research, thereby accelerating the implementation and application of (M)LLMs in the transportation domain.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_04471
institution	arXiv
publishDate	2024
record_format	arxiv
title	TransportationGames: Benchmarking Transportation Knowledge of (Multimodal) Large Language Models
topic	Computation and Language
url	https://arxiv.org/abs/2401.04471

Similar Items