Staff View: DeepTest Tool Competition 2026: Benchmarking an LLM-Based Automotive Assistant :: Library Catalog

Bibliographic Details
Main Authors:	Sorokin, Lev; Vasilev, Ivan; Pasini, Samuele
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.12615

_version_	1866913030007160832
author	Sorokin, Lev; Vasilev, Ivan; Pasini, Samuele
author_facet	Sorokin, Lev; Vasilev, Ivan; Pasini, Samuele
contents	This report summarizes the results of the first edition of the Large Language Model (LLM) Testing competition, held as part of the DeepTest workshop at ICSE 2026. Four tools competed in benchmarking an LLM-based car manual information retrieval application, with the objective of identifying user inputs for which the system fails to appropriately mention warnings contained in the manual. The testing solutions were evaluated based on their effectiveness in exposing failures and the diversity of the discovered failure-revealing tests. We report on the experimental methodology, the competitors, and the results.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_12615
institution	arXiv
publishDate	2026
record_format	arxiv
title	DeepTest Tool Competition 2026: Benchmarking an LLM-Based Automotive Assistant
topic	Artificial Intelligence
url	https://arxiv.org/abs/2604.12615

Similar Items