Bibliographic Details
Main Authors: Sorokin, Lev; Vasilev, Ivan; Pasini, Samuele
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2604.12615
author Sorokin, Lev
Vasilev, Ivan
Pasini, Samuele
contents This report summarizes the results of the first edition of the Large Language Model (LLM) Testing competition, held as part of the DeepTest workshop at ICSE 2026. Four tools competed in benchmarking an LLM-based car manual information retrieval application, with the objective of identifying user inputs for which the system fails to appropriately mention warnings contained in the manual. The testing solutions were evaluated based on their effectiveness in exposing failures and the diversity of the discovered failure-revealing tests. We report on the experimental methodology, the competitors, and the results.
format Preprint
id arxiv_https___arxiv_org_abs_2604_12615
institution arXiv
publishDate 2026
record_format arxiv
title DeepTest Tool Competition 2026: Benchmarking an LLM-Based Automotive Assistant
topic Artificial Intelligence
url https://arxiv.org/abs/2604.12615