:: Library Catalog

Bibliographic Details
Main Authors:	Sorokin, Lev; Vasilev, Ivan; Pasini, Samuele
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.12615

Similar Items

Can Search-Based Testing with Pareto Optimization Effectively Cover Failure-Revealing Test Inputs?
by: Sorokin, Lev, et al.
Published: (2024)

Simulator Ensembles for Trustworthy Autonomous Driving Testing
by: Sorokin, Lev, et al.
Published: (2025)

STELLAR: A Search-Based Testing Framework for Large Language Model Applications
by: Sorokin, Lev, et al.
Published: (2026)

Detecting Trojaned DNNs via Spectral Regression Analysis
by: Pasini, Samuele, et al.
Published: (2026)

Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models
by: Giebisch, Rafael, et al.
Published: (2025)

BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
by: Kuratov, Yuri, et al.
Published: (2024)

Cross-site scripting adversarial attacks based on deep reinforcement learning: Evaluation and extension study
by: Pasini, Samuele, et al.
Published: (2025)

VoiceBench: Benchmarking LLM-Based Voice Assistants
by: Chen, Yiming, et al.
Published: (2024)

LLM-Based Approach for Enhancing Maintainability of Automotive Architectures
by: Petrovic, Nenad, et al.
Published: (2025)

Hallucination in LLM-Based Code Generation: An Automotive Case Study
by: Pavel, Marc, et al.
Published: (2025)

ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming
by: Yang, Xinwei, et al.
Published: (2025)

ToolFuzz -- Automated Agent Tool Testing
by: Milev, Ivan, et al.
Published: (2025)

Towards Specification-Driven LLM-Based Generation of Embedded Automotive Software
by: Patil, Minal Suresh, et al.
Published: (2024)

Automotive innovation landscaping using LLM
by: Gorain, Raju, et al.
Published: (2024)

LLM-based Iterative Approach to Metamodeling in Automotive
by: Petrovic, Nenad, et al.
Published: (2025)

Internship Report: Benchmark of Deep Learning-based Imaging PPG in Automotive Domain
by: Tu, Yuqi, et al.
Published: (2024)

User Misconceptions of LLM-Based Conversational Programming Assistants
by: O'Brien, Gabrielle, et al.
Published: (2025)

BPMN Assistant: An LLM-Based Approach to Business Process Modeling
by: Licardo, Josip Tomo, et al.
Published: (2025)

HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds
by: Anokhin, Petr, et al.
Published: (2025)

Disrupting Test Development with AI Assistants
by: Joshi, Vijay, et al.
Published: (2024)

SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning
by: Liang, Yanchang, et al.
Published: (2026)

Benchmarking LLM Tool-Use in the Wild
by: Yu, Peijie, et al.
Published: (2026)

Profit is the Red Team: Stress-Testing Agents in Strategic Economic Interactions
by: Wang, Shouqiao, et al.
Published: (2026)

LLM-Empowered Functional Safety and Security by Design in Automotive Systems
by: Petrovic, Nenad, et al.
Published: (2026)

LLM-Based Agents for Competitive Landscape Mapping in Drug Asset Due Diligence
by: Vinogradov, Vlad, et al.
Published: (2025)

Intelligent Assistants for the Semiconductor Failure Analysis with LLM-Based Planning Agents
by: Dobrovsky, Aline, et al.
Published: (2025)

The Base-Rate Effect on LLM Benchmark Performance: Disambiguating Test-Taking Strategies from Benchmark Performance
by: Moore, Kyle, et al.
Published: (2024)

LiveClawBench: Benchmarking LLM Agents on Complex, Real-World Assistant Tasks
by: Long, Xiang, et al.
Published: (2026)

DriveSafe: A Hierarchical Risk Taxonomy for Safety-Critical LLM-Based Driving Assistants
by: Kumar, Abhishek, et al.
Published: (2026)

AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment
by: Sun, Nan, et al.
Published: (2024)

MGRegBench: A Novel Benchmark Dataset with Anatomical Landmarks for Mammography Image Registration
by: Krasnova, Svetlana, et al.
Published: (2025)

AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
by: Anokhin, Petr, et al.
Published: (2024)

CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data
by: Cheng, Zhao, et al.
Published: (2024)

Benchmark Test-Time Scaling of General LLM Agents
by: Li, Xiaochuan, et al.
Published: (2026)

Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use
by: Thaman, Kunvar
Published: (2026)

DeepWriter: A Fact-Grounded Multimodal Writing Assistant Based On Offline Knowledge Base
by: Mao, Song, et al.
Published: (2025)

Benchmarking Agentic Systems in Automated Scientific Information Extraction with ChemX
by: Vepreva, Anastasia, et al.
Published: (2025)

Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation
by: Cui, Yi
Published: (2025)

Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard
by: Topsakal, Oguzhan, et al.
Published: (2024)

SiriusHelper: An LLM Agent-Based Operations Assistant for Big Data Platforms
by: Shen, Yu, et al.
Published: (2026)