| Main Authors: | Sorokin, Lev, Vasilev, Ivan, Pasini, Samuele |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.12615 |
Similar Items
Can Search-Based Testing with Pareto Optimization Effectively Cover Failure-Revealing Test Inputs?
by: Sorokin, Lev, et al.
Published: (2024)
Simulator Ensembles for Trustworthy Autonomous Driving Testing
by: Sorokin, Lev, et al.
Published: (2025)
STELLAR: A Search-Based Testing Framework for Large Language Model Applications
by: Sorokin, Lev, et al.
Published: (2026)
Detecting Trojaned DNNs via Spectral Regression Analysis
by: Pasini, Samuele, et al.
Published: (2026)
Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models
by: Giebisch, Rafael, et al.
Published: (2025)
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
by: Kuratov, Yuri, et al.
Published: (2024)
Cross-site scripting adversarial attacks based on deep reinforcement learning: Evaluation and extension study
by: Pasini, Samuele, et al.
Published: (2025)
VoiceBench: Benchmarking LLM-Based Voice Assistants
by: Chen, Yiming, et al.
Published: (2024)
LLM-Based Approach for Enhancing Maintainability of Automotive Architectures
by: Petrovic, Nenad, et al.
Published: (2025)
Hallucination in LLM-Based Code Generation: An Automotive Case Study
by: Pavel, Marc, et al.
Published: (2025)
ELABORATION: A Comprehensive Benchmark on Human-LLM Competitive Programming
by: Yang, Xinwei, et al.
Published: (2025)
ToolFuzz -- Automated Agent Tool Testing
by: Milev, Ivan, et al.
Published: (2025)
Towards Specification-Driven LLM-Based Generation of Embedded Automotive Software
by: Patil, Minal Suresh, et al.
Published: (2024)
Automotive innovation landscaping using LLM
by: Gorain, Raju, et al.
Published: (2024)
LLM-based Iterative Approach to Metamodeling in Automotive
by: Petrovic, Nenad, et al.
Published: (2025)
Internship Report: Benchmark of Deep Learning-based Imaging PPG in Automotive Domain
by: Tu, Yuqi, et al.
Published: (2024)
User Misconceptions of LLM-Based Conversational Programming Assistants
by: O'Brien, Gabrielle, et al.
Published: (2025)
BPMN Assistant: An LLM-Based Approach to Business Process Modeling
by: Licardo, Josip Tomo, et al.
Published: (2025)
HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds
by: Anokhin, Petr, et al.
Published: (2025)
Disrupting Test Development with AI Assistants
by: Joshi, Vijay, et al.
Published: (2024)
SimuAgent: An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning
by: Liang, Yanchang, et al.
Published: (2026)
Benchmarking LLM Tool-Use in the Wild
by: Yu, Peijie, et al.
Published: (2026)
Profit is the Red Team: Stress-Testing Agents in Strategic Economic Interactions
by: Wang, Shouqiao, et al.
Published: (2026)
LLM-Empowered Functional Safety and Security by Design in Automotive Systems
by: Petrovic, Nenad, et al.
Published: (2026)
LLM-Based Agents for Competitive Landscape Mapping in Drug Asset Due Diligence
by: Vinogradov, Vlad, et al.
Published: (2025)
Intelligent Assistants for the Semiconductor Failure Analysis with LLM-Based Planning Agents
by: Dobrovsky, Aline, et al.
Published: (2025)
The Base-Rate Effect on LLM Benchmark Performance: Disambiguating Test-Taking Strategies from Benchmark Performance
by: Moore, Kyle, et al.
Published: (2024)
LiveClawBench: Benchmarking LLM Agents on Complex, Real-World Assistant Tasks
by: Long, Xiang, et al.
Published: (2026)
DriveSafe: A Hierarchical Risk Taxonomy for Safety-Critical LLM-Based Driving Assistants
by: Kumar, Abhishek, et al.
Published: (2026)
AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment
by: Sun, Nan, et al.
Published: (2024)
MGRegBench: A Novel Benchmark Dataset with Anatomical Landmarks for Mammography Image Registration
by: Krasnova, Svetlana, et al.
Published: (2025)
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
by: Anokhin, Petr, et al.
Published: (2024)
CI-Bench: Benchmarking Contextual Integrity of AI Assistants on Synthetic Data
by: Cheng, Zhao, et al.
Published: (2024)
Benchmark Test-Time Scaling of General LLM Agents
by: Li, Xiaochuan, et al.
Published: (2026)
Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use
by: Thaman, Kunvar
Published: (2026)
DeepWriter: A Fact-Grounded Multimodal Writing Assistant Based On Offline Knowledge Base
by: Mao, Song, et al.
Published: (2025)
Benchmarking Agentic Systems in Automated Scientific Information Extraction with ChemX
by: Vepreva, Anastasia, et al.
Published: (2025)
Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation
by: Cui, Yi
Published: (2025)
Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard
by: Topsakal, Oguzhan, et al.
Published: (2024)
SiriusHelper: An LLM Agent-Based Operations Assistant for Big Data Platforms
by: Shen, Yu, et al.
Published: (2026)