Bibliographic Details
Main Authors: Gopalakrishnan, Saisubramaniam; M, Harikrishnan P; Birru, Dagnachew
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2601.21608
_version_ 1866914290155388928
author Gopalakrishnan, Saisubramaniam
M, Harikrishnan P
Birru, Dagnachew
contents Enterprise-grade Intelligent Document Processing (IDP) systems support high-stakes workflows across finance, insurance, and healthcare. Early-phase system validation under limited budgets mandates uncovering diverse failure mechanisms, rather than identifying a single worst-case document. We formalize this challenge as a Search-Based Software Testing (SBST) problem, aiming to identify complex interactions between document variables, with the objective of maximizing the number of distinct failure types discovered within a fixed evaluation budget. Our methodology operates on a combinatorial space of document configurations, rendering instances of structural risk features to induce realistic failure conditions. We benchmark a diverse portfolio of search strategies, spanning evolutionary, swarm-based, quality-diversity, learning-based, and quantum approaches, under identical budget constraints. Through configuration-level exclusivity, win-rate, and cross-temporal overlap analyses, we show that different solvers consistently uncover failure modes that remain undiscovered by specific alternatives at comparable budgets. Crucially, cross-temporal analysis reveals persistent solver-specific discoveries across all evaluated budgets, with no single strategy exhibiting absolute dominance. While the union of all solvers eventually recovers the observed failure space, reliance on any individual method systematically delays the discovery of important risks. These results demonstrate intrinsic solver complementarity and motivate portfolio-based SBST strategies for robust industrial IDP validation.
format Preprint
id arxiv_https___arxiv_org_abs_2601_21608
institution arXiv
publishDate 2026
record_format arxiv
title Search-Based Risk Feature Discovery in Document Structure Spaces under a Constrained Budget
topic Artificial Intelligence
url https://arxiv.org/abs/2601.21608