Bibliographic Details
Main Authors: Shbita, Basel; Gentile, Anna Lisa; Zhang, Bing; An, Sungeun; Thakur, Shailja; Asthana, Shubhi; Zhou, Yi; Surendran, Saptha; Ahmed, Farhan; Kulkarni, Rohan; Ong, Yuya Jeremy; DeLuca, Chad; Patel, Hima
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2604.23027
Abstract:
  • Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging these models remains a persistent challenge due to their opaque and probabilistic nature and the difficulty of diagnosing errors across diverse tasks and settings. This paper introduces a systematic approach for LLM debugging that treats models as observable systems, providing structured, model-agnostic methods from issue detection to model refinement. By unifying evaluation, interpretability, and error-analysis practices, our approach enables practitioners to iteratively diagnose model weaknesses, refine prompts and model parameters, and adapt data for fine-tuning or assessment, while remaining effective in contexts where standardized benchmarks and evaluation criteria are lacking. We argue that such a structured methodology not only accelerates troubleshooting but also fosters reproducibility, transparency, and scalability in the deployment of LLM-based systems.