Guardado en:
| Autores principales: | , , , , |
|---|---|
| Formato: | Preprint |
| Publicado: |
2026
|
| Materias: | |
| Acceso en línea: | https://arxiv.org/abs/2601.07477 |
| Etiquetas: |
Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
|
| _version_ | 1866911415926784000 |
|---|---|
| author | Ma, Zihan Zhao, Zhikai Hua, Chuanbo Berto, Federico Park, Jinkyoo |
| author_facet | Ma, Zihan Zhao, Zhikai Hua, Chuanbo Berto, Federico Park, Jinkyoo |
| contents | Optimizing LLM-based agentic workflows is challenging for scaling AI capabilities. Current methods rely on coarse, end-to-end evaluation signals and lack fine-grained signals on where to refine, often resulting in inefficient or low-impact modifications. To address these limitations, we propose JudgeFlow, an Evaluation-Judge-Optimization-Update pipeline. We incorporate reusable, configurable logic blocks into agentic workflows to capture fundamental forms of logic. On top of this abstraction, we design a dedicated Judge module that inspects execution traces particularly failed runs and assigns rank-based responsibility scores to problematic blocks. These fine-grained diagnostic signals are then leveraged by an LLM-based optimizer, which focuses modifications on the most problematic block in the workflow. Our approach improves sample efficiency, enhances interpretability through block-level diagnostics, and provides a scalable foundation for automating increasingly complex agentic workflows. We evaluate JudgeFlow on mathematical reasoning and code generation benchmarks, where JudgeFlow achieves superior performance and efficiency compared to existing methods. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2601_07477 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| spellingShingle | JudgeFlow: Agentic Workflow Optimization via Block Judge Ma, Zihan Zhao, Zhikai Hua, Chuanbo Berto, Federico Park, Jinkyoo Artificial Intelligence Optimizing LLM-based agentic workflows is challenging for scaling AI capabilities. Current methods rely on coarse, end-to-end evaluation signals and lack fine-grained signals on where to refine, often resulting in inefficient or low-impact modifications. To address these limitations, we propose JudgeFlow, an Evaluation-Judge-Optimization-Update pipeline. We incorporate reusable, configurable logic blocks into agentic workflows to capture fundamental forms of logic. On top of this abstraction, we design a dedicated Judge module that inspects execution traces particularly failed runs and assigns rank-based responsibility scores to problematic blocks. These fine-grained diagnostic signals are then leveraged by an LLM-based optimizer, which focuses modifications on the most problematic block in the workflow. Our approach improves sample efficiency, enhances interpretability through block-level diagnostics, and provides a scalable foundation for automating increasingly complex agentic workflows. We evaluate JudgeFlow on mathematical reasoning and code generation benchmarks, where JudgeFlow achieves superior performance and efficiency compared to existing methods. |
| title | JudgeFlow: Agentic Workflow Optimization via Block Judge |
| topic | Artificial Intelligence |
| url | https://arxiv.org/abs/2601.07477 |