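Staff View: TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation :: Library Catalog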

Bibliographic Details
Main Authors:	Princis, Henrijs; Sharma, Arindam; David, Cristina
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2511.22277

_version_	1866910161200742400
author	Princis, Henrijs; Sharma, Arindam; David, Cristina
author_facet	Princis, Henrijs; Sharma, Arindam; David, Cristina
contents	Large language models (LLMs) have shown remarkable ability to generate code, yet their outputs often violate syntactic or semantic constraints when guided only through natural language prompts. We introduce TreeCoder, the most general and flexible framework to date for exploring decoding strategies, constraints, and hyperparameters in LLMs, and use it in code generation to enforce correctness and structure during decoding rather than relying on prompt engineering. TreeCoder represents decoding as a tree search over candidate programs, where both decoding strategies and constraint functions - such as style, syntax, execution - are treated as first-class, optimisable components. This design enables systematic exploration and automatic tuning of decoding configurations using standard optimisation techniques. Experiments on the MBPP (Python) and SQL-Spider benchmarks show that TreeCoder consistently improves accuracy across open-source models such as CodeLlama, Mistral and DeepSeek, often outperforming their unconstrained baselines by considerable margins.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_22277
institution	arXiv
publishDate	2025
record_format	arxiv
title	TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation
topic	Machine Learning
url	https://arxiv.org/abs/2511.22277