Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Ye, Haotian, Jain, Himanshu, You, Chong, Suresh, Ananda Theertha, Lin, Haowei, Zou, James, Yu, Felix
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2504.09135
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866908316578349056
author	Ye, Haotian Jain, Himanshu You, Chong Suresh, Ananda Theertha Lin, Haowei Zou, James Yu, Felix
author_facet	Ye, Haotian Jain, Himanshu You, Chong Suresh, Ananda Theertha Lin, Haowei Zou, James Yu, Felix
contents	In real-world applications of large language models, outputs are often required to be confined: selecting items from predefined product or document sets, generating phrases that comply with safety standards, or conforming to specialized formatting styles. To control the generation, constrained decoding has been widely adopted. However, existing prefix-tree-based constrained decoding is inefficient under GPU-based model inference paradigms, and it introduces unintended biases into the output distribution. This paper introduces Dynamic Importance Sampling for Constrained Decoding (DISC) with GPU-based Parallel Prefix-Verification (PPV), a novel algorithm that leverages dynamic importance sampling to achieve theoretically guaranteed asymptotic unbiasedness and overcomes the inefficiency of prefix-tree. Extensive experiments demonstrate the superiority of our method over existing methods in both efficiency and output quality. These results highlight the potential of our methods to improve constrained generation in applications where adherence to specific constraints is essential.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_09135
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Efficient and Asymptotically Unbiased Constrained Decoding for Large Language Models Ye, Haotian Jain, Himanshu You, Chong Suresh, Ananda Theertha Lin, Haowei Zou, James Yu, Felix Computation and Language In real-world applications of large language models, outputs are often required to be confined: selecting items from predefined product or document sets, generating phrases that comply with safety standards, or conforming to specialized formatting styles. To control the generation, constrained decoding has been widely adopted. However, existing prefix-tree-based constrained decoding is inefficient under GPU-based model inference paradigms, and it introduces unintended biases into the output distribution. This paper introduces Dynamic Importance Sampling for Constrained Decoding (DISC) with GPU-based Parallel Prefix-Verification (PPV), a novel algorithm that leverages dynamic importance sampling to achieve theoretically guaranteed asymptotic unbiasedness and overcomes the inefficiency of prefix-tree. Extensive experiments demonstrate the superiority of our method over existing methods in both efficiency and output quality. These results highlight the potential of our methods to improve constrained generation in applications where adherence to specific constraints is essential.
title	Efficient and Asymptotically Unbiased Constrained Decoding for Large Language Models
topic	Computation and Language
url	https://arxiv.org/abs/2504.09135

Similar Items