Bibliographic Details
Main Authors: Le, Nghia T.; Ritter, Alan; Goyal, Kartik
Format: Preprint
Published: 2026
Subjects: Cryptography and Security, Machine Learning
Online Access: https://arxiv.org/abs/2601.11629
author Le, Nghia T.
Ritter, Alan
Goyal, Kartik
contents We demonstrate that while the current approaches for language model watermarking are effective for open-ended generation, they are inadequate at watermarking LM outputs for constrained generation tasks with low-entropy output spaces. Therefore, we devise SeqMark, a sequence-level watermarking algorithm with semantic differentiation that balances output quality, watermark detectability, and imperceptibility. It addresses a shortcoming of the prevalent token-level watermarking algorithms: they under-utilize the sequence-level entropy available for constrained generation tasks. Moreover, we identify and improve upon a different failure mode, which we term region collapse, associated with prior sequence-level watermarking algorithms. It occurs because the pseudorandom partitioning of semantic space in these approaches causes all high-probability outputs to collapse into either the invalid or the valid region, forcing a trade-off between output quality and watermarking effectiveness. SeqMark instead differentiates the high-probability output subspace and partitions it into valid and invalid regions, ensuring an even spread of high-quality outputs across all regions. On various constrained generation tasks such as machine translation, code generation, and abstractive summarization, SeqMark substantially improves watermark detection accuracy (up to a 28% increase in F1) while maintaining high generation quality.
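A toy illustration of region collapse, under stated assumptions. This is not SeqMark's algorithm (the record does not describe its details). It assumes a low-entropy task with only a few high-probability candidate outputs, and it uses a hypothetical SHA-256 keyed hash as the pseudorandom partition. The "differentiated" variant is a simple stand-in: instead of embedding-based semantic differentiation, it alternates the ranked candidates between regions. All function names (pseudorandom_region, naive_partition, differentiated_partition, best_valid) are hypothetical.

```python
import hashlib

def pseudorandom_region(text: str, key: str) -> int:
    """Hypothetical keyed hash: maps an output to region 0 (invalid) or 1 (valid)."""
    digest = hashlib.sha256((key + text).encode("utf-8")).digest()
    return digest[0] & 1

def naive_partition(candidates, key):
    """Prior-style toy: each candidate is assigned to a region independently by hash.
    With few candidates, all of them can land in the same region (region collapse)."""
    return {text: pseudorandom_region(text, key) for text, _ in candidates}

def differentiated_partition(candidates, key):
    """Toy stand-in for differentiation (assumption, not the paper's method):
    rank the high-probability candidates and alternate them between regions,
    so each region receives high-quality outputs. The starting region is keyed."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    start = pseudorandom_region("", key)
    return {text: (start + i) % 2 for i, (text, _) in enumerate(ranked)}

def best_valid(candidates, regions):
    """Highest-probability candidate in the valid region (1), or None if it is empty."""
    valid = [(t, p) for t, p in candidates if regions[t] == 1]
    return max(valid, key=lambda c: c[1]) if valid else None

if __name__ == "__main__":
    cands = [("The cat sat.", 0.6), ("A cat sat.", 0.3), ("The cat was sitting.", 0.1)]
    # Count keys for which the naive hash puts every candidate in one region.
    collapsed = sum(
        len(set(naive_partition(cands, f"key-{k}").values())) == 1 for k in range(1000)
    )
    print(f"naive hash partition collapsed for {collapsed}/1000 keys")
```

With three candidates and roughly uniform hash bits, the naive partition would be expected to collapse for about a quarter of keys. The alternating toy never collapses when there are at least two candidates, so the valid region always contains the first- or second-ranked output.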
format Preprint
id arxiv_https___arxiv_org_abs_2601_11629
institution arXiv
publishDate 2026
record_format arxiv
title Semantic Differentiation for Tackling Challenges in Watermarking Low-Entropy Constrained Generation Outputs
topic Cryptography and Security
Machine Learning
url https://arxiv.org/abs/2601.11629