Staff View: Task-Agnostic Language Model Watermarking via High Entropy Passthrough Layers

Bibliographic Details
Main Authors:	Masrani, Vaden; Akbari, Mohammad; Yue, David Ming Xuan; Rezaei, Ahmad; Zhang, Yong
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2412.12563

_version_	1866909431428546560
author	Masrani, Vaden; Akbari, Mohammad; Yue, David Ming Xuan; Rezaei, Ahmad; Zhang, Yong
author_facet	Masrani, Vaden; Akbari, Mohammad; Yue, David Ming Xuan; Rezaei, Ahmad; Zhang, Yong
contents	In the era of costly pre-training of large language models, ensuring the intellectual property rights of model owners, and ensuring that said models are responsibly deployed, is becoming increasingly important. To this end, we propose model watermarking via passthrough layers, which are added to existing pre-trained networks and trained using a self-supervised loss such that the model produces high-entropy output when prompted with a unique private key, and acts normally otherwise. Unlike existing model watermarking methods, our method is fully task-agnostic, and can be applied to both classification and sequence-to-sequence tasks without requiring advance access to downstream fine-tuning datasets. We evaluate the proposed passthrough layers on a wide range of downstream tasks, and show experimentally that our watermarking method achieves near-perfect watermark extraction accuracy and false-positive rate in most cases without damaging original model performance. Additionally, we show our method is robust to downstream fine-tuning, fine-pruning, and layer removal attacks, and can be trained in a fraction of the time required to train the original model. Code is available in the paper.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_12563
institution	arXiv
publishDate	2024
record_format	arxiv
title	Task-Agnostic Language Model Watermarking via High Entropy Passthrough Layers
topic	Computation and Language
url	https://arxiv.org/abs/2412.12563
