Staff View :: Library Catalog


Bibliographic Details
Main Authors:	He, Zhiyu; Wang, Maojiang; Gao, Xinwen; Luo, Yuchuan; Liu, Lin; Fu, Shaojing
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.09424

_version_	1866914032567451648
author	He, Zhiyu; Wang, Maojiang; Gao, Xinwen; Luo, Yuchuan; Liu, Lin; Fu, Shaojing
author_facet	He, Zhiyu; Wang, Maojiang; Gao, Xinwen; Luo, Yuchuan; Liu, Lin; Fu, Shaojing
contents	Secure inference enables privacy-preserving machine learning by leveraging cryptographic protocols that support computations on sensitive user data without exposing it. However, integrating cryptographic protocols with large language models (LLMs) presents significant challenges: the inherent complexity of these protocols, together with LLMs' massive parameter scale and sophisticated architectures, severely limits practical usability. In this work, we propose ENSI, a novel non-interactive secure inference framework for LLMs based on the principle of co-designing the cryptographic protocols and the LLM architecture. ENSI employs an optimized encoding strategy that seamlessly integrates the CKKS scheme with a lightweight LLM variant, BitNet, significantly reducing the computational complexity of encrypted matrix multiplications. In response to the prohibitive computational demands of softmax under homomorphic encryption (HE), we pioneer the integration of the sigmoid attention mechanism with HE as a seamless, retraining-free alternative. Furthermore, by embedding the bootstrapping operation within the RMSNorm process, we efficiently refresh ciphertexts while markedly decreasing the frequency of costly bootstrapping invocations. Experimental evaluations demonstrate that ENSI achieves approximately an 8x acceleration in matrix multiplications and a 2.6x speedup in softmax inference on CPU compared to the state-of-the-art method, while reducing the proportion of bootstrapping to just 1%.
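The abstract attributes part of ENSI's matrix-multiplication speedup to pairing CKKS with BitNet but does not describe the encoding itself. As a plaintext intuition only, the sketch below assumes ternary weights in {-1, 0, +1} (as in the BitNet b1.58 variant) and shows that a matrix-vector product then reduces to signed additions of the inputs, with no weight multiplications at all. This is an assumption-laden illustration, not ENSI's CKKS encoding; the function names `ternary_matvec` and `dense_matvec` are hypothetical.

```python
from typing import List, Sequence


def ternary_matvec(weights: Sequence[Sequence[int]], x: Sequence[float]) -> List[float]:
    """Compute W @ x for a ternary W using only additions and subtractions.

    Hypothetical helper: each output is (sum of x_j where W_ij == +1)
    minus (sum of x_j where W_ij == -1); zero weights are skipped.
    """
    out: List[float] = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError("row length must match input length")
        acc = 0.0
        for w, xj in zip(row, x):
            if w == 1:
                acc += xj          # +1 weight: add the input
            elif w == -1:
                acc -= xj          # -1 weight: subtract the input
            elif w != 0:
                raise ValueError(f"non-ternary weight: {w}")
            # w == 0: no operation needed
        out.append(acc)
    return out


def dense_matvec(weights: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Reference dense product with explicit multiplications, for comparison."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


if __name__ == "__main__":
    W = [[1, 0, -1, 1],
         [0, -1, 1, 0],
         [-1, -1, 0, 1]]
    x = [0.5, 2.0, -1.0, 3.0]
    print(ternary_matvec(W, x))  # → [4.5, -3.0, 0.5]
    print(ternary_matvec(W, x) == dense_matvec(W, x))  # → True
```

Under CKKS, ciphertext additions and negations consume no multiplicative depth, whereas plaintext-ciphertext multiplication by real-valued weights typically consumes a level after rescaling; this is one plausible reason ternary weights help, though the paper's encoding may exploit the structure differently.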
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_09424
institution	arXiv
publishDate	2025
record_format	arxiv
title	ENSI: Efficient Non-Interactive Secure Inference for Large Language Models
topic	Cryptography and Security; Artificial Intelligence
url	https://arxiv.org/abs/2509.09424

Similar Items