Staff View: The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems :: Library Catalog

Bibliographic Details
Main Authors:	Song, Linke; Pang, Zixuan; Wang, Wenhao; Wang, Zihao; Wang, XiaoFeng; Chen, Hongbo; Song, Wei; Jin, Yier; Meng, Dan; Hou, Rui
Format:	Preprint
Published:	2024
Subjects:	Cryptography and Security
Online Access:	https://arxiv.org/abs/2409.20002

_version_	1866908603892367360
author	Song, Linke; Pang, Zixuan; Wang, Wenhao; Wang, Zihao; Wang, XiaoFeng; Chen, Hongbo; Song, Wei; Jin, Yier; Meng, Dan; Hou, Rui
author_facet	Song, Linke; Pang, Zixuan; Wang, Wenhao; Wang, Zihao; Wang, XiaoFeng; Chen, Hongbo; Song, Wei; Jin, Yier; Meng, Dan; Hou, Rui
contents	The wide deployment of Large Language Models (LLMs) has given rise to strong demands for optimizing their inference performance. Today's techniques serving this purpose primarily focus on reducing latency and improving throughput through algorithmic and hardware enhancements, while largely overlooking their privacy side effects, particularly in a multi-user environment. In our research, for the first time, we discovered a set of new timing side channels in LLM systems, arising from shared caches and GPU memory allocations, which can be exploited to infer both confidential system prompts and those issued by other users. These vulnerabilities echo security challenges observed in traditional computing systems, highlighting an urgent need to address potential information leakage in LLM serving infrastructures. In this paper, we report novel attack strategies designed to exploit such timing side channels inherent in LLM deployments, specifically targeting the Key-Value (KV) cache and semantic cache widely used to enhance LLM inference performance. Our approach leverages timing measurements and classification models to detect cache hits, allowing an adversary to infer private prompts with high accuracy. We also propose a token-by-token search algorithm to efficiently recover shared prompt prefixes in the caches, showing the feasibility of stealing system prompts and those produced by peer users. Our experimental studies on black-box testing of popular online LLM services demonstrate that such privacy risks are completely realistic, with significant consequences. Our findings underscore the need for robust mitigation to protect LLM systems against such emerging threats.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_20002
institution	arXiv
publishDate	2024
record_format	arxiv
title	The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems
topic	Cryptography and Security
url	https://arxiv.org/abs/2409.20002

Similar Items