Bargiidšearbma: :: Library Catalog

Furkejuvvon:

Bibliográfalaš dieđut
Váldodahkkit:	Chen, Sizhe, Piet, Julien, Sitawarin, Chawin, Wagner, David
Materiálatiipa:	Preprint
Almmustuhtton:	2024
Fáttát:	Cryptography and Security
Liŋkkat:	https://arxiv.org/abs/2402.06363
Fáddágilkorat:	Lasit fáddágilkoriid Eai fáddágilkorat, Lasit vuosttaš fáddágilkora!

_version_	1866929515226202112
author	Chen, Sizhe Piet, Julien Sitawarin, Chawin Wagner, David
author_facet	Chen, Sizhe Piet, Julien Sitawarin, Chawin Wagner, David
contents	Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications, which perform text-based tasks by utilizing their advanced language understanding capabilities. However, as LLMs have improved, so have the attacks against them. Prompt injection attacks are an important threat: they trick the model into deviating from the original application's instructions and instead follow user directives. These attacks rely on the LLM's ability to follow instructions and inability to separate prompts and user data. We introduce structured queries, a general approach to tackle this problem. Structured queries separate prompts and data into two channels. We implement a system that supports structured queries. This system is made of (1) a secure front-end that formats a prompt and user data into a special format, and (2) a specially trained LLM that can produce high-quality outputs from these inputs. The LLM is trained using a novel fine-tuning strategy: we convert a base (non-instruction-tuned) LLM to a structured instruction-tuned model that will only follow instructions in the prompt portion of a query. To do so, we augment standard instruction tuning datasets with examples that also include instructions in the data portion of the query, and fine-tune the model to ignore these. Our system significantly improves resistance to prompt injection attacks, with little or no impact on utility. Our code is released at https://github.com/Sizhe-Chen/StruQ.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_06363
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	StruQ: Defending Against Prompt Injection with Structured Queries Chen, Sizhe Piet, Julien Sitawarin, Chawin Wagner, David Cryptography and Security Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications, which perform text-based tasks by utilizing their advanced language understanding capabilities. However, as LLMs have improved, so have the attacks against them. Prompt injection attacks are an important threat: they trick the model into deviating from the original application's instructions and instead follow user directives. These attacks rely on the LLM's ability to follow instructions and inability to separate prompts and user data. We introduce structured queries, a general approach to tackle this problem. Structured queries separate prompts and data into two channels. We implement a system that supports structured queries. This system is made of (1) a secure front-end that formats a prompt and user data into a special format, and (2) a specially trained LLM that can produce high-quality outputs from these inputs. The LLM is trained using a novel fine-tuning strategy: we convert a base (non-instruction-tuned) LLM to a structured instruction-tuned model that will only follow instructions in the prompt portion of a query. To do so, we augment standard instruction tuning datasets with examples that also include instructions in the data portion of the query, and fine-tune the model to ignore these. Our system significantly improves resistance to prompt injection attacks, with little or no impact on utility. Our code is released at https://github.com/Sizhe-Chen/StruQ.
title	StruQ: Defending Against Prompt Injection with Structured Queries
topic	Cryptography and Security
url	https://arxiv.org/abs/2402.06363

Geahča maid