Staff View: Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations :: Library Catalog

Bibliographic Details
Main Authors:	Raheja, Tarun; Pochhi, Nilay; Curie, F. D. C. M.
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2410.09097

_version_	1866912159556960256
author	Raheja, Tarun; Pochhi, Nilay; Curie, F. D. C. M.
author_facet	Raheja, Tarun; Pochhi, Nilay; Curie, F. D. C. M.
contents	Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, but their vulnerability to jailbreak attacks poses significant security risks. This survey paper presents a comprehensive analysis of recent advancements in attack strategies and defense mechanisms within the field of Large Language Model (LLM) red-teaming. We analyze various attack methods, including gradient-based optimization, reinforcement learning, and prompt engineering approaches. We discuss the implications of these attacks on LLM safety and the need for improved defense mechanisms. This work aims to provide a thorough understanding of the current landscape of red-teaming attacks and defenses on LLMs, enabling the development of more secure and reliable language models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_09097
institution	arXiv
publishDate	2024
record_format	arxiv
title	Recent advancements in LLM Red-Teaming: Techniques, Defenses, and Ethical Considerations
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2410.09097