Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Dang, Quy-Anh; Ngo, Chris; Hy, Truong-Son
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.03699

_version_	1866908972588466176
author	Dang, Quy-Anh; Ngo, Chris; Hy, Truong-Son
author_facet	Dang, Quy-Anh; Ngo, Chris; Hy, Truong-Son
contents	As large language models (LLMs) become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount. However, existing red teaming datasets suffer from inconsistent risk categorizations, limited domain coverage, and outdated evaluations, hindering systematic vulnerability assessments. To address these challenges, we introduce RedBench, a universal dataset aggregating 37 benchmark datasets from leading conferences and repositories, comprising 29,362 samples across attack and refusal prompts. RedBench employs a standardized taxonomy with 22 risk categories and 19 domains, enabling consistent and comprehensive evaluations of LLM vulnerabilities. We provide a detailed analysis of existing datasets, establish baselines for modern LLMs, and open-source the dataset and evaluation code. Our contributions facilitate robust comparisons, foster future research, and promote the development of secure and reliable LLMs for real-world deployment. Code: https://github.com/knoveleng/redeval
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_03699
institution	arXiv
publishDate	2026
record_format	arxiv
title	RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models
topic	Computation and Language
url	https://arxiv.org/abs/2601.03699