:: Library Catalog

Bibliographic Details
Main Authors:	Chin, Zhi-Yi; Chen, Pin-Yu; Chiu, Wei-Chen; Fritz, Mario
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language; Cryptography and Security; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2411.16769

Similar Items

Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay
by: Wang, Hao, et al.
Published: (2026)

GenBreak: Red Teaming Text-to-Image Generators Using Large Language Models
by: Wang, Zilong, et al.
Published: (2025)

Prompt Optimization and Evaluation for LLM Automated Red Teaming
by: Freenor, Michael, et al.
Published: (2025)

Automated Progressive Red Teaming
by: Jiang, Bojian, et al.
Published: (2024)

ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming
by: Béjar, Mario Rodríguez, et al.
Published: (2026)

Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
by: Chin, Zhi-Yi, et al.
Published: (2023)

Red Teaming AI Red Teaming
by: Majumdar, Subhabrata, et al.
Published: (2025)

RedTWIZ: Diverse LLM Red Teaming via Adaptive Attack Planning
by: Horal, Artur, et al.
Published: (2025)

Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
by: Pathade, Chetan
Published: (2025)

DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling
by: Li, Boheng, et al.
Published: (2025)

Learning-Based Automated Adversarial Red-Teaming for Robustness Evaluation of Large Language Models
by: Wei, Zhang, et al.
Published: (2025)

RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
by: Xu, Huiyu, et al.
Published: (2024)

DP-BART for Privatized Text Rewriting under Local Differential Privacy
by: Igamberdiev, Timour, et al.
Published: (2023)

AEIOU: A Unified Defense Framework against NSFW Prompts in Text-to-Image Models
by: Wang, Yiming, et al.
Published: (2024)

Value-Aligned Prompt Moderation via Zero-Shot Agentic Rewriting for Safe Image Generation
by: Zhao, Xin, et al.
Published: (2025)

Resource Consumption Red-Teaming for Large Vision-Language Models
by: Gao, Haoran, et al.
Published: (2025)

Training a General Purpose Automated Red Teaming Model
by: Padmakumar, Aishwarya, et al.
Published: (2026)

Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling
by: Cao, Yichuan, et al.
Published: (2025)

Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding
by: Qiu, Huming, et al.
Published: (2024)

Rethinking and Red-Teaming Protective Perturbation in Personalized Diffusion Models
by: Liu, Yixin, et al.
Published: (2024)

Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs
by: Hu, Xiaomeng, et al.
Published: (2025)

MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
by: You, Wenhao, et al.
Published: (2025)

SafeSearch: Automated Red-Teaming of LLM-Based Search Agents
by: Dong, Jianshuo, et al.
Published: (2025)

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs
by: Wang, Xin, et al.
Published: (2026)

Spend Your Budget Wisely: Towards an Intelligent Distribution of the Privacy Budget in Differentially Private Text Rewriting
by: Meisenbacher, Stephen, et al.
Published: (2025)

Beyond Theoretical Bounds: Empirical Privacy Loss Calibration for Text Rewriting Under Local Differential Privacy
by: Li, Weijun, et al.
Published: (2026)

Adaptive Instruction Composition for Automated LLM Red-Teaming
by: Zymet, Jesse, et al.
Published: (2026)

Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation
by: Quaye, Jessica, et al.
Published: (2024)

SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
by: Zhou, Kaiwen, et al.
Published: (2025)

SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution
by: Ba, Zhongjie, et al.
Published: (2023)

Jailbreak-Zero: A Path to Pareto Optimal Red Teaming for Large Language Models
by: Hu, Kai, et al.
Published: (2025)

Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
by: Chen, Shuo, et al.
Published: (2024)

Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models
by: Hu, Xiaomeng, et al.
Published: (2024)

Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation
by: Liu, Yi, et al.
Published: (2024)

Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming
by: Inie, Nanna, et al.
Published: (2023)

Red Teaming Large Reasoning Models
by: Chen, Jiawei, et al.
Published: (2025)

Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation
by: Chen, Sixu, et al.
Published: (2026)

RedTeamLLM: an Agentic AI framework for offensive security
by: Challita, Brian, et al.
Published: (2025)

PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
by: Geng, Runpeng, et al.
Published: (2025)

From Coordinates to Context: An LLM-Bootstrapped Semantic Encoding Framework for Privacy-Preserving Mobile Sensing Stress Recognition
by: Phan, Hoang Khang, et al.
Published: (2025)