| Main Authors: | Chin, Zhi-Yi, Chen, Pin-Yu, Chiu, Wei-Chen, Fritz, Mario |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2411.16769 |
Similar Items
Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay
by: Wang, Hao, et al.
Published: (2026)
GenBreak: Red Teaming Text-to-Image Generators Using Large Language Models
by: Wang, Zilong, et al.
Published: (2025)
Prompt Optimization and Evaluation for LLM Automated Red Teaming
by: Freenor, Michael, et al.
Published: (2025)
Automated Progressive Red Teaming
by: Jiang, Bojian, et al.
Published: (2024)
ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming
by: Béjar, Mario Rodríguez, et al.
Published: (2026)
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
by: Chin, Zhi-Yi, et al.
Published: (2023)
Red Teaming AI Red Teaming
by: Majumdar, Subhabrata, et al.
Published: (2025)
RedTWIZ: Diverse LLM Red Teaming via Adaptive Attack Planning
by: Horal, Artur, et al.
Published: (2025)
Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
by: Pathade, Chetan
Published: (2025)
DREAM: Scalable Red Teaming for Text-to-Image Generative Systems via Distribution Modeling
by: Li, Boheng, et al.
Published: (2025)
Learning-Based Automated Adversarial Red-Teaming for Robustness Evaluation of Large Language Models
by: Wei, Zhang, et al.
Published: (2025)
RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
by: Xu, Huiyu, et al.
Published: (2024)
DP-BART for Privatized Text Rewriting under Local Differential Privacy
by: Igamberdiev, Timour, et al.
Published: (2023)
AEIOU: A Unified Defense Framework against NSFW Prompts in Text-to-Image Models
by: Wang, Yiming, et al.
Published: (2024)
Value-Aligned Prompt Moderation via Zero-Shot Agentic Rewriting for Safe Image Generation
by: Zhao, Xin, et al.
Published: (2025)
Resource Consumption Red-Teaming for Large Vision-Language Models
by: Gao, Haoran, et al.
Published: (2025)
Training a General Purpose Automated Red Teaming Model
by: Padmakumar, Aishwarya, et al.
Published: (2026)
Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling
by: Cao, Yichuan, et al.
Published: (2025)
Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding
by: Qiu, Huming, et al.
Published: (2024)
Rethinking and Red-Teaming Protective Perturbation in Personalized Diffusion Models
by: Liu, Yixin, et al.
Published: (2024)
Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs
by: Hu, Xiaomeng, et al.
Published: (2025)
MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks
by: You, Wenhao, et al.
Published: (2025)
SafeSearch: Automated Red-Teaming of LLM-Based Search Agents
by: Dong, Jianshuo, et al.
Published: (2025)
OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs
by: Wang, Xin, et al.
Published: (2026)
Spend Your Budget Wisely: Towards an Intelligent Distribution of the Privacy Budget in Differentially Private Text Rewriting
by: Meisenbacher, Stephen, et al.
Published: (2025)
Beyond Theoretical Bounds: Empirical Privacy Loss Calibration for Text Rewriting Under Local Differential Privacy
by: Li, Weijun, et al.
Published: (2026)
Adaptive Instruction Composition for Automated LLM Red-Teaming
by: Zymet, Jesse, et al.
Published: (2026)
Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation
by: Quaye, Jessica, et al.
Published: (2024)
SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
by: Zhou, Kaiwen, et al.
Published: (2025)
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution
by: Ba, Zhongjie, et al.
Published: (2023)
Jailbreak-Zero: A Path to Pareto Optimal Red Teaming for Large Language Models
by: Hu, Kai, et al.
Published: (2025)
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
by: Chen, Shuo, et al.
Published: (2024)
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models
by: Hu, Xiaomeng, et al.
Published: (2024)
Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation
by: Liu, Yi, et al.
Published: (2024)
Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming
by: Inie, Nanna, et al.
Published: (2023)
Red Teaming Large Reasoning Models
by: Chen, Jiawei, et al.
Published: (2025)
Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation
by: Chen, Sixu, et al.
Published: (2026)
RedTeamLLM: an Agentic AI framework for offensive security
by: Challita, Brian, et al.
Published: (2025)
PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
by: Geng, Runpeng, et al.
Published: (2025)
From Coordinates to Context: An LLM-Bootstrapped Semantic Encoding Framework for Privacy-Preserving Mobile Sensing Stress Recognition
by: Phan, Hoang Khang, et al.
Published: (2025)