| Main Authors: | Van Doren, Madison; Ford, Casey |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.15478 |
Similar Items
Alignment Drift in Multimodal LLMs: A Two-Phase, Longitudinal Evaluation of Harm Across Eight Model Releases
by: Ford, Casey, et al.
Published: (2026)
Same Model, Different Weakness: How Language and Modality Reshape the Jailbreak Attack Surface in Frontier MLLMs
by: Ford, Casey, et al.
Published: (2026)
"Be My Cheese?": Cultural Nuance Benchmarking for Machine Translation in Multilingual LLMs
by: Van Doren, Madison, et al.
Published: (2026)
"Be My Cheese?": Assessing Cultural Nuance in Multilingual LLM Translations
by: Van Doren, Madison, et al.
Published: (2025)
Anecdoctoring: Automated Red-Teaming Across Language and Place
by: Cuevas, Alejandro, et al.
Published: (2025)
Prompt Optimization and Evaluation for LLM Automated Red Teaming
by: Freenor, Michael, et al.
Published: (2025)
Gradient-Based Language Model Red Teaming
by: Wichers, Nevan, et al.
Published: (2024)
When Prompt Optimization Becomes Jailbreaking: Adaptive Red-Teaming of Large Language Models
by: Shamsi, Zafir, et al.
Published: (2026)
Red Teaming Visual Language Models
by: Li, Mukai, et al.
Published: (2024)
ASTPrompter: Preference-Aligned Automated Language Model Red-Teaming to Generate Low-Perplexity Unsafe Prompts
by: Hardy, Amelia F., et al.
Published: (2024)
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
by: Mazeika, Mantas, et al.
Published: (2024)
Red Teaming Large Language Models for Healthcare
by: Balazadeh, Vahid, et al.
Published: (2025)
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
by: Chin, Zhi-Yi, et al.
Published: (2023)
Red Teaming Language Models for Processing Contradictory Dialogues
by: Wen, Xiaofei, et al.
Published: (2024)
RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models
by: Dang, Quy-Anh, et al.
Published: (2026)
Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
by: An, Bang, et al.
Published: (2024)
Learning-Based Automated Adversarial Red-Teaming for Robustness Evaluation of Large Language Models
by: Wei, Zhang, et al.
Published: (2025)
Evaluating and Steering Modality Preferences in Multimodal Large Language Model
by: Zhang, Yu, et al.
Published: (2025)
RRTL: Red Teaming Reasoning Large Language Models in Tool Learning
by: Liu, Yifei, et al.
Published: (2025)
RedTopic: Toward Topic-Diverse Red Teaming of Large Language Models
by: Ding, Jiale, et al.
Published: (2025)
Resource Consumption Red-Teaming for Large Vision-Language Models
by: Gao, Haoran, et al.
Published: (2025)
Red-Teaming for Inducing Societal Bias in Large Language Models
by: Luo, Chu Fei, et al.
Published: (2024)
Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies
by: Srikanth, Siddharth, et al.
Published: (2026)
STAR: SocioTechnical Approach to Red Teaming Language Models
by: Weidinger, Laura, et al.
Published: (2024)
Towards Red Teaming in Multimodal and Multilingual Translation
by: Ropers, Christophe, et al.
Published: (2024)
Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models
by: Yang, Hao, et al.
Published: (2024)
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
by: Ahuja, Sanchit, et al.
Published: (2023)
AutoRed: A Free-form Adversarial Prompt Generation Framework for Automated Red Teaming
by: Diao, Muxi, et al.
Published: (2025)
Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming
by: Li, Rui, et al.
Published: (2025)
RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
by: Xu, Huiyu, et al.
Published: (2024)
Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
by: Pathade, Chetan
Published: (2025)
MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models
by: Chen, Zixin, et al.
Published: (2025)
Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts
by: Wu, Xuyang, et al.
Published: (2024)
A Multi-Domain Red Teaming Framework for Safety, Robustness, and Fairness Evaluation of Medical Large Language Models
by: Feier, Andrei Marian, et al.
Published: (2026)
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
by: Verma, Apurv, et al.
Published: (2024)
Stop Fixating on Prompts: Reasoning Hijacking and Constraint Tightening for Red-Teaming LLM Agents
by: Mao, Yanxu, et al.
Published: (2026)
GenBreak: Red Teaming Text-to-Image Generators Using Large Language Models
by: Wang, Zilong, et al.
Published: (2025)
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
by: Wu, Zhaofeng, et al.
Published: (2024)
Red-Teaming Text-to-Image Models via In-Context Experience Replay and Semantic-Preserving Prompt Rewriting
by: Chin, Zhi-Yi, et al.
Published: (2024)
Self-HarmLLM: Can Large Language Model Harm Itself?
by: Kim, Heehwan, et al.
Published: (2025)