| Main Authors: | Raheja, Tarun; Pochhi, Nilay; Curie, F. D. C. M. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.09097 |
Similar Items
From RLHF to Direct Alignment: A Theoretical Unification of Preference Learning for Large Language Models
by: Raheja, Tarun, et al.
Published: (2026)
TroubleLLM: Align to Red Team Expert
by: Xu, Zhuoer, et al.
Published: (2024)
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
by: Yoo, Haneul, et al.
Published: (2024)
Exploring Straightforward Conversational Red-Teaming
by: Kour, George, et al.
Published: (2024)
T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search
by: Lee, Hyomin, et al.
Published: (2026)
SafeSearch: Automated Red-Teaming of LLM-Based Search Agents
by: Dong, Jianshuo, et al.
Published: (2025)
Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming
by: Liu, Jiaxu, et al.
Published: (2024)
Red Teaming Large Language Models for Healthcare
by: Balazadeh, Vahid, et al.
Published: (2025)
FERRET: Framework for Expansion Reliant Red Teaming
by: Mehrabi, Ninareh, et al.
Published: (2026)
SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
by: Zhou, Kaiwen, et al.
Published: (2025)
Adaptive Instruction Composition for Automated LLM Red-Teaming
by: Zymet, Jesse, et al.
Published: (2026)
Red Teaming Language Models for Processing Contradictory Dialogues
by: Wen, Xiaofei, et al.
Published: (2024)
Capability-Based Scaling Trends for LLM-Based Red-Teaming
by: Panfilov, Alexander, et al.
Published: (2025)
Anecdoctoring: Automated Red-Teaming Across Language and Place
by: Cuevas, Alejandro, et al.
Published: (2025)
Red Teaming Visual Language Models
by: Li, Mukai, et al.
Published: (2024)
Retrieval-Augmented Generation in Medicine: A Scoping Review of Technical Implementations, Clinical Applications, and Ethical Considerations
by: Yang, Rui, et al.
Published: (2025)
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
by: Chiu, Yu Ying, et al.
Published: (2024)
An Investigation of Linguistic Biases in LLM-Based Recommendations
by: Venkateswaran, Nitin, et al.
Published: (2026)
M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs
by: Ha, Junwoo, et al.
Published: (2025)
Enhancing AI-Driven Education: Integrating Cognitive Frameworks, Linguistic Feedback Analysis, and Ethical Considerations for Improved Content Generation
by: Yaacoub, Antoun, et al.
Published: (2025)
RedTopic: Toward Topic-Diverse Red Teaming of Large Language Models
by: Ding, Jiale, et al.
Published: (2025)
A New Approach Towards Autoformalization
by: Patel, Nilay, et al.
Published: (2023)
Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs
by: Kim, Zae Myung, et al.
Published: (2024)
When Prompt Optimization Becomes Jailbreaking: Adaptive Red-Teaming of Large Language Models
by: Shamsi, Zafir, et al.
Published: (2026)
Are LLM Agents Behaviorally Coherent? Latent Profiles for Social Simulation
by: Mooney, James, et al.
Published: (2025)
RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
by: Xu, Huiyu, et al.
Published: (2024)
TeamLLM: A Human-Like Team-Oriented Collaboration Framework for Multi-Step Contextualized Tasks
by: Wang, Xiangyu, et al.
Published: (2026)
Effective Red-Teaming of Policy-Adherent Agents
by: Nakash, Itay, et al.
Published: (2025)
A Survey of Recent Backdoor Attacks and Defenses in Large Language Models
by: Zhao, Shuai, et al.
Published: (2024)
Semantic uncertainty in advanced decoding methods for LLM generation
by: Foodeei, Darius, et al.
Published: (2025)
STAR: SocioTechnical Approach to Red Teaming Language Models
by: Weidinger, Laura, et al.
Published: (2024)
Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
by: Guo, Ruohao, et al.
Published: (2025)
Red Teaming LLMs as Socio-Technical Practice: From Exploration and Data Creation to Evaluation
by: Garcia, Adriana Alvarado, et al.
Published: (2026)
Contextualized Privacy Defense for LLM Agents
by: Wen, Yule, et al.
Published: (2026)
AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations
by: Dingeto, Hiskias, et al.
Published: (2026)
A Multi-Domain Red Teaming Framework for Safety, Robustness, and Fairness Evaluation of Medical Large Language Models
by: Feier, Andrei Marian, et al.
Published: (2026)
MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models
by: Yan, Siyu, et al.
Published: (2025)
LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked
by: Phute, Mansi, et al.
Published: (2023)
Provable Defense Framework for LLM Jailbreaks via Noise-Augumented Alignment
by: Cheng, Zehua, et al.
Published: (2026)
The Amazing Agent Race: Strong Tool Users, Weak Navigators
by: Kim, Zae Myung, et al.
Published: (2026)