Library Catalog

Bibliographic Details
Main Authors:	Raheja, Tarun; Pochhi, Nilay; Curie, F. D. C. M.
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2410.09097

Similar Items

From RLHF to Direct Alignment: A Theoretical Unification of Preference Learning for Large Language Models
by: Raheja, Tarun, et al.
Published: (2026)

TroubleLLM: Align to Red Team Expert
by: Xu, Zhuoer, et al.
Published: (2024)

Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
by: Yoo, Haneul, et al.
Published: (2024)

Exploring Straightforward Conversational Red-Teaming
by: Kour, George, et al.
Published: (2024)

T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search
by: Lee, Hyomin, et al.
Published: (2026)

SafeSearch: Automated Red-Teaming of LLM-Based Search Agents
by: Dong, Jianshuo, et al.
Published: (2025)

Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming
by: Liu, Jiaxu, et al.
Published: (2024)

Red Teaming Large Language Models for Healthcare
by: Balazadeh, Vahid, et al.
Published: (2025)

FERRET: Framework for Expansion Reliant Red Teaming
by: Mehrabi, Ninareh, et al.
Published: (2026)

SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
by: Zhou, Kaiwen, et al.
Published: (2025)

Adaptive Instruction Composition for Automated LLM Red-Teaming
by: Zymet, Jesse, et al.
Published: (2026)

Red Teaming Language Models for Processing Contradictory Dialogues
by: Wen, Xiaofei, et al.
Published: (2024)

Capability-Based Scaling Trends for LLM-Based Red-Teaming
by: Panfilov, Alexander, et al.
Published: (2025)

Anecdoctoring: Automated Red-Teaming Across Language and Place
by: Cuevas, Alejandro, et al.
Published: (2025)

Red Teaming Visual Language Models
by: Li, Mukai, et al.
Published: (2024)

Retrieval-Augmented Generation in Medicine: A Scoping Review of Technical Implementations, Clinical Applications, and Ethical Considerations
by: Yang, Rui, et al.
Published: (2025)

CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
by: Chiu, Yu Ying, et al.
Published: (2024)

An Investigation of Linguistic Biases in LLM-Based Recommendations
by: Venkateswaran, Nitin, et al.
Published: (2026)

M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs
by: Ha, Junwoo, et al.
Published: (2025)

Enhancing AI-Driven Education: Integrating Cognitive Frameworks, Linguistic Feedback Analysis, and Ethical Considerations for Improved Content Generation
by: Yaacoub, Antoun, et al.
Published: (2025)

RedTopic: Toward Topic-Diverse Red Teaming of Large Language Models
by: Ding, Jiale, et al.
Published: (2025)

A New Approach Towards Autoformalization
by: Patel, Nilay, et al.
Published: (2023)

Threads of Subtlety: Detecting Machine-Generated Texts Through Discourse Motifs
by: Kim, Zae Myung, et al.
Published: (2024)

When Prompt Optimization Becomes Jailbreaking: Adaptive Red-Teaming of Large Language Models
by: Shamsi, Zafir, et al.
Published: (2026)

Are LLM Agents Behaviorally Coherent? Latent Profiles for Social Simulation
by: Mooney, James, et al.
Published: (2025)

RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
by: Xu, Huiyu, et al.
Published: (2024)

TeamLLM: A Human-Like Team-Oriented Collaboration Framework for Multi-Step Contextualized Tasks
by: Wang, Xiangyu, et al.
Published: (2026)

Effective Red-Teaming of Policy-Adherent Agents
by: Nakash, Itay, et al.
Published: (2025)

A Survey of Recent Backdoor Attacks and Defenses in Large Language Models
by: Zhao, Shuai, et al.
Published: (2024)

Semantic uncertainty in advanced decoding methods for LLM generation
by: Foodeei, Darius, et al.
Published: (2025)

STAR: SocioTechnical Approach to Red Teaming Language Models
by: Weidinger, Laura, et al.
Published: (2024)

Tree-based Dialogue Reinforced Policy Optimization for Red-Teaming Attacks
by: Guo, Ruohao, et al.
Published: (2025)

Red Teaming LLMs as Socio-Technical Practice: From Exploration and Data Creation to Evaluation
by: Garcia, Adriana Alvarado, et al.
Published: (2026)

Contextualized Privacy Defense for LLM Agents
by: Wen, Yule, et al.
Published: (2026)

AgentRedBench: Dynamic Redteaming and Integration-Aware Defense for LLM Agents over SaaS Integrations
by: Dingeto, Hiskias, et al.
Published: (2026)

A Multi-Domain Red Teaming Framework for Safety, Robustness, and Fairness Evaluation of Medical Large Language Models
by: Feier, Andrei Marian, et al.
Published: (2026)

MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models
by: Yan, Siyu, et al.
Published: (2025)

LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked
by: Phute, Mansi, et al.
Published: (2023)

Provable Defense Framework for LLM Jailbreaks via Noise-Augumented Alignment
by: Cheng, Zehua, et al.
Published: (2026)

The Amazing Agent Race: Strong Tool Users, Weak Navigators
by: Kim, Zae Myung, et al.
Published: (2026)