Bibliographic Details
Main Authors:	She, Yining; Peterson, Daniel W.; Liu, Marianne Menglin; Upadhyay, Vikas; Chaghazardi, Mohammad Hossein; Kang, Eunsuk; Roth, Dan
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.05310

Similar Items

Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility
by: Hong, Yining, et al.
Published: (2026)

FASR: Automated Identification of Unsafe Control Actions in STPA
by: Dardik, Ian, et al.
Published: (2026)

ToolScope: Enhancing LLM Agent Tool Use through Tool Merging and Context-Aware Filtering
by: Liu, Marianne Menglin, et al.
Published: (2025)

Benchmarking LLM Guardrails in Handling Multilingual Toxicity
by: Yang, Yahan, et al.
Published: (2024)

Routesplain: Towards Faithful and Intervenable Routing for Software-related Tasks
by: Štorek, Adam, et al.
Published: (2025)

FairSense: Long-Term Fairness Analysis of ML-Enabled Systems
by: She, Yining, et al.
Published: (2025)

MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
by: Yang, Yahan, et al.
Published: (2025)

LAD-RAG: Layout-aware Dynamic RAG for Visually-Rich Document Understanding
by: Sourati, Zhivar, et al.
Published: (2025)

No Free Lunch with Guardrails
by: Kumar, Divyanshu, et al.
Published: (2025)

OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models
by: Wang, Thomas, et al.
Published: (2025)

Triaging Threats to Specialized Guardrails
by: Mo, Wenjie Jacky, et al.
Published: (2026)

Provably Secure Agent Guardrail
by: Wu, Benlong, et al.
Published: (2026)

Guardrail Baselines for Unlearning in LLMs
by: Thaker, Pratiksha, et al.
Published: (2024)

Learning Efficient Guardrails for Compliance
by: Wen, Xiaofei, et al.
Published: (2025)

Holding the Guardrails on Involuntary Commitment
by: Carl H. Coleman
Published: (2024)

Cognitive Guardrails for Open-World Decision Making in Autonomous Drone Swarms
by: Cleland-Huang, Jane, et al.
Published: (2025)

$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
by: Kang, Mintong, et al.
Published: (2024)

ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context
by: Li, Victoria R., et al.
Published: (2024)

Safety Guardrails for LLM-Enabled Robots
by: Ravichandran, Zachary, et al.
Published: (2025)

Building Guardrails for Large Language Models
by: Dong, Yi, et al.
Published: (2024)

Lattice: Generative Guardrails for Conversational Agents
by: Broadhurst, Emily, et al.
Published: (2026)

SafeVision: Efficient Image Guardrail with Robust Policy Adherence and Explainability
by: Xu, Peiyang, et al.
Published: (2025)

Protect: Towards Robust Guardrailing Stack for Trustworthy Enterprise LLM Systems
by: Avinash, Karthik, et al.
Published: (2025)

Taxonomy-Adaptive Moderation Model with Robust Guardrails for Large Language Models
by: Nandwana, Mahesh Kumar, et al.
Published: (2025)

Building a Domain-specific Guardrail Model in Production
by: Niknazar, Mohammad, et al.
Published: (2024)

TWGuard: A Case Study of LLM Safety Guardrails for Localized Linguistic Contexts
by: Chu, Hua-Rong, et al.
Published: (2026)

Current state of LLM Risks and AI Guardrails
by: Ayyamperumal, Suriya Ganesh, et al.
Published: (2024)

Test-Time Training Undermines Safety Guardrails
by: Antonelli, Simone, et al.
Published: (2026)

Black-Box Guardrail Reverse-engineering Attack
by: Yao, Hongwei, et al.
Published: (2025)

Bypassing Safety Guardrails in LLMs Using Humor
by: Cisneros-Velarde, Pedro
Published: (2025)

A Lightweight Explainable Guardrail for Prompt Safety
by: Islam, Md Asiful, et al.
Published: (2026)

OneShield -- the Next Generation of LLM Guardrails
by: DeLuca, Chad, et al.
Published: (2025)

Challenges in Guardrailing Large Language Models for Science
by: Pantha, Nishan, et al.
Published: (2024)

Enhancing Guardrails for Safe and Secure Healthcare AI
by: Gangavarapu, Ananya
Published: (2024)

Evaluating the Robustness of Large Language Model Safety Guardrails Against Adversarial Attacks
by: Young, Richard J.
Published: (2025)

CodeGuard: Improving LLM Guardrails in CS Education
by: Raihan, Nishat, et al.
Published: (2026)

Guardrail Selection in Line Charts to Contextualize Persuasive Visualizations
by: Nadib, Khandaker Abrar, et al.
Published: (2026)

Bag of Tricks for Subverting Reasoning-based Safety Guardrails
by: Chen, Shuo, et al.
Published: (2025)

Proof-of-Guardrail in AI Agents and What (Not) to Trust from It
by: Jin, Xisen, et al.
Published: (2026)

Why Do Safety Guardrails Degrade Across Languages?
by: Zhang, Max, et al.
Published: (2026)