| Main Authors: | She, Yining, Peterson, Daniel W., Liu, Marianne Menglin, Upadhyay, Vikas, Chaghazardi, Mohammad Hossein, Kang, Eunsuk, Roth, Dan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.05310 |
Similar Items
Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility
by: Hong, Yining, et al.
Published: (2026)
FASR: Automated Identification of Unsafe Control Actions in STPA
by: Dardik, Ian, et al.
Published: (2026)
ToolScope: Enhancing LLM Agent Tool Use through Tool Merging and Context-Aware Filtering
by: Liu, Marianne Menglin, et al.
Published: (2025)
Benchmarking LLM Guardrails in Handling Multilingual Toxicity
by: Yang, Yahan, et al.
Published: (2024)
Routesplain: Towards Faithful and Intervenable Routing for Software-related Tasks
by: Štorek, Adam, et al.
Published: (2025)
FairSense: Long-Term Fairness Analysis of ML-Enabled Systems
by: She, Yining, et al.
Published: (2025)
MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
by: Yang, Yahan, et al.
Published: (2025)
LAD-RAG: Layout-aware Dynamic RAG for Visually-Rich Document Understanding
by: Sourati, Zhivar, et al.
Published: (2025)
No Free Lunch with Guardrails
by: Kumar, Divyanshu, et al.
Published: (2025)
OpenGuardrails: A Configurable, Unified, and Scalable Guardrails Platform for Large Language Models
by: Wang, Thomas, et al.
Published: (2025)
Triaging Threats to Specialized Guardrails
by: Mo, Wenjie Jacky, et al.
Published: (2026)
Provably Secure Agent Guardrail
by: Wu, Benlong, et al.
Published: (2026)
Guardrail Baselines for Unlearning in LLMs
by: Thaker, Pratiksha, et al.
Published: (2024)
Learning Efficient Guardrails for Compliance
by: Wen, Xiaofei, et al.
Published: (2025)
Holding the Guardrails on Involuntary Commitment
by: Coleman, Carl H.
Published: (2024)
Cognitive Guardrails for Open-World Decision Making in Autonomous Drone Swarms
by: Cleland-Huang, Jane, et al.
Published: (2025)
$R^2$-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
by: Kang, Mintong, et al.
Published: (2024)
ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context
by: Li, Victoria R., et al.
Published: (2024)
Safety Guardrails for LLM-Enabled Robots
by: Ravichandran, Zachary, et al.
Published: (2025)
Building Guardrails for Large Language Models
by: Dong, Yi, et al.
Published: (2024)
Lattice: Generative Guardrails for Conversational Agents
by: Broadhurst, Emily, et al.
Published: (2026)
SafeVision: Efficient Image Guardrail with Robust Policy Adherence and Explainability
by: Xu, Peiyang, et al.
Published: (2025)
Protect: Towards Robust Guardrailing Stack for Trustworthy Enterprise LLM Systems
by: Avinash, Karthik, et al.
Published: (2025)
Taxonomy-Adaptive Moderation Model with Robust Guardrails for Large Language Models
by: Nandwana, Mahesh Kumar, et al.
Published: (2025)
Building a Domain-specific Guardrail Model in Production
by: Niknazar, Mohammad, et al.
Published: (2024)
TWGuard: A Case Study of LLM Safety Guardrails for Localized Linguistic Contexts
by: Chu, Hua-Rong, et al.
Published: (2026)
Current state of LLM Risks and AI Guardrails
by: Ayyamperumal, Suriya Ganesh, et al.
Published: (2024)
Test-Time Training Undermines Safety Guardrails
by: Antonelli, Simone, et al.
Published: (2026)
Black-Box Guardrail Reverse-engineering Attack
by: Yao, Hongwei, et al.
Published: (2025)
Bypassing Safety Guardrails in LLMs Using Humor
by: Cisneros-Velarde, Pedro
Published: (2025)
A Lightweight Explainable Guardrail for Prompt Safety
by: Islam, Md Asiful, et al.
Published: (2026)
OneShield -- the Next Generation of LLM Guardrails
by: DeLuca, Chad, et al.
Published: (2025)
Challenges in Guardrailing Large Language Models for Science
by: Pantha, Nishan, et al.
Published: (2024)
Enhancing Guardrails for Safe and Secure Healthcare AI
by: Gangavarapu, Ananya
Published: (2024)
Evaluating the Robustness of Large Language Model Safety Guardrails Against Adversarial Attacks
by: Young, Richard J.
Published: (2025)
CodeGuard: Improving LLM Guardrails in CS Education
by: Raihan, Nishat, et al.
Published: (2026)
Guardrail Selection in Line Charts to Contextualize Persuasive Visualizations
by: Nadib, Khandaker Abrar, et al.
Published: (2026)
Bag of Tricks for Subverting Reasoning-based Safety Guardrails
by: Chen, Shuo, et al.
Published: (2025)
Proof-of-Guardrail in AI Agents and What (Not) to Trust from It
by: Jin, Xisen, et al.
Published: (2026)
Why Do Safety Guardrails Degrade Across Languages?
by: Zhang, Max, et al.
Published: (2026)