| Main Author: | Lopez-Martinez, Daniel |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2406.16455 |
Similar Items
First, do no harm: Breaking suicidogenic echo chambers in media recommendation
by: Díaz-Álvarez, Alberto, et al.
Published: (2026)
Enhancing Guardrails for Safe and Secure Healthcare AI
by: Gangavarapu, Ananya
Published: (2024)
RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-style Contexts
by: She, Yining, et al.
Published: (2025)
Building Effective Safety Guardrails in AI Education Tools
by: Clark, Hannah-Beth, et al.
Published: (2025)
A Comparative Evaluation of AI Agent Security Guardrails
by: Li, Qi, et al.
Published: (2026)
Policy-as-Prompt: Turning AI Governance Rules into Guardrails for AI Agents
by: Kholkar, Gauri, et al.
Published: (2025)
AI driven health recommender
by: Vignesh, K., et al.
Published: (2024)
Current state of LLM Risks and AI Guardrails
by: Ayyamperumal, Suriya Ganesh, et al.
Published: (2024)
From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails
by: Pandya, Ravi, et al.
Published: (2025)
Proof-of-Guardrail in AI Agents and What (Not) to Trust from It
by: Jin, Xisen, et al.
Published: (2026)
No Free Lunch with Guardrails
by: Kumar, Divyanshu, et al.
Published: (2025)
Breaking Guardrails, Facing Walls: Insights on Adversarial AI for Defenders & Researchers
by: Bertollo, Giacomo, et al.
Published: (2025)
Lattice: Generative Guardrails for Conversational Agents
by: Broadhurst, Emily, et al.
Published: (2026)
Characterizing and modeling harms from interactions with design patterns in AI interfaces
by: Ibrahim, Lujain, et al.
Published: (2024)
Provably Secure Agent Guardrail
by: Wu, Benlong, et al.
Published: (2026)
Safety Guardrails for LLM-Enabled Robots
by: Ravichandran, Zachary, et al.
Published: (2025)
Challenges in Guardrailing Large Language Models for Science
by: Pantha, Nishan, et al.
Published: (2024)
Evaluating adaptive and generative AI-based feedback and recommendations in a knowledge-graph-integrated programming learning system
by: Nongkhai, Lalita Na, et al.
Published: (2026)
Learning Efficient Guardrails for Compliance
by: Wen, Xiaofei, et al.
Published: (2025)
Semantic Integrity Constraints: Declarative Guardrails for AI-Augmented Data Processing Systems
by: Lee, Alexander W., et al.
Published: (2025)
Deep learning with noisy labels in medical prediction problems: a scoping review
by: Wei, Yishu, et al.
Published: (2024)
Building Guardrails for Large Language Models
by: Dong, Yi, et al.
Published: (2024)
AI Harmonics: a human-centric and harms severity-adaptive AI risk assessment framework
by: Vei, Sofia, et al.
Published: (2025)
Climbing the label tree: Hierarchy-preserving contrastive learning for medical imaging
by: Khan, Alif Elham
Published: (2025)
A global log for medical AI
by: Noori, Ayush, et al.
Published: (2025)
Generative AI for automatic topic labelling
by: Kozlowski, Diego, et al.
Published: (2024)
Test-Time Training Undermines Safety Guardrails
by: Antonelli, Simone, et al.
Published: (2026)
A Lightweight Explainable Guardrail for Prompt Safety
by: Islam, Md Asiful, et al.
Published: (2026)
Behavioral Determinants of Deployed AI Agents in Social Networks: A Multi-Factor Study of Personality, Model, and Guardrail Specification
by: Wilson, Sarah, et al.
Published: (2026)
Generalization in medical AI: a perspective on developing scalable models
by: Zvuloni, Eran, et al.
Published: (2023)
Shift-Up: A Framework for Software Engineering Guardrails in AI-native Software Development -- Initial Findings
by: Lipsanen, Petrus, et al.
Published: (2026)
"The Diagram is like Guardrails": Structuring GenAI-assisted Hypotheses Exploration with an Interactive Shared Representation
by: Ding, Zijian, et al.
Published: (2025)
Towards interactive evaluations for interaction harms in human-AI systems
by: Ibrahim, Lujain, et al.
Published: (2024)
In AI Sweet Harmony: Sociopragmatic Guardrail Bypasses and Evaluation-Awareness in OpenAI gpt-oss-20b
by: Durner, Nils
Published: (2025)
Worldwide AI Ethics: a review of 200 guidelines and recommendations for AI governance
by: Corrêa, Nicholas Kluge, et al.
Published: (2022)
PSG-Agent: Personality-Aware Safety Guardrail for LLM-based Agents
by: Wu, Yaozu, et al.
Published: (2025)
Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences
by: Han, Shanshan, et al.
Published: (2025)
AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection
by: Luo, Weidi, et al.
Published: (2025)
MindGuard: Guardrail Classifiers for Multi-Turn Mental Health Support
by: Farinhas, António, et al.
Published: (2026)
Trust-Oriented Adaptive Guardrails for Large Language Models
by: Hu, Jinwei, et al.
Published: (2024)