:: Library Catalog

Bibliographic Details
Main Author:	Lopez-Martinez, Daniel
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2406.16455

Similar Items

First, do no harm: Breaking suicidogenic echo chambers in media recommendation
by: Díaz-Álvarez, Alberto, et al.
Published: (2026)

Enhancing Guardrails for Safe and Secure Healthcare AI
by: Gangavarapu, Ananya
Published: (2024)

RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-style Contexts
by: She, Yining, et al.
Published: (2025)

Building Effective Safety Guardrails in AI Education Tools
by: Clark, Hannah-Beth, et al.
Published: (2025)

A Comparative Evaluation of AI Agent Security Guardrails
by: Li, Qi, et al.
Published: (2026)

Policy-as-Prompt: Turning AI Governance Rules into Guardrails for AI Agents
by: Kholkar, Gauri, et al.
Published: (2025)

AI driven health recommender
by: Vignesh, K., et al.
Published: (2024)

Current state of LLM Risks and AI Guardrails
by: Ayyamperumal, Suriya Ganesh, et al.
Published: (2024)

From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails
by: Pandya, Ravi, et al.
Published: (2025)

Proof-of-Guardrail in AI Agents and What (Not) to Trust from It
by: Jin, Xisen, et al.
Published: (2026)

No Free Lunch with Guardrails
by: Kumar, Divyanshu, et al.
Published: (2025)

Breaking Guardrails, Facing Walls: Insights on Adversarial AI for Defenders & Researchers
by: Bertollo, Giacomo, et al.
Published: (2025)

Lattice: Generative Guardrails for Conversational Agents
by: Broadhurst, Emily, et al.
Published: (2026)

Characterizing and modeling harms from interactions with design patterns in AI interfaces
by: Ibrahim, Lujain, et al.
Published: (2024)

Provably Secure Agent Guardrail
by: Wu, Benlong, et al.
Published: (2026)

Safety Guardrails for LLM-Enabled Robots
by: Ravichandran, Zachary, et al.
Published: (2025)

Challenges in Guardrailing Large Language Models for Science
by: Pantha, Nishan, et al.
Published: (2024)

Evaluating adaptive and generative AI-based feedback and recommendations in a knowledge-graph-integrated programming learning system
by: Nongkhai, Lalita Na, et al.
Published: (2026)

Learning Efficient Guardrails for Compliance
by: Wen, Xiaofei, et al.
Published: (2025)

Semantic Integrity Constraints: Declarative Guardrails for AI-Augmented Data Processing Systems
by: Lee, Alexander W., et al.
Published: (2025)

Deep learning with noisy labels in medical prediction problems: a scoping review
by: Wei, Yishu, et al.
Published: (2024)

Building Guardrails for Large Language Models
by: Dong, Yi, et al.
Published: (2024)

AI Harmonics: a human-centric and harms severity-adaptive AI risk assessment framework
by: Vei, Sofia, et al.
Published: (2025)

Climbing the label tree: Hierarchy-preserving contrastive learning for medical imaging
by: Khan, Alif Elham
Published: (2025)

A global log for medical AI
by: Noori, Ayush, et al.
Published: (2025)

Generative AI for automatic topic labelling
by: Kozlowski, Diego, et al.
Published: (2024)

Test-Time Training Undermines Safety Guardrails
by: Antonelli, Simone, et al.
Published: (2026)

A Lightweight Explainable Guardrail for Prompt Safety
by: Islam, Md Asiful, et al.
Published: (2026)

Behavioral Determinants of Deployed AI Agents in Social Networks: A Multi-Factor Study of Personality, Model, and Guardrail Specification
by: Wilson, Sarah, et al.
Published: (2026)

Generalization in medical AI: a perspective on developing scalable models
by: Zvuloni, Eran, et al.
Published: (2023)

Shift-Up: A Framework for Software Engineering Guardrails in AI-native Software Development -- Initial Findings
by: Lipsanen, Petrus, et al.
Published: (2026)

"The Diagram is like Guardrails": Structuring GenAI-assisted Hypotheses Exploration with an Interactive Shared Representation
by: Ding, Zijian, et al.
Published: (2025)

Towards interactive evaluations for interaction harms in human-AI systems
by: Ibrahim, Lujain, et al.
Published: (2024)

In AI Sweet Harmony: Sociopragmatic Guardrail Bypasses and Evaluation-Awareness in OpenAI gpt-oss-20b
by: Durner, Nils
Published: (2025)

Worldwide AI Ethics: a review of 200 guidelines and recommendations for AI governance
by: Corrêa, Nicholas Kluge, et al.
Published: (2022)

PSG-Agent: Personality-Aware Safety Guardrail for LLM-based Agents
by: Wu, Yaozu, et al.
Published: (2025)

Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences
by: Han, Shanshan, et al.
Published: (2025)

AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection
by: Luo, Weidi, et al.
Published: (2025)

MindGuard: Guardrail Classifiers for Multi-Turn Mental Health Support
by: Farinhas, António, et al.
Published: (2026)

Trust-Oriented Adaptive Guardrails for Large Language Models
by: Hu, Jinwei, et al.
Published: (2024)