:: Library Catalog

Bibliographic Details
Main Authors:	Pandya, Ravi; Bland, Madison; Nguyen, Duy P.; Liu, Changliu; Fisac, Jaime Fernández; Bajcsy, Andrea
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.13727

Similar Items

Human-AI Safety: A Descendant of Generative AI and Control Systems Safety
by: Bajcsy, Andrea, et al.
Published: (2024)

Robots that Learn to Safely Influence via Prediction-Informed Reach-Avoid Dynamic Games
by: Pandya, Ravi, et al.
Published: (2024)

MAGICS: Adversarial RL with Minimax Actors Guided by Implicit Critic Stackelberg for Convergent Neural Synthesis of Robot Safety
by: Wang, Justin, et al.
Published: (2024)

Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity
by: Liang, Kaiqu, et al.
Published: (2024)

From Refusal Tokens to Refusal Control: Discovering and Steering Category-Specific Refusal Directions
by: Alagharu, Rishab, et al.
Published: (2026)

Toward Reusability of AI Models Using Dynamic Updates of AI Documentation
by: Bajcsy, Peter, et al.
Published: (2026)

Multimodal Safe Control for Human-Robot Interaction
by: Pandya, Ravi, et al.
Published: (2023)

RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
by: Liang, Kaiqu, et al.
Published: (2025)

AssemblyComplete: 3D Combinatorial Construction with Deep Reinforcement Learning
by: Chen, Alan, et al.
Published: (2024)

Lattice: Generative Guardrails for Conversational Agents
by: Broadhurst, Emily, et al.
Published: (2026)

Revisiting the Initial Steps in Adaptive Gradient Descent Optimization
by: Abuduweili, Abulikemu, et al.
Published: (2024)

Synthesis and Deployment of Maximal Robust Control Barrier Functions through Adversarial Reinforcement Learning
by: Oh, Donggeon David, et al.
Published: (2026)

Estimating Neural Network Robustness via Lipschitz Constant and Architecture Sensitivity
by: Abuduweili, Abulikemu, et al.
Published: (2024)

Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics
by: García-Ferrero, Iker, et al.
Published: (2025)

Enhancing Guardrails for Safe and Secure Healthcare AI
by: Gangavarapu, Ananya
Published: (2024)

Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
by: Liang, Kaiqu, et al.
Published: (2025)

Control Invariant Sets for Neural Network Dynamical Systems and Recursive Feasibility in Model Predictive Control
by: Li, Xiao, et al.
Published: (2025)

LatentGuard: Controllable Latent Steering for Robust Refusal of Attacks and Reliable Response Generation
by: Shu, Huizhen, et al.
Published: (2025)

RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models
by: Muhamed, Aashiq, et al.
Published: (2025)

Meta-Control: Automatic Model-based Control Synthesis for Heterogeneous Robot Skills
by: Wei, Tianhao, et al.
Published: (2024)

Building Effective Safety Guardrails in AI Education Tools
by: Clark, Hannah-Beth, et al.
Published: (2025)

A Comparative Evaluation of AI Agent Security Guardrails
by: Li, Qi, et al.
Published: (2026)

Simultaneous Task Allocation and Planning for Multi-Robots under Hierarchical Temporal Logic Specifications
by: Luo, Xusheng, et al.
Published: (2024)

Policy-as-Prompt: Turning AI Governance Rules into Guardrails for AI Agents
by: Kholkar, Gauri, et al.
Published: (2025)

From Governance Norms to Enforceable Controls: A Layered Translation Method for Runtime Guardrails in Agentic AI
by: Koch, Christopher
Published: (2026)

ISAACS: Iterative Soft Adversarial Actor-Critic for Safety
by: Hsu, Kai-Chieh, et al.
Published: (2022)

LatentRefusal: Latent-Signal Refusal for Unanswerable Text-to-SQL Queries
by: Ren, Xuancheng, et al.
Published: (2026)

RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-style Contexts
by: She, Yining, et al.
Published: (2025)

COSMIC: Generalized Refusal Direction Identification in LLM Activations
by: Siu, Vincent, et al.
Published: (2025)

Current state of LLM Risks and AI Guardrails
by: Ayyamperumal, Suriya Ganesh, et al.
Published: (2024)

To Use or to Refuse? Re-Centering Student Agency with Generative AI in Engineering Design Education
by: Willems, Thijs, et al.
Published: (2025)

Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection
by: Giarrusso, Francesco, et al.
Published: (2025)

Proof-of-Guardrail in AI Agents and What (Not) to Trust from It
by: Jin, Xisen, et al.
Published: (2026)

Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism
by: Cao, Lang
Published: (2023)

HanoiWorld: A Joint Embedding Predictive Architecture Based World Model for Autonomous Vehicle Controller
by: Dat, Tran Tien, et al.
Published: (2026)

No Free Lunch with Guardrails
by: Kumar, Divyanshu, et al.
Published: (2025)

Your Learned Constraint is Secretly a Backward Reachable Tube
by: Qadri, Mohamad, et al.
Published: (2025)

Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning
by: Zhao, Weiye, et al.
Published: (2024)

Decomposition-based Hierarchical Task Allocation and Planning for Multi-Robots under Hierarchical Temporal Logic Specifications
by: Luo, Xusheng, et al.
Published: (2023)

Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries
by: Noever, David, et al.
Published: (2025)