| Main Authors: | Pandya, Ravi, Bland, Madison, Nguyen, Duy P., Liu, Changliu, Fisac, Jaime Fernández, Bajcsy, Andrea |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.13727 |
Similar Items
Human-AI Safety: A Descendant of Generative AI and Control Systems Safety
by: Bajcsy, Andrea, et al.
Published: (2024)
Robots that Learn to Safely Influence via Prediction-Informed Reach-Avoid Dynamic Games
by: Pandya, Ravi, et al.
Published: (2024)
MAGICS: Adversarial RL with Minimax Actors Guided by Implicit Critic Stackelberg for Convergent Neural Synthesis of Robot Safety
by: Wang, Justin, et al.
Published: (2024)
Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity
by: Liang, Kaiqu, et al.
Published: (2024)
From Refusal Tokens to Refusal Control: Discovering and Steering Category-Specific Refusal Directions
by: Alagharu, Rishab, et al.
Published: (2026)
Toward Reusability of AI Models Using Dynamic Updates of AI Documentation
by: Bajcsy, Peter, et al.
Published: (2026)
Multimodal Safe Control for Human-Robot Interaction
by: Pandya, Ravi, et al.
Published: (2023)
RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
by: Liang, Kaiqu, et al.
Published: (2025)
AssemblyComplete: 3D Combinatorial Construction with Deep Reinforcement Learning
by: Chen, Alan, et al.
Published: (2024)
Lattice: Generative Guardrails for Conversational Agents
by: Broadhurst, Emily, et al.
Published: (2026)
Revisiting the Initial Steps in Adaptive Gradient Descent Optimization
by: Abuduweili, Abulikemu, et al.
Published: (2024)
Synthesis and Deployment of Maximal Robust Control Barrier Functions through Adversarial Reinforcement Learning
by: Oh, Donggeon David, et al.
Published: (2026)
Estimating Neural Network Robustness via Lipschitz Constant and Architecture Sensitivity
by: Abuduweili, Abulikemu, et al.
Published: (2024)
Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics
by: García-Ferrero, Iker, et al.
Published: (2025)
Enhancing Guardrails for Safe and Secure Healthcare AI
by: Gangavarapu, Ananya
Published: (2024)
Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models
by: Liang, Kaiqu, et al.
Published: (2025)
Control Invariant Sets for Neural Network Dynamical Systems and Recursive Feasibility in Model Predictive Control
by: Li, Xiao, et al.
Published: (2025)
LatentGuard: Controllable Latent Steering for Robust Refusal of Attacks and Reliable Response Generation
by: Shu, Huizhen, et al.
Published: (2025)
RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models
by: Muhamed, Aashiq, et al.
Published: (2025)
Meta-Control: Automatic Model-based Control Synthesis for Heterogeneous Robot Skills
by: Wei, Tianhao, et al.
Published: (2024)
Building Effective Safety Guardrails in AI Education Tools
by: Clark, Hannah-Beth, et al.
Published: (2025)
A Comparative Evaluation of AI Agent Security Guardrails
by: Li, Qi, et al.
Published: (2026)
Simultaneous Task Allocation and Planning for Multi-Robots under Hierarchical Temporal Logic Specifications
by: Luo, Xusheng, et al.
Published: (2024)
Policy-as-Prompt: Turning AI Governance Rules into Guardrails for AI Agents
by: Kholkar, Gauri, et al.
Published: (2025)
From Governance Norms to Enforceable Controls: A Layered Translation Method for Runtime Guardrails in Agentic AI
by: Koch, Christopher
Published: (2026)
ISAACS: Iterative Soft Adversarial Actor-Critic for Safety
by: Hsu, Kai-Chieh, et al.
Published: (2022)
LatentRefusal: Latent-Signal Refusal for Unanswerable Text-to-SQL Queries
by: Ren, Xuancheng, et al.
Published: (2026)
RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-style Contexts
by: She, Yining, et al.
Published: (2025)
COSMIC: Generalized Refusal Direction Identification in LLM Activations
by: Siu, Vincent, et al.
Published: (2025)
Current state of LLM Risks and AI Guardrails
by: Ayyamperumal, Suriya Ganesh, et al.
Published: (2024)
To Use or to Refuse? Re-Centering Student Agency with Generative AI in Engineering Design Education
by: Willems, Thijs, et al.
Published: (2025)
Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection
by: Giarrusso, Francesco, et al.
Published: (2025)
Proof-of-Guardrail in AI Agents and What (Not) to Trust from It
by: Jin, Xisen, et al.
Published: (2026)
Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism
by: Cao, Lang
Published: (2023)
HanoiWorld: A Joint Embedding Predictive Architecture Based World Model for Autonomous Vehicle Controller
by: Dat, Tran Tien, et al.
Published: (2026)
No Free Lunch with Guardrails
by: Kumar, Divyanshu, et al.
Published: (2025)
Your Learned Constraint is Secretly a Backward Reachable Tube
by: Qadri, Mohamad, et al.
Published: (2025)
Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning
by: Zhao, Weiye, et al.
Published: (2024)
Decomposition-based Hierarchical Task Allocation and Planning for Multi-Robots under Hierarchical Temporal Logic Specifications
by: Luo, Xusheng, et al.
Published: (2023)
Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries
by: Noever, David, et al.
Published: (2025)