:: Library Catalog

Bibliographic Details
Main Authors:	Kumar, Divyanshu; Birur, Nitin Aravind; Baswa, Tanay; Agarwal, Sahil; Harshangi, Prashanth
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.00441

Similar Items

Quantifying CBRN Risk in Frontier Models
by: Kumar, Divyanshu, et al.
Published: (2025)

SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming
by: Kumar, Anurakt, et al.
Published: (2024)

Beyond Text: Multimodal Jailbreaking of Vision-Language and Audio Models through Perceptually Simple Transformations
by: Kumar, Divyanshu, et al.
Published: (2025)

VERA: Validation and Enhancement for Retrieval Augmented systems
by: Birur, Nitin Aravind, et al.
Published: (2024)

Fine-Tuning, Quantization, and LLMs: Navigating Unintended Outcomes
by: Kumar, Divyanshu, et al.
Published: (2024)

Beyond Western Politics: Cross-Cultural Benchmarks for Evaluating Partisan Associations in LLMs
by: Kumar, Divyanshu, et al.
Published: (2025)

Redirected, Not Removed: Task-Dependent Stereotyping Reveals the Limits of LLM Alignments
by: Kumar, Divyanshu, et al.
Published: (2026)

SocioEval: A Template-Based Framework for Evaluating Socioeconomic Status Bias in Foundation Models
by: Kumar, Divyanshu, et al.
Published: (2026)

Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs
by: Kumar, Divyanshu, et al.
Published: (2024)

No Free Lunch Theorem for Privacy-Preserving LLM Inference
by: Zhang, Xiaojin, et al.
Published: (2024)

No Free Lunch for Defending Against Prefilling Attack by In-Context Learning
by: Xue, Zhiyu, et al.
Published: (2024)

Provably Secure Agent Guardrail
by: Wu, Benlong, et al.
Published: (2026)

Enhancing Guardrails for Safe and Secure Healthcare AI
by: Gangavarapu, Ananya
Published: (2024)

AgentWall: A Runtime Safety Layer for Local AI Agents
by: Aravind, Ashwin
Published: (2026)

A Comparative Evaluation of AI Agent Security Guardrails
by: Li, Qi, et al.
Published: (2026)

Precision Guided Approach to Mitigate Data Poisoning Attacks in Federated Learning
by: Kumar, K Naveen, et al.
Published: (2024)

Cognitive Cybersecurity for Artificial Intelligence: Guardrail Engineering with CCS-7
by: Aydin, Yuksel
Published: (2025)

SoK: Evaluating Jailbreak Guardrails for Large Language Models
by: Wang, Xunguang, et al.
Published: (2025)

LPG: Balancing Efficiency and Policy Reasoning in Latent Policy Guardrails
by: Li, Nanxi, et al.
Published: (2026)

Quantum Gatekeeper: Multi-Factor Context-Bound Image Steganography with VQC Based Key Derivation on Quantum Hardware
by: Tomar, Sahil, et al.
Published: (2026)

Free Lunch for Federated Remote Sensing Target Fine-Grained Classification: A Parameter-Efficient Framework
by: Chen, Shengchao, et al.
Published: (2024)

Breaking Guardrails, Facing Walls: Insights on Adversarial AI for Defenders & Researchers
by: Bertollo, Giacomo, et al.
Published: (2025)

Automated Classification of Cybercrime Complaints using Transformer-based Language Models for Hinglish Texts
by: Rani, Nanda, et al.
Published: (2024)

Active Honeypot Guardrail System: Probing and Confirming Multi-Turn LLM Jailbreaks
by: Wu, ChenYu, et al.
Published: (2025)

Benchmarking Large Language Models for Zero-shot and Few-shot Phishing URL Detection
by: Hasan, Najmul, et al.
Published: (2026)

OneShield -- the Next Generation of LLM Guardrails
by: DeLuca, Chad, et al.
Published: (2025)

Proof-of-Guardrail in AI Agents and What (Not) to Trust from It
by: Jin, Xisen, et al.
Published: (2026)

NeuroFilter: Privacy Guardrails for Conversational LLM Agents
by: Das, Saswat, et al.
Published: (2026)

A Survey on Offensive AI Within Cybersecurity
by: Girhepuje, Sahil, et al.
Published: (2024)

SGuard-v1: Safety Guardrail for Large Language Models
by: Lee, JoonHo, et al.
Published: (2025)

Current state of LLM Risks and AI Guardrails
by: Ayyamperumal, Suriya Ganesh, et al.
Published: (2024)

SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety
by: Liu, Zhe, et al.
Published: (2026)

Privacy Policy Enforcement Guardrails for Data-Sensitive Retrieval-Augmented Generation
by: Zafar, Osama, et al.
Published: (2026)

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models
by: Li, Hao, et al.
Published: (2024)

Guardrails for trust, safety, and ethical development and deployment of Large Language Models (LLM)
by: Biswas, Anjanava, et al.
Published: (2026)

ADVERSA: Measuring Multi-Turn Guardrail Degradation and Judge Reliability in Large Language Models
by: Owiredu-Ashley, Harry
Published: (2026)

SecureRAG-RTL: A Retrieval-Augmented, Multi-Agent, Zero-Shot LLM-Driven Framework for Hardware Vulnerability Detection
by: Hasan, Touseef, et al.
Published: (2026)

Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility
by: Hong, Yining, et al.
Published: (2026)

SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence
by: Aghaei, Ehsan, et al.
Published: (2025)

In AI Sweet Harmony: Sociopragmatic Guardrail Bypasses and Evaluation-Awareness in OpenAI gpt-oss-20b
by: Durner, Nils
Published: (2025)