Library Catalog

Bibliographic Details
Main Authors:	Han, Shanshan; Avestimehr, Salman; He, Chaoyang
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.08142

Similar Items

Kick Bad Guys Out! Conditionally Activated Anomaly Detection in Federated Learning with Zero-Knowledge Proof Verification
by: Han, Shanshan, et al.
Published: (2023)

TensorOpera Router: A Multi-Model Router for Efficient LLM Inference
by: Stripelis, Dimitris, et al.
Published: (2024)

ATP: Enabling Fast LLM Serving via Attention on Top Principal Keys
by: Niu, Yue, et al.
Published: (2024)

Alopex: A Computational Framework for Enabling On-Device Function Calls with LLMs
by: Ran, Yide, et al.
Published: (2024)

Safety Guardrails for LLM-Enabled Robots
by: Ravichandran, Zachary, et al.
Published: (2025)

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
by: Han, Shanshan
Published: (2024)

TorchOpera: A Compound AI System for LLM Safety
by: Han, Shanshan, et al.
Published: (2024)

Protect: Towards Robust Guardrailing Stack for Trustworthy Enterprise LLM Systems
by: Avinash, Karthik, et al.
Published: (2025)

Fox-1: Open Small Language Model for Cloud and Edge
by: Hu, Zijian, et al.
Published: (2024)

Bridging the AI Trustworthiness Gap between Functions and Norms
by: Di Scala, Daan, et al.
Published: (2025)

PSG-Agent: Personality-Aware Safety Guardrail for LLM-based Agents
by: Wu, Yaozu, et al.
Published: (2025)

Toward Super Agent System with Hybrid AI Routers
by: Yao, Yuhang, et al.
Published: (2025)

CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety
by: An, Heajun, et al.
Published: (2026)

Understanding Communication Backends in Cross-Silo Federated Learning
by: Ziashahabi, Amir, et al.
Published: (2026)

FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs
by: Han, Shanshan, et al.
Published: (2023)

Bridging the Communication Gap: Evaluating AI Labeling Practices for Trustworthy AI Development
by: Fischer, Raphael, et al.
Published: (2025)

ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation
by: Feng, Tiantian, et al.
Published: (2024)

A Lightweight Explainable Guardrail for Prompt Safety
by: Islam, Md Asiful, et al.
Published: (2026)

Disentangled Safety Adapters Enable Efficient Guardrails and Flexible Inference-Time Alignment
by: Krishna, Kundan, et al.
Published: (2025)

Clustering and Median Aggregation Improve Differentially Private Inference
by: Amin, Kareem, et al.
Published: (2025)

SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety
by: Liu, Zhe, et al.
Published: (2026)

Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation
by: Xing, Wenpeng, et al.
Published: (2026)

Reconsidering LLM Uncertainty Estimation Methods in the Wild
by: Bakman, Yavuz, et al.
Published: (2025)

GeoToken: Hierarchical Geolocalization of Images via Next Token Prediction
by: Ghasemi, Narges, et al.
Published: (2025)

FedGrAINS: Personalized SubGraph Federated Learning with Adaptive Neighbor Sampling
by: Ceyani, Emir, et al.
Published: (2025)

Test-Time Training Undermines Safety Guardrails
by: Antonelli, Simone, et al.
Published: (2026)

Edge Private Graph Neural Networks with Singular Value Perturbation
by: Tang, Tingting, et al.
Published: (2024)

AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection
by: Luo, Weidi, et al.
Published: (2025)

CryptoMamba: Leveraging State Space Models for Accurate Bitcoin Price Prediction
by: Sepehri, Mohammad Shahab, et al.
Published: (2025)

Building Effective Safety Guardrails in AI Education Tools
by: Clark, Hannah-Beth, et al.
Published: (2025)

Deep Research with Open-Domain Evaluation and Multi-Stage Guardrails for Safety
by: Huang, Wei-Chieh, et al.
Published: (2025)

A Guardrail for Safety Preservation: When Safety-Sensitive Subspace Meets Harmful-Resistant Null-Space
by: Zhang, Bingjie, et al.
Published: (2025)

Safety Through Reasoning: An Empirical Study of Reasoning Guardrail Models
by: Sreedhar, Makesh Narsimhan, et al.
Published: (2025)

Why Do Safety Guardrails Degrade Across Languages?
by: Zhang, Max, et al.
Published: (2026)

Provably Secure Agent Guardrail
by: Wu, Benlong, et al.
Published: (2026)

VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration
by: Geng, Jiahui, et al.
Published: (2025)

CodeGuard: Improving LLM Guardrails in CS Education
by: Raihan, Nishat, et al.
Published: (2026)

Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs
by: Cho, Dongkyu Derek, et al.
Published: (2025)

OneShield -- the Next Generation of LLM Guardrails
by: DeLuca, Chad, et al.
Published: (2025)

ATHENA: Adaptive Test-Time Steering for Improving Count Fidelity in Diffusion Models
by: Sepehri, Mohammad Shahab, et al.
Published: (2026)