| Main Authors: | Han, Shanshan, Avestimehr, Salman, He, Chaoyang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.08142 |
Similar Items
Kick Bad Guys Out! Conditionally Activated Anomaly Detection in Federated Learning with Zero-Knowledge Proof Verification
by: Han, Shanshan, et al.
Published: (2023)
TensorOpera Router: A Multi-Model Router for Efficient LLM Inference
by: Stripelis, Dimitris, et al.
Published: (2024)
ATP: Enabling Fast LLM Serving via Attention on Top Principal Keys
by: Niu, Yue, et al.
Published: (2024)
Alopex: A Computational Framework for Enabling On-Device Function Calls with LLMs
by: Ran, Yide, et al.
Published: (2024)
Safety Guardrails for LLM-Enabled Robots
by: Ravichandran, Zachary, et al.
Published: (2025)
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
by: Han, Shanshan
Published: (2024)
TorchOpera: A Compound AI System for LLM Safety
by: Han, Shanshan, et al.
Published: (2024)
Protect: Towards Robust Guardrailing Stack for Trustworthy Enterprise LLM Systems
by: Avinash, Karthik, et al.
Published: (2025)
Fox-1: Open Small Language Model for Cloud and Edge
by: Hu, Zijian, et al.
Published: (2024)
Bridging the AI Trustworthiness Gap between Functions and Norms
by: Di Scala, Daan, et al.
Published: (2025)
PSG-Agent: Personality-Aware Safety Guardrail for LLM-based Agents
by: Wu, Yaozu, et al.
Published: (2025)
Toward Super Agent System with Hybrid AI Routers
by: Yao, Yuhang, et al.
Published: (2025)
CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety
by: An, Heajun, et al.
Published: (2026)
Understanding Communication Backends in Cross-Silo Federated Learning
by: Ziashahabi, Amir, et al.
Published: (2026)
FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs
by: Han, Shanshan, et al.
Published: (2023)
Bridging the Communication Gap: Evaluating AI Labeling Practices for Trustworthy AI Development
by: Fischer, Raphael, et al.
Published: (2025)
ModalityMirror: Improving Audio Classification in Modality Heterogeneity Federated Learning with Multimodal Distillation
by: Feng, Tiantian, et al.
Published: (2024)
A Lightweight Explainable Guardrail for Prompt Safety
by: Islam, Md Asiful, et al.
Published: (2026)
Disentangled Safety Adapters Enable Efficient Guardrails and Flexible Inference-Time Alignment
by: Krishna, Kundan, et al.
Published: (2025)
Clustering and Median Aggregation Improve Differentially Private Inference
by: Amin, Kareem, et al.
Published: (2025)
SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety
by: Liu, Zhe, et al.
Published: (2026)
Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation
by: Xing, Wenpeng, et al.
Published: (2026)
Reconsidering LLM Uncertainty Estimation Methods in the Wild
by: Bakman, Yavuz, et al.
Published: (2025)
GeoToken: Hierarchical Geolocalization of Images via Next Token Prediction
by: Ghasemi, Narges, et al.
Published: (2025)
FedGrAINS: Personalized SubGraph Federated Learning with Adaptive Neighbor Sampling
by: Ceyani, Emir, et al.
Published: (2025)
Test-Time Training Undermines Safety Guardrails
by: Antonelli, Simone, et al.
Published: (2026)
Edge Private Graph Neural Networks with Singular Value Perturbation
by: Tang, Tingting, et al.
Published: (2024)
AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection
by: Luo, Weidi, et al.
Published: (2025)
CryptoMamba: Leveraging State Space Models for Accurate Bitcoin Price Prediction
by: Sepehri, Mohammad Shahab, et al.
Published: (2025)
Building Effective Safety Guardrails in AI Education Tools
by: Clark, Hannah-Beth, et al.
Published: (2025)
Deep Research with Open-Domain Evaluation and Multi-Stage Guardrails for Safety
by: Huang, Wei-Chieh, et al.
Published: (2025)
A Guardrail for Safety Preservation: When Safety-Sensitive Subspace Meets Harmful-Resistant Null-Space
by: Zhang, Bingjie, et al.
Published: (2025)
Safety Through Reasoning: An Empirical Study of Reasoning Guardrail Models
by: Sreedhar, Makesh Narsimhan, et al.
Published: (2025)
Why Do Safety Guardrails Degrade Across Languages?
by: Zhang, Max, et al.
Published: (2026)
Provably Secure Agent Guardrail
by: Wu, Benlong, et al.
Published: (2026)
VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration
by: Geng, Jiahui, et al.
Published: (2025)
CodeGuard: Improving LLM Guardrails in CS Education
by: Raihan, Nishat, et al.
Published: (2026)
Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs
by: Cho, Dongkyu Derek, et al.
Published: (2025)
OneShield -- the Next Generation of LLM Guardrails
by: DeLuca, Chad, et al.
Published: (2025)
ATHENA: Adaptive Test-Time Steering for Improving Count Fidelity in Diffusion Models
by: Sepehri, Mohammad Shahab, et al.
Published: (2026)