| Main Authors: | Wang, Mengxuan, Chen, Yuxin, Xu, Gang, He, Tao, Jiang, Hongjie, Li, Ming |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.03402 |
Similar Items
Scalable Utility-Aware Multiclass Calibration
by: Hegazy, Mahmoud, et al.
Published: (2025)
The Rogue Scalpel: Activation Steering Compromises LLM Safety
by: Korznikov, Anton, et al.
Published: (2025)
Saliency-Aware Regularized Quantization Calibration for Large Language Models
by: Zhao, Yanlong, et al.
Published: (2026)
RACER: Risk-Aware Calibrated Efficient Routing for Large Language Models
by: Hao, Sai, et al.
Published: (2026)
NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs
by: Pan, Birong, et al.
Published: (2025)
The Safety-Aware Denoiser for Text Diffusion Models
by: Yusuf, Amman, et al.
Published: (2026)
Revisiting Uncertainty Estimation and Calibration of Large Language Models
by: Tao, Linwei, et al.
Published: (2025)
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
by: Xiao, Jiancong, et al.
Published: (2025)
Noise Injection Systemically Degrades Large Language Model Safety Guardrails
by: Shahani, Prithviraj Singh, et al.
Published: (2025)
Transformer-Based Predictive Maintenance for Risk-Aware Instrument Calibration
by: Parthasarathy, Adithya, et al.
Published: (2026)
LRT-Diffusion: Calibrated Risk-Aware Guidance for Diffusion Policies
by: Sun, Ximan, et al.
Published: (2025)
Task-Driven Causal Feature Distillation: Towards Trustworthy Risk Prediction
by: Chu, Zhixuan, et al.
Published: (2023)
Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models
by: Chen, Kejia, et al.
Published: (2025)
WATS: Calibrating Graph Neural Networks with Wavelet-Aware Temperature Scaling
by: Li, Xiaoyang, et al.
Published: (2025)
Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models
by: Wu, Jinman, et al.
Published: (2026)
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models
by: Yu, Xinlei, et al.
Published: (2025)
Falcon-X: A Time Series Foundation Model for Heterogeneous Multivariate Modeling
by: Liu, Yiding, et al.
Published: (2026)
X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Compromising Usability
by: Lu, Xiaoya, et al.
Published: (2025)
Spot Risks Before Speaking! Unraveling Safety Attention Heads in Large Vision-Language Models
by: Zheng, Ziwei, et al.
Published: (2025)
Multi-Level Safety Continual Projection for Fine-Tuned Large Language Models without Retraining
by: Han, Bing, et al.
Published: (2025)
Instance-Level Safety-Aware Fidelity of Synthetic Data and Its Calibration
by: Cheng, Chih-Hong, et al.
Published: (2024)
Rethinking Refinement: Correcting Generative Bias without Noise Injection
by: Peng, Xin, et al.
Published: (2026)
Compromising Embodied Agents with Contextual Backdoor Attacks
by: Liu, Aishan, et al.
Published: (2024)
EEGAgent: A Unified Framework for Automated EEG Analysis Using Large Language Models
by: Zhao, Sha, et al.
Published: (2025)
Utility-Aware Data Pricing: Token-Level Quality and Empirical Training Gain for LLMs
by: Xu, Minghui, et al.
Published: (2026)
Estimating the Effects of Sample Training Orders for Large Language Models without Retraining
by: Yang, Hao, et al.
Published: (2025)
Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?
by: Adler, Coen, et al.
Published: (2025)
Neural Dynamics-Informed Pre-trained Framework for Personalized Brain Functional Network Construction
by: Jiang, Hongjie, et al.
Published: (2026)
Behavior Injection: Preparing Language Models for Reinforcement Learning
by: Cen, Zhepeng, et al.
Published: (2025)
Alignment and Safety in Large Language Models: Safety Mechanisms, Training Paradigms, and Emerging Challenges
by: Lu, Haoran, et al.
Published: (2025)
Learning Safety Constraints for Large Language Models
by: Chen, Xin, et al.
Published: (2025)
MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval
by: Srivastava, Saksham Sahai, et al.
Published: (2025)
A Benchmark Study on Calibration
by: Tao, Linwei, et al.
Published: (2023)
Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models
by: Liu, Qin, et al.
Published: (2024)
Confidence Calibration in Large Language Models
by: Michael, Noam, et al.
Published: (2026)
Estimating Tail Risks in Language Model Output Distributions
by: Angell, Rico, et al.
Published: (2026)
Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models
by: Tayebati, Sina, et al.
Published: (2025)
Confidence Calibration under Ambiguous Ground Truth
by: Tao, Linwei, et al.
Published: (2026)
Towards Understanding The Calibration Benefits of Sharpness-Aware Minimization
by: Tan, Chengli, et al.
Published: (2025)
RASA: Routing-Aware Safety Alignment for Mixture-of-Experts Models
by: Liang, Jiacheng, et al.
Published: (2026)