| Main Authors: | Shi, Zesheng; Zhou, Yucheng; Li, Jing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.18588 |
Similar Items
Team-Based Self-Play With Dual Adaptive Weighting for Fine-Tuning LLMs
by: Li, Wu, et al.
Published: (2026)
RAGA: Reading-And-Graph-building-Agent for Autonomous Knowledge Graph Construction and Retrieval-Augmented Generation
by: Han, Chengrui, et al.
Published: (2026)
Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality
by: Li, Junliang, et al.
Published: (2025)
SHA256 at SemEval-2025 Task 4: Selective Amnesia -- Constrained Unlearning for Large Language Models via Knowledge Isolation
by: Agrawal, Saransh, et al.
Published: (2025)
R1-ACT: Efficient Reasoning Model Safety Alignment by Activating Safety Knowledge
by: In, Yeonjun, et al.
Published: (2025)
Knowledge Fusion of Large Language Models Via Modular SkillPacks
by: Du, Guodong, et al.
Published: (2025)
Constrain Alignment with Sparse Autoencoders
by: Yin, Qingyu, et al.
Published: (2024)
Bridging the Gap Between Preference Alignment and Machine Unlearning
by: Feng, Xiaohua, et al.
Published: (2025)
Learning and Unlearning of Fabricated Knowledge in Language Models
by: Sun, Chen, et al.
Published: (2024)
Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance
by: Hu, Wenbin, et al.
Published: (2025)
Multilingual Safety Alignment via Self-Distillation
by: Qin, Ruiyang, et al.
Published: (2026)
Intrinsic Test of Unlearning Using Parametric Knowledge Traces
by: Hong, Yihuai, et al.
Published: (2024)
MKRAG: Medical Knowledge Retrieval Augmented Generation for Medical Question Answering
by: Shi, Yucheng, et al.
Published: (2023)
ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models
by: Lin, Yujie, et al.
Published: (2026)
Circumventing Safety Alignment in Large Language Models Through Embedding Space Toxicity Attenuation
by: Zhang, Zhibo, et al.
Published: (2025)
Enhancing Large Language Model for Knowledge Graph Completion via Structure-Aware Alignment-Tuning
by: Liu, Yu, et al.
Published: (2025)
Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering
by: Shi, Yucheng, et al.
Published: (2024)
Safety Is Not Universal: The Selective Safety Trap in LLM Alignment
by: Brito, Iago Alves, et al.
Published: (2026)
STAR-S: Improving Safety Alignment through Self-Taught Reasoning on Safety Rules
by: Wu, Di, et al.
Published: (2026)
Cat-DPO: Category-Adaptive Safety Alignment
by: Yang, Tiankai, et al.
Published: (2026)
Efficient Knowledge Infusion via KG-LLM Alignment
by: Jiang, Zhouyu, et al.
Published: (2024)
LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models
by: Veldanda, Akshaj Kumar, et al.
Published: (2024)
Catastrophic Failure of LLM Unlearning via Quantization
by: Zhang, Zhiwei, et al.
Published: (2024)
SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment
by: Li, Hao, et al.
Published: (2026)
Adapting LLMs to Time Series Forecasting via Temporal Heterogeneity Modeling and Semantic Alignment
by: Sun, Yanru, et al.
Published: (2025)
What Matters For Safety Alignment?
by: Li, Xing, et al.
Published: (2026)
Does Machine Unlearning Truly Remove Knowledge?
by: Chen, Haokun, et al.
Published: (2025)
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
by: Zhang, Jingyu, et al.
Published: (2024)
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models
by: Fang, Junfeng, et al.
Published: (2024)
Two Heads Are Better Than One: Integrating Knowledge from Knowledge Graphs and Large Language Models for Entity Alignment
by: Yang, Linyao, et al.
Published: (2024)
Psychometric Alignment: Capturing Human Knowledge Distributions via Language Models
by: He-Yueya, Joy, et al.
Published: (2024)
SafeWorld: Geo-Diverse Safety Alignment
by: Yin, Da, et al.
Published: (2024)
HOMURA: Taming the Sand-Glass for Time-Constrained LLM Translation via Reinforcement Learning
by: Cui, Ziang, et al.
Published: (2026)
MESA: Improving MoE Safety Alignment via Decentralized Expertise
by: Sun, Yitong, et al.
Published: (2026)
Unforgotten Safety: Preserving Safety Alignment of Large Language Models with Continual Learning
by: Alssum, Lama, et al.
Published: (2025)
Self-Guided Defense: Adaptive Safety Alignment for Reasoning Models via Synthesized Guidelines
by: Wang, Yuhang, et al.
Published: (2025)
Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation
by: Sun, Chenkai, et al.
Published: (2025)
The Hidden Dimensions of LLM Alignment: A Multi-Dimensional Analysis of Orthogonal Safety Directions
by: Pan, Wenbo, et al.
Published: (2025)
PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference
by: Ji, Jiaming, et al.
Published: (2024)
An Adversarial Perspective on Machine Unlearning for AI Safety
by: Łucki, Jakub, et al.
Published: (2024)