| Main Authors: | Xie, Yingsha, Huang, Tiansheng, Yang, Enneng, Min, Rui, Lu, Wenjie, Cao, Xiaochun, Tan, Naiqiang, Shen, Li |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.02136 |
Similar Items
Panacea: Mitigating Harmful Fine-tuning for Large Language Models via Post-fine-tuning Perturbation
by: Wang, Yibo, et al.
Published: (2025)
Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable
by: Huang, Tiansheng, et al.
Published: (2025)
R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
by: Wang, Yibo, et al.
Published: (2025)
Ada-R1: Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
by: Luo, Haotian, et al.
Published: (2025)
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
by: Luo, Haotian, et al.
Published: (2025)
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
by: Yang, Enneng, et al.
Published: (2024)
OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging
by: Wei, Yongxian, et al.
Published: (2025)
Surgery: Mitigating Harmful Fine-Tuning for Large Language Models via Attention Sink
by: Liu, Guozhi, et al.
Published: (2026)
Targeted Vaccine: Safety Alignment for Large Language Models against Harmful Fine-Tuning via Layer-wise Perturbation
by: Liu, Guozhi, et al.
Published: (2024)
Bag of Tricks for Inference-time Computation of LLM Reasoning
by: Liu, Fan, et al.
Published: (2025)
Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning
by: Ning, Yansong, et al.
Published: (2025)
SafeThinker: Reasoning about Risk to Deepen Safety Beyond Shallow Alignment
by: Fang, Xianya, et al.
Published: (2026)
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging
by: Shen, Li, et al.
Published: (2024)
Model Unmerging: Making Your Models Unmergeable for Secure Model Sharing
by: Wang, Zihao, et al.
Published: (2025)
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization
by: Wu, Junyan, et al.
Published: (2024)
UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios
by: Luo, Haotian, et al.
Published: (2025)
Distributionally Robust Graph Out-of-Distribution Recommendation via Diffusion Model
by: Zhao, Chu, et al.
Published: (2025)
Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models
by: Xu, Yue, et al.
Published: (2025)
When Models Outthink Their Safety: Unveiling and Mitigating Self-Jailbreak in Large Reasoning Models
by: Mao, Yingzhi, et al.
Published: (2025)
Hard Negative Sampling via Large Language Models for Recommendation
by: Zhao, Chu, et al.
Published: (2025)
Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning
by: Liu, Guozhi, et al.
Published: (2025)
A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning
by: Wang, Zhenyi, et al.
Published: (2023)
Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning
by: Luo, Qin-Wen, et al.
Published: (2026)
Beyond the Safety Tax: Mitigating Unsafe Text-to-Image Generation via External Safety Rectification
by: Meng, Xiangtao, et al.
Published: (2025)
Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning
by: Huang, Tiansheng, et al.
Published: (2024)
Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model
by: Lou, Xinyue, et al.
Published: (2025)
Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack
by: Huang, Tiansheng, et al.
Published: (2024)
MeasHalu: Mitigation of Scientific Measurement Hallucinations for Large Language Models with Enhanced Reasoning
by: Huang, Ruijun, et al.
Published: (2026)
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment
by: Lu, Keming, et al.
Published: (2024)
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement
by: Xi, Zhiheng, et al.
Published: (2023)
Long Is More Important Than Difficult for Training Reasoning Models
by: Shen, Si, et al.
Published: (2025)
Kestrel: Grounding Self-Refinement for LVLM Hallucination Mitigation
by: Mao, Jiawei, et al.
Published: (2026)
Rebellion: Noise-Robust Reasoning Training for Audio Reasoning Models
by: Huang, Tiansheng, et al.
Published: (2025)
Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection
by: Sun, Guanglong, et al.
Published: (2026)
Causal Direct Preference Optimization for Distributionally Robust Generative Recommendation
by: Zhao, Chu, et al.
Published: (2026)
Automatic Pruning Discovery for Large Language Models
by: Kang, Haidong, et al.
Published: (2025)
Chain of Risk: Safety Failures in Large Reasoning Models and Mitigation via Adaptive Multi-Principle Steering
by: Li, Xiaomin, et al.
Published: (2026)
PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language Models
by: Hu, Sihao, et al.
Published: (2024)
Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense
by: Min, Rui, et al.
Published: (2024)
Systematic Engineering of Escherichia coli to Enhance 1,6‐Hexamethylenediamine Biosynthesis and Mitigate Byproduct 1,5‐Pentanediamine
by: Zanwen Chen, et al.
Published: (2026)