| Main Authors: | Wang, Haoyu, Qin, Zeyu, Shen, Li, Wang, Xueqian, Tao, Dacheng, Cheng, Minhao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.04040 |
Similar Items
Lifelong Safety Alignment for Language Models
by: Wang, Haoyu, et al.
Published: (2025)
Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks
by: Liu, Haoyu, et al.
Published: (2026)
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer
by: Kong, Yilun, et al.
Published: (2025)
Multilingual Safety Alignment via Self-Distillation
by: Qin, Ruiyang, et al.
Published: (2026)
Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense
by: Min, Rui, et al.
Published: (2024)
JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models
by: Chen, Michael K., et al.
Published: (2025)
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
by: Sun, Yiyou, et al.
Published: (2025)
Contextual Drag: How Errors in the Context Affect LLM Reasoning
by: Cheng, Yun, et al.
Published: (2026)
RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
by: He, Haoyu, et al.
Published: (2025)
FusionBench: A Unified Library and Comprehensive Benchmark for Deep Model Fusion
by: Tang, Anke, et al.
Published: (2024)
Scalable Token-Level Hallucination Detection in Large Language Models
by: Min, Rui, et al.
Published: (2026)
Improving Large Language Models with Concept-Aware Fine-Tuning
by: Chen, Michael K., et al.
Published: (2025)
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
by: Yang, Enneng, et al.
Published: (2024)
PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
by: Geng, Runpeng, et al.
Published: (2025)
Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning
by: Liu, Yong, et al.
Published: (2024)
Beyond Speedup -- Utilizing KV Cache for Sampling and Reasoning
by: Xing, Zeyu, et al.
Published: (2026)
Struc-EMB: The Potential of Structure-Aware Encoding in Language Embeddings
by: Liu, Shikun, et al.
Published: (2025)
Graphical Reasoning: LLM-based Semi-Open Relation Extraction
by: Tao, Yicheng, et al.
Published: (2024)
Steering Large Reasoning Models towards Concise Reasoning via Flow Matching
by: Li, Yawei, et al.
Published: (2026)
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts
by: Wang, Ruochen, et al.
Published: (2024)
Efficient Reasoning with Hidden Thinking
by: Shen, Xuan, et al.
Published: (2025)
Skywork Open Reasoner 1 Technical Report
by: He, Jujie, et al.
Published: (2025)
SSR: Socratic Self-Refine for Large Language Model Reasoning
by: Shi, Haizhou, et al.
Published: (2025)
Mitigating Hallucinations in Large Language Models via Causal Reasoning
by: Li, Yuangang, et al.
Published: (2025)
LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
by: Yang, Junxiao, et al.
Published: (2026)
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
by: Shen, Guobin, et al.
Published: (2026)
Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
by: Chen, Jianhui, et al.
Published: (2024)
Supervised Fine-Tuning Needs to Unlock the Potential of Token Priority
by: Shen, Zhanming, et al.
Published: (2026)
RM-R1: Reward Modeling as Reasoning
by: Chen, Xiusi, et al.
Published: (2025)
What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding
by: Li, Ming, et al.
Published: (2025)
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
by: Yuan, Yurun, et al.
Published: (2025)
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
by: Wang, Yiping, et al.
Published: (2025)
Exposing LLM Safety Gaps Through Mathematical Encoding: New Attacks and Systematic Analysis
by: Zhang, Haoyu, et al.
Published: (2026)
DB-LLM: Accurate Dual-Binarization for Efficient LLMs
by: Chen, Hong, et al.
Published: (2024)
LongSafety: Enhance Safety for Long-Context LLMs
by: Huang, Mianqiu, et al.
Published: (2024)
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
by: Zhang, Jingyi, et al.
Published: (2025)
Learning to Reason under Off-Policy Guidance
by: Yan, Jianhao, et al.
Published: (2025)
SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs
by: Siu, Vincent, et al.
Published: (2025)
ExGRPO: Learning to Reason from Experience
by: Zhan, Runzhe, et al.
Published: (2025)
Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting
by: Wang, Cheng, et al.
Published: (2026)