Library Catalog

Bibliographic Details
Main Authors:	Wang, Haoyu; Qin, Zeyu; Shen, Li; Wang, Xueqian; Tao, Dacheng; Cheng, Minhao
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2502.04040

Similar Items

Lifelong Safety Alignment for Language Models
by: Wang, Haoyu, et al.
Published: (2025)

Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks
by: Liu, Haoyu, et al.
Published: (2026)

Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer
by: Kong, Yilun, et al.
Published: (2025)

Multilingual Safety Alignment via Self-Distillation
by: Qin, Ruiyang, et al.
Published: (2026)

Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense
by: Min, Rui, et al.
Published: (2024)

JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models
by: Chen, Michael K., et al.
Published: (2025)

Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT?
by: Sun, Yiyou, et al.
Published: (2025)

Contextual Drag: How Errors in the Context Affect LLM Reasoning
by: Cheng, Yun, et al.
Published: (2026)

RHYTHM: Reasoning with Hierarchical Temporal Tokenization for Human Mobility
by: He, Haoyu, et al.
Published: (2025)

FusionBench: A Unified Library and Comprehensive Benchmark for Deep Model Fusion
by: Tang, Anke, et al.
Published: (2024)

Scalable Token-Level Hallucination Detection in Large Language Models
by: Min, Rui, et al.
Published: (2026)

Improving Large Language Models with Concept-Aware Fine-Tuning
by: Chen, Michael K., et al.
Published: (2025)

Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
by: Yang, Enneng, et al.
Published: (2024)

PISanitizer: Preventing Prompt Injection to Long-Context LLMs via Prompt Sanitization
by: Geng, Runpeng, et al.
Published: (2025)

Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning
by: Liu, Yong, et al.
Published: (2024)

Beyond Speedup -- Utilizing KV Cache for Sampling and Reasoning
by: Xing, Zeyu, et al.
Published: (2026)

Struc-EMB: The Potential of Structure-Aware Encoding in Language Embeddings
by: Liu, Shikun, et al.
Published: (2025)

Graphical Reasoning: LLM-based Semi-Open Relation Extraction
by: Tao, Yicheng, et al.
Published: (2024)

Steering Large Reasoning Models towards Concise Reasoning via Flow Matching
by: Li, Yawei, et al.
Published: (2026)

One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts
by: Wang, Ruochen, et al.
Published: (2024)

Efficient Reasoning with Hidden Thinking
by: Shen, Xuan, et al.
Published: (2025)

Skywork Open Reasoner 1 Technical Report
by: He, Jujie, et al.
Published: (2025)

SSR: Socratic Self-Refine for Large Language Model Reasoning
by: Shi, Haizhou, et al.
Published: (2025)

Mitigating Hallucinations in Large Language Models via Causal Reasoning
by: Li, Yuangang, et al.
Published: (2025)

LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
by: Yang, Junxiao, et al.
Published: (2026)

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
by: Shen, Guobin, et al.
Published: (2026)

Towards Understanding Safety Alignment: A Mechanistic Perspective from Safety Neurons
by: Chen, Jianhui, et al.
Published: (2024)

Supervised Fine-Tuning Needs to Unlock the Potential of Token Priority
by: Shen, Zhanming, et al.
Published: (2026)

RM-R1: Reward Modeling as Reasoning
by: Chen, Xiusi, et al.
Published: (2025)

What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding
by: Li, Ming, et al.
Published: (2025)

Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
by: Yuan, Yurun, et al.
Published: (2025)

Reinforcement Learning for Reasoning in Large Language Models with One Training Example
by: Wang, Yiping, et al.
Published: (2025)

Exposing LLM Safety Gaps Through Mathematical Encoding: New Attacks and Systematic Analysis
by: Zhang, Haoyu, et al.
Published: (2026)

DB-LLM: Accurate Dual-Binarization for Efficient LLMs
by: Chen, Hong, et al.
Published: (2024)

LongSafety: Enhance Safety for Long-Context LLMs
by: Huang, Mianqiu, et al.
Published: (2024)

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
by: Zhang, Jingyi, et al.
Published: (2025)

Learning to Reason under Off-Policy Guidance
by: Yan, Jianhao, et al.
Published: (2025)

SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs
by: Siu, Vincent, et al.
Published: (2025)

ExGRPO: Learning to Reason from Experience
by: Zhan, Runzhe, et al.
Published: (2025)

Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting
by: Wang, Cheng, et al.
Published: (2026)