:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Hongming, Lin, Shi, Shao, Jun, Lin, Changting, Zhu, Donghai, Han, Meng, Kong, Qinglei
Format:	Preprint
Published:	2025
Subjects:	Computation and Language, Artificial Intelligence, Machine Learning
Online Access:	https://arxiv.org/abs/2506.06401

Similar Items

LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models
by: Lin, Shi, et al.
Published: (2024)

NeuRel-Attack: Neuron Relearning for Safety Disalignment in Large Language Models
by: Zhou, Yi, et al.
Published: (2025)

Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
by: Li, Ziniu, et al.
Published: (2025)

Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
by: Wang, Haoxiang, et al.
Published: (2024)

StepFun-Formalizer: Unlocking the Autoformalization Potential of LLMs through Knowledge-Reasoning Fusion
by: Wu, Yutong, et al.
Published: (2025)

RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
by: Dang, John, et al.
Published: (2024)

MEUV: Achieving Fine-Grained Capability Activation in Large Language Models via Mutually Exclusive Unlock Vectors
by: Tong, Xin, et al.
Published: (2025)

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
by: Xiaomi, LLM-Core, et al.
Published: (2025)

Orthogonal Finetuning for Direct Preference Optimization
by: Yang, Chenxu, et al.
Published: (2024)

Unlocking Multimodal Mathematical Reasoning via Process Reward Model
by: Luo, Ruilin, et al.
Published: (2025)

Supervised Fine-Tuning Needs to Unlock the Potential of Token Priority
by: Shen, Zhanming, et al.
Published: (2026)

Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training
by: Liu, Mingjie, et al.
Published: (2025)

Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration
by: Deng, Wenhao, et al.
Published: (2025)

Self-Guided Process Reward Optimization with Redefined Step-wise Advantage for Process Reinforcement Learning
by: Fei, Wu, et al.
Published: (2025)

Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
by: Ji, Kaixuan, et al.
Published: (2024)

Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
by: Shen, Xuan, et al.
Published: (2023)

EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents
by: Zala, Abhay, et al.
Published: (2024)

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
by: Zheng, Chujie, et al.
Published: (2025)

A Lightweight LLM Framework for Disaster Humanitarian Information Classification
by: Jinzhen, Han, et al.
Published: (2026)

Revealing Behavioral Plasticity in Large Language Models: A Token-Conditional Perspective
by: Mao, Liyuan, et al.
Published: (2026)

PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching
by: Chen, Ruishuo, et al.
Published: (2026)

Stability as a Liability: Systematic Breakdown of Linguistic Structure in LLMs
by: Meng, Xianzhe, et al.
Published: (2026)

Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach
by: Zhang, Xinnan, et al.
Published: (2025)

Revisiting Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning
by: Zhang, Xiaoyun, et al.
Published: (2025)

MANATEE: Inference-Time Lightweight Diffusion Based Safety Defense for LLMs
by: Kan, Chun Yan Ryan, et al.
Published: (2026)

A Framework to Implement 1+N Multi-task Fine-tuning Pattern in LLMs Using the CGC-LORA Algorithm
by: Song, Chao, et al.
Published: (2024)

Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge
by: Xiao, Bin, et al.
Published: (2024)

Unlocking Public Catalogues: Instruction-Tuning LLMs for ICD Coding of German Tumor Diagnoses
by: Lenz, Stefan, et al.
Published: (2025)

Shared Lexical Task Representations Explain Behavioral Variability In LLMs
by: Yang, Zhuonan, et al.
Published: (2026)

A Case Study of Selected PTQ Baselines for Reasoning LLMs on Ascend NPU
by: Luo, Yuchen, et al.
Published: (2026)

GEM: A Gym for Agentic LLMs
by: Liu, Zichen, et al.
Published: (2025)

Unlocking the Potential of Continual Model Merging: An ODE Perspective
by: Lin, Lihong, et al.
Published: (2026)

CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
by: Lin, Zicheng, et al.
Published: (2024)

PRISM: A Geometric Risk Bound that Decomposes Drift into Scale, Shape, and Head
by: Lin, Chieh-Yen, et al.
Published: (2026)

Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
by: Gu, Naibin, et al.
Published: (2025)

Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation
by: Bai, Fengshuo, et al.
Published: (2024)

TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs
by: Xie, Yutao, et al.
Published: (2026)

Linear Model Merging Unlocks Simple and Scalable Multimodal Data Mixture Optimization
by: Berasi, Davide, et al.
Published: (2026)

Aligning CodeLLMs with Direct Preference Optimization
by: Miao, Yibo, et al.
Published: (2024)

MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples
by: Xie, Shuo, et al.
Published: (2024)