| Main Authors: | Yu, Linhao, Yang, Tianmeng, Ding, Siyu, Jin, Renren, Gu, Naibin, Hao, Xiangzhao, Nie, Shuaiyi, Xiong, Deyi, Yin, Weichong, Sun, Yu, Wu, Hua |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.12627 |
Similar Items
ATTNPO: Attention-Guided Process Supervision for Efficient Reasoning
by: Nie, Shuaiyi, et al.
Published: (2026)
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality
by: Ren, Baochang, et al.
Published: (2025)
KnowRL: Teaching Language Models to Know What They Know
by: Kale, Sahil, et al.
Published: (2025)
A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL
by: Yang, Lei, et al.
Published: (2026)
Extending RLVR to Open-Ended Tasks via Verifiable Multiple-Choice Reformulation
by: Zhang, Mengyu, et al.
Published: (2025)
LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models
by: Liu, Chuang, et al.
Published: (2024)
LFED: A Literary Fiction Evaluation Dataset for Large Language Models
by: Yu, Linhao, et al.
Published: (2024)
SOUP: Token-level Single-sample Mix-policy Reinforcement Learning for Large Language Models
by: Yang, Lei, et al.
Published: (2026)
Self-Pluralising Culture Alignment for Large Language Models
by: Xu, Shaoyang, et al.
Published: (2024)
Do Large Language Models Mirror Cognitive Language Processing?
by: Ren, Yuqi, et al.
Published: (2024)
Orthogonal Finetuning for Direct Preference Optimization
by: Yang, Chenxu, et al.
Published: (2024)
Revisiting Entropy in Reinforcement Learning for Large Reasoning Models
by: Jin, Renren, et al.
Published: (2025)
IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons
by: Shi, Dan, et al.
Published: (2024)
Weights-Rotated Preference Optimization for Large Language Models
by: Yang, Chenxu, et al.
Published: (2025)
Why Does Reinforcement Learning Generalize? A Feature-Level Mechanistic Study of Post-Training in Large Language Models
by: Shi, Dan, et al.
Published: (2026)
FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models
by: Liu, Yan, et al.
Published: (2024)
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
by: Dong, Weilong, et al.
Published: (2024)
Meta-RTL: Reinforcement-Based Meta-Transfer Learning for Low-Resource Commonsense Reasoning
by: Fu, Yu, et al.
Published: (2024)
ProBench: Benchmarking Large Language Models in Competitive Programming
by: Yang, Lei, et al.
Published: (2025)
Joint Training And Decoding for Multilingual End-to-End Simultaneous Speech Translation
by: Huang, Wuwei, et al.
Published: (2025)
Know What You Know: Metacognitive Entropy Calibration for Verifiable RL Reasoning
by: Zhao, Qiannian, et al.
Published: (2026)
Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
by: Yan, Kai, et al.
Published: (2026)
TRiMS: Real-Time Tracking of Minimal Sufficient Length for Efficient Reasoning via RL
by: Bian, Tingcheng, et al.
Published: (2026)
Large Language Model Safety: A Holistic Survey
by: Shi, Dan, et al.
Published: (2024)
Pursuing Minimal Sufficiency in Spatial Reasoning
by: Guo, Yejie, et al.
Published: (2025)
A Comprehensive Evaluation of Quantization Strategies for Large Language Models
by: Jin, Renren, et al.
Published: (2024)
Exploring the System 1 Thinking Capability of Large Reasoning Models
by: Zhang, Wenyuan, et al.
Published: (2025)
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety
by: Liu, Chuang, et al.
Published: (2024)
Think Less, Know More: State-Aware Reasoning Compression with Knowledge Guidance for Efficient Reasoning
by: Sui, Yi, et al.
Published: (2026)
XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search
by: Zhang, Yiting, et al.
Published: (2025)
Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models
by: Liu, Xiyu, et al.
Published: (2024)
Causal Path Alignment: Anchoring the Optimization Trajectory for Controllable In-Parameter Knowledge Editing
by: Liu, Xiyu, et al.
Published: (2025)
KnowDiffuser: A Knowledge-Guided Diffusion Planner with LLM Reasoning
by: Ding, Fan, et al.
Published: (2026)
CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models
by: Hao, Xiangzhao, et al.
Published: (2026)
Interplay Between Single-Photon Ionization and the Auger Process in Argon Ion Formation
by: Xiong, Linhao
Published: (2024)
Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning
by: Zhou, Hang, et al.
Published: (2024)
Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy
by: Fu, Yu, et al.
Published: (2023)
SuperRL: Reinforcement Learning with Supervision to Boost Language Model Reasoning
by: Liu, Yihao, et al.
Published: (2025)
PA2D-MORL: Pareto Ascent Directional Decomposition based Multi-Objective Reinforcement Learning
by: Hu, Tianmeng, et al.
Published: (2026)
Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning
by: Deng, Huilin, et al.
Published: (2025)