| Main Authors: | Wang, Chen; Deng, Hexuan; Zhang, Yining; Zhang, Yuchen; Bai, Jionghao; Li, Zhaochun; Lan, Ge; Wang, Yue |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.07316 |
Similar Items
SCOPE-RL: Stable and Quantitative Control of Policy Entropy in RL Post-Training
by: Wang, Chen, et al.
Published: (2025)
Distribution-Centric Policy Optimization Dominates Exploration-Exploitation Trade-off
by: Li, Zhaochun, et al.
Published: (2026)
REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning
by: Deng, Hexuan, et al.
Published: (2025)
Information-Theoretic Distributed Point Functions with Shorter Keys
by: Deng, Hang, et al.
Published: (2026)
ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models
by: Dumitru, Razvan-Gabriel, et al.
Published: (2025)
Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression
by: Zeng, Lingjie, et al.
Published: (2026)
Shorter Thoughts, Same Answers: Difficulty-Scaled Segment-Wise RL for CoT Compression
by: Tian, Ye, et al.
Published: (2026)
First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training
by: Wei, Lai, et al.
Published: (2025)
Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR
by: Bounhar, Abdelaziz, et al.
Published: (2025)
JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training
by: Hu, Zhengding, et al.
Published: (2026)
MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation
by: Xie, Shuzhao, et al.
Published: (2024)
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
by: Zhang, Charlie, et al.
Published: (2025)
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization
by: Huang, Chengyu, et al.
Published: (2025)
AsyncFlow: An Asynchronous Streaming RL Framework for Efficient LLM Post-Training
by: Han, Zhenyu, et al.
Published: (2025)
ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation
by: Tang, Siao, et al.
Published: (2025)
Enhancing RAG Efficiency with Adaptive Context Compression
by: Guo, Shuyu, et al.
Published: (2025)
EVOS: Efficient Implicit Neural Training via EVOlutionary Selector
by: Zhang, Weixiang, et al.
Published: (2024)
RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training
by: Wu, Tianyuan, et al.
Published: (2025)
The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs
by: Chen, Jierun, et al.
Published: (2025)
How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment
by: Zhu, Rui, et al.
Published: (2026)
DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training
by: Wang, Zhixin, et al.
Published: (2025)
RATIONALYST: Mining Implicit Rationales for Process Supervision of Reasoning
by: Jiang, Dongwei, et al.
Published: (2024)
Steering Large Reasoning Models towards Concise Reasoning via Flow Matching
by: Li, Yawei, et al.
Published: (2026)
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
by: Li, Haozhan, et al.
Published: (2025)
Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models
by: Qu, Yun, et al.
Published: (2026)
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
by: Zhou, Jin Peng, et al.
Published: (2025)
Concise Reasoning via Reinforcement Learning
by: Fatemi, Mehdi, et al.
Published: (2025)
Online Ramsey numbers of the claw versus cycles
by: Zhi, Hexuan, et al.
Published: (2026)
Three-color online Ramsey numbers $\tilde{r}(P_3,P_3,P_{\ell})$ and $\tilde{r}(P_3, P_3, C_{\ell})$
by: Zhi, Hexuan, et al.
Published: (2025)
In-Token Rationality Optimization: Towards Accurate and Concise LLM Reasoning via Self-Feedback
by: Zhu, Mingye, et al.
Published: (2025)
Internalizing World Models via Self-Play Finetuning for Agentic RL
by: Chen, Shiqi, et al.
Published: (2025)
DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
by: Deng, Hexuan, et al.
Published: (2024)
RouterKGQA: Specialized--General Model Routing for Constraint-Aware Knowledge Graph Question Answering
by: Yuan, Bo, et al.
Published: (2026)
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
by: Yi, Jingyang, et al.
Published: (2025)
GraphDancer: Training LLMs to Explore and Reason over Graphs via Two-Stage Curriculum Post-Training
by: Bai, Yuyang, et al.
Published: (2026)
Internalizing Safety Understanding in Large Reasoning Models via Verification
by: Zhang, Yi, et al.
Published: (2026)
Learning to Reason as Action Abstractions with Scalable Mid-Training RL
by: Zhang, Shenao, et al.
Published: (2025)
Concise and Precise Context Compression for Tool-Using Language Models
by: Xu, Yang, et al.
Published: (2024)
Self-Training Elicits Concise Reasoning in Large Language Models
by: Munkhbat, Tergel, et al.
Published: (2025)
Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning
by: Rakotonirina, Nathanaël Carraz, et al.
Published: (2026)