| Main Authors: | Zhang, Kaiyan, Zuo, Yuxin, He, Bingxiang, Sun, Youbang, Liu, Runze, Jiang, Che, Fan, Yuchen, Tian, Kai, Jia, Guoli, Li, Pengfei, Fu, Yu, Lv, Xingtai, Zhang, Yuchen, Zeng, Sihang, Qu, Shang, Li, Haozhan, Wang, Shijie, Wang, Yuru, Long, Xinwei, Liu, Fangfu, Xu, Xiang, Ma, Jiaze, Zhu, Xuekai, Hua, Ermo, Liu, Yihao, Li, Zonglin, Chen, Huayu, Qu, Xiaoye, Li, Yafu, Chen, Weize, Yuan, Zhenzhao, Gao, Junqi, Li, Dong, Ma, Zhiyuan, Cui, Ganqu, Liu, Zhiyuan, Qi, Biqing, Ding, Ning, Zhou, Bowen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.08827 |
Similar Items
TTRL: Test-Time Reinforcement Learning
by: Zuo, Yuxin, et al.
Published: (2025)
UltraMedical: Building Specialized Generalists in Biomedicine
by: Zhang, Kaiyan, et al.
Published: (2024)
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
by: Hua, Ermo, et al.
Published: (2024)
Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models
by: Lv, Xingtai, et al.
Published: (2025)
How Far Can Unsupervised RLVR Scale LLM Training?
by: He, Bingxiang, et al.
Published: (2026)
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
by: Long, Xinwei, et al.
Published: (2025)
Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention
by: Lv, Xingtai, et al.
Published: (2024)
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
by: He, Bingxiang, et al.
Published: (2025)
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
by: Cui, Ganqu, et al.
Published: (2025)
Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding
by: Zhang, Kaiyan, et al.
Published: (2024)
Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process
by: Hua, Ermo, et al.
Published: (2024)
Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking
by: Ma, Zhiyuan, et al.
Published: (2024)
Free Process Rewards without Process Labels
by: Yuan, Lifan, et al.
Published: (2024)
Automating Exploratory Proteomics Research via Language Models
by: Ding, Ning, et al.
Published: (2024)
Automating Exploratory Multiomics Research via Language Models
by: Qu, Shang, et al.
Published: (2025)
Process Reinforcement through Implicit Rewards
by: Cui, Ganqu, et al.
Published: (2025)
How to Synthesize Text Data without Model Collapse?
by: Zhu, Xuekai, et al.
Published: (2024)
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
by: Zuo, Yuxin, et al.
Published: (2025)
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
by: Ma, Zhiyuan, et al.
Published: (2024)
Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation
by: Qi, Biqing, et al.
Published: (2024)
Towards a Unified View of Large Language Model Post-Training
by: Lv, Xingtai, et al.
Published: (2025)
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices
by: Ma, Zhiyuan, et al.
Published: (2024)
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
by: Li, Haozhan, et al.
Published: (2025)
ReviewRL: Towards Automated Scientific Review with RL
by: Zeng, Sihang, et al.
Published: (2025)
Post-Trained MoE Can Skip Half Experts via Self-Distillation
by: Lv, Xingtai, et al.
Published: (2026)
The Role of Bile Acid in Immune‐Mediated Skin Diseases
by: Huike Ma, et al.
Published: (2025)
From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
by: Yuan, Lifan, et al.
Published: (2025)
FlowRL: Matching Reward Distributions for LLM Reasoning
by: Zhu, Xuekai, et al.
Published: (2025)
Denoising Diffusion Probabilistic Model for Radio Map Estimation in Generative Wireless Networks
by: Luo, Xuanhao, et al.
Published: (2025)
Rank-Based Modeling for Universal Packets Compression in Multi-Modal Communications
by: Luo, Xuanhao, et al.
Published: (2025)
CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following
by: Zhang, Kaiyan, et al.
Published: (2024)
Engineering Electronic Structure of Metal‐Based Catalysts Toward Selective Peroxymonosulfate Activation for Water Purification
by: Zhiyuan Feng, et al.
Published: (2026)
OPERA: A Reinforcement Learning-Enhanced Orchestrated Planner-Executor Architecture for Reasoning-Oriented Multi-Hop Retrieval
by: Liu, Yu, et al.
Published: (2025)
CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning
by: Li, Ran, et al.
Published: (2026)
AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing
by: Ma, Zhiyuan, et al.
Published: (2023)
Learning to Reason under Off-Policy Guidance
by: Yan, Jianhao, et al.
Published: (2025)
Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models
by: Ding, Ning, et al.
Published: (2024)
PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning
by: Zhu, Xuekai, et al.
Published: (2023)
Structural origin of anisotropic mechanical/thermal behavior in La2SrAl2O7 and Nd2SrAl2O7 perovskites
by: Bin Liu, et al.
Published: (2024)
FaithRL: Learning to Reason Faithfully through Step-Level Faithfulness Maximization
by: Gui, Runquan, et al.
Published: (2026)