| Main Authors: | Li, Haolin, Jiang, Shuyang, Zhang, Ruipeng, Yao, Jiangchao, Zhang, Ya, Wang, Yanfeng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.11547 |
Similar Items
Federated Learning with Bilateral Curation for Partially Class-Disjoint Data
by: Fan, Ziqing, et al.
Published: (2024)
Federated Learning under Partially Class-Disjoint Data via Manifold Reshaping
by: Fan, Ziqing, et al.
Published: (2024)
Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts
by: Zhang, Ruipeng, et al.
Published: (2024)
UniChest: Conquer-and-Divide Pre-training for Multi-Source Chest X-Ray Classification
by: Dai, Tianjie, et al.
Published: (2023)
RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis
by: Li, Haolin, et al.
Published: (2025)
Learning to Instruct for Visual Instruction Tuning
by: Zhou, Zhihan, et al.
Published: (2025)
Low-Rank Knowledge Decomposition for Medical Foundation Models
by: Zhou, Yuhang, et al.
Published: (2024)
NFT: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
by: Chen, Huayu, et al.
Published: (2025)
SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions
by: Suvarna, Ashima, et al.
Published: (2026)
Verbal Process Supervision Elicits Better Coding Agents
by: Chen, Hao-Yuan, et al.
Published: (2025)
MedS$^3$: Towards Medical Slow Thinking with Self-Evolved Soft Dual-sided Process Supervision
by: Jiang, Shuyang, et al.
Published: (2025)
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
by: Hu, Jingcheng, et al.
Published: (2025)
Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models
by: Jiang, Shuyang, et al.
Published: (2026)
Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning
by: Ding, Fei, et al.
Published: (2026)
Dual-granularity Sinkhorn Distillation for Enhanced Learning from Long-tailed Noisy Data
by: Hong, Feng, et al.
Published: (2025)
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
by: Xiong, Wei, et al.
Published: (2025)
Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models
by: Zhang, Zizhuo, et al.
Published: (2025)
Diversified Batch Selection for Training Acceleration
by: Hong, Feng, et al.
Published: (2024)
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
by: Liang, Xiao, et al.
Published: (2025)
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
by: Deng, Yihe, et al.
Published: (2025)
From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?
by: Zhou, Zhanke, et al.
Published: (2025)
T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
by: Hou, Zhenyu, et al.
Published: (2025)
Semi-Supervised Learning for Bilingual Lexicon Induction
by: Garnier, Paul, et al.
Published: (2024)
Knowledge Graph Reasoning with Self-supervised Reinforcement Learning
by: Ma, Ying, et al.
Published: (2024)
Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning
by: Liu, Chi, et al.
Published: (2025)
Reconstructing Human Mobility Pattern: A Semi-Supervised Approach for Cross-Dataset Transfer Learning
by: Liao, Xishun, et al.
Published: (2024)
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
by: Wu, Mingqi, et al.
Published: (2025)
Reprogramming Distillation for Medical Foundation Models
by: Zhou, Yuhang, et al.
Published: (2024)
Synergy-of-Thoughts: Eliciting Efficient Reasoning in Hybrid Language Models
by: Shang, Yu, et al.
Published: (2024)
Construct, Align, and Reason: Large Ontology Models for Enterprise Knowledge Management
by: Zhang, Yao, et al.
Published: (2026)
Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models
by: Zhang, Fuxiang, et al.
Published: (2024)
Less is More: One-shot Subgraph Reasoning on Large-scale Knowledge Graphs
by: Zhou, Zhanke, et al.
Published: (2024)
Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning
by: Liu, Haolin, et al.
Published: (2026)
TAIA: Large Language Models are Out-of-Distribution Data Learners
by: Jiang, Shuyang, et al.
Published: (2024)
ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning
by: Zhou, Ruiyang, et al.
Published: (2025)
CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning
by: Li, Ran, et al.
Published: (2026)
Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning
by: Zhao, Zihua, et al.
Published: (2025)
Lightweight Contenders: Navigating Semi-Supervised Text Mining through Peer Collaboration and Self Transcendence
by: Mao, Qianren, et al.
Published: (2024)
Eliciting Behaviors in Multi-Turn Conversations
by: Huang, Jing, et al.
Published: (2025)
Multistage Collaborative Knowledge Distillation from a Large Language Model for Semi-Supervised Sequence Generation
by: Zhao, Jiachen, et al.
Published: (2023)