Library Catalog

Bibliographic Details
Main Authors:	Li, Haolin, Jiang, Shuyang, Zhang, Ruipeng, Yao, Jiangchao, Zhang, Ya, Wang, Yanfeng
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2604.11547

Similar Items

Federated Learning with Bilateral Curation for Partially Class-Disjoint Data
by: Fan, Ziqing, et al.
Published: (2024)

Federated Learning under Partially Class-Disjoint Data via Manifold Reshaping
by: Fan, Ziqing, et al.
Published: (2024)

Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts
by: Zhang, Ruipeng, et al.
Published: (2024)

UniChest: Conquer-and-Divide Pre-training for Multi-Source Chest X-Ray Classification
by: Dai, Tianjie, et al.
Published: (2023)

RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis
by: Li, Haolin, et al.
Published: (2025)

Learning to Instruct for Visual Instruction Tuning
by: Zhou, Zhihan, et al.
Published: (2025)

Low-Rank Knowledge Decomposition for Medical Foundation Models
by: Zhou, Yuhang, et al.
Published: (2024)

NFT: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
by: Chen, Huayu, et al.
Published: (2025)

SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions
by: Suvarna, Ashima, et al.
Published: (2026)

Verbal Process Supervision Elicits Better Coding Agents
by: Chen, Hao-Yuan, et al.
Published: (2025)

MedS$^3$: Towards Medical Slow Thinking with Self-Evolved Soft Dual-sided Process Supervision
by: Jiang, Shuyang, et al.
Published: (2025)

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
by: Hu, Jingcheng, et al.
Published: (2025)

Miner: Mining Intrinsic Mastery for Data-Efficient RL in Large Reasoning Models
by: Jiang, Shuyang, et al.
Published: (2026)

Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning
by: Ding, Fei, et al.
Published: (2026)

Dual-granularity Sinkhorn Distillation for Enhanced Learning from Long-tailed Noisy Data
by: Hong, Feng, et al.
Published: (2025)

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
by: Xiong, Wei, et al.
Published: (2025)

Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models
by: Zhang, Zizhuo, et al.
Published: (2025)

Diversified Batch Selection for Training Acceleration
by: Hong, Feng, et al.
Published: (2024)

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
by: Liang, Xiao, et al.
Published: (2025)

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
by: Deng, Yihe, et al.
Published: (2025)

From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?
by: Zhou, Zhanke, et al.
Published: (2025)

T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
by: Hou, Zhenyu, et al.
Published: (2025)

Semi-Supervised Learning for Bilingual Lexicon Induction
by: Garnier, Paul, et al.
Published: (2024)

Knowledge Graph Reasoning with Self-supervised Reinforcement Learning
by: Ma, Ying, et al.
Published: (2024)

Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning
by: Liu, Chi, et al.
Published: (2025)

Reconstructing Human Mobility Pattern: A Semi-Supervised Approach for Cross-Dataset Transfer Learning
by: Liao, Xishun, et al.
Published: (2024)

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
by: Wu, Mingqi, et al.
Published: (2025)

Reprogramming Distillation for Medical Foundation Models
by: Zhou, Yuhang, et al.
Published: (2024)

Synergy-of-Thoughts: Eliciting Efficient Reasoning in Hybrid Language Models
by: Shang, Yu, et al.
Published: (2024)

Construct, Align, and Reason: Large Ontology Models for Enterprise Knowledge Management
by: Zhang, Yao, et al.
Published: (2026)

Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models
by: Zhang, Fuxiang, et al.
Published: (2024)

Less is More: One-shot Subgraph Reasoning on Large-scale Knowledge Graphs
by: Zhou, Zhanke, et al.
Published: (2024)

Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning
by: Liu, Haolin, et al.
Published: (2026)

TAIA: Large Language Models are Out-of-Distribution Data Learners
by: Jiang, Shuyang, et al.
Published: (2024)

ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning
by: Zhou, Ruiyang, et al.
Published: (2025)

CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning
by: Li, Ran, et al.
Published: (2026)

Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning
by: Zhao, Zihua, et al.
Published: (2025)

Lightweight Contenders: Navigating Semi-Supervised Text Mining through Peer Collaboration and Self Transcendence
by: Mao, Qianren, et al.
Published: (2024)

Eliciting Behaviors in Multi-Turn Conversations
by: Huang, Jing, et al.
Published: (2025)

Multistage Collaborative Knowledge Distillation from a Large Language Model for Semi-Supervised Sequence Generation
by: Zhao, Jiachen, et al.
Published: (2023)