:: Library Catalog

Bibliographic Details
Main authors:	Zhang, Jiaxin; Peng, Xiangyu; Chen, Qinglin; Ye, Qinyuan; Xiong, Caiming; Wu, Chien-Sheng
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online access:	https://arxiv.org/abs/2604.16830

Similar Items

Agentic Confidence Calibration
by: Zhang, Jiaxin, et al.
Published: (2026)

Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
by: Xu, Austin, et al.
Published: (2025)

Capturing LLM Capabilities via Evidence-Calibrated Query Clustering
by: Wu, Fangzhou, et al.
Published: (2026)

Calibration-Aware Policy Optimization for Reasoning LLMs
by: Wang, Ziqi, et al.
Published: (2026)

$\boldsymbol{f}$-OPD: Stabilizing Long-Horizon On-Policy Distillation with Freshness-Aware Control
by: Chen, Xianwei, et al.
Published: (2026)

GRAFT: Decoupling Ranking and Calibration for Survival Analysis
by: Ashhad, Mohammad, et al.
Published: (2026)

Validity-Calibrated Reasoning Distillation
by: Saadi, Khouloud, et al.
Published: (2026)

Dirichlet-Based Prediction Calibration for Learning with Noisy Labels
by: Zong, Chen-Chen, et al.
Published: (2024)

Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
by: Ye, Qinyuan, et al.
Published: (2025)

Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack
by: Xu, Xiaoyue, et al.
Published: (2024)

Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research
by: Lan, Tian, et al.
Published: (2024)

Extreme Region Policy Distillation
by: Chen, Changyu, et al.
Published: (2026)

Distillation Traps and Guards: A Calibration Knob for LLM Distillability
by: Zhan, Weixiao, et al.
Published: (2026)

On Calibration of Large Language Models: From Response To Capability
by: Yang, Sin-Han, et al.
Published: (2026)

Pretrain Value, Not Reward: Decoupled Value Policy Optimization
by: Huang, Chenghua, et al.
Published: (2025)

SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting
by: Zheng, Binbin, et al.
Published: (2026)

Unleashing the Denoising Capability of Diffusion Prior for Solving Inverse Problems
by: Zhang, Jiawei, et al.
Published: (2024)

Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
by: Zeng, Zhiyuan, et al.
Published: (2025)

Are Flat Minima an Illusion?
by: Bennett, Michael Timothy
Published: (2026)

The Over-Certainty Phenomenon in Modern Test-Time Adaptation Algorithms
by: Amin, Fin, et al.
Published: (2024)

Improving Prediction Certainty Estimation for Reliable Early Exiting via Null Space Projection
by: He, Jianing, et al.
Published: (2025)

DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
by: Li, Gang, et al.
Published: (2025)

Proximal Policy Distillation
by: Spigler, Giacomo
Published: (2024)

When Maximum Entropy Misleads Policy Optimization
by: Zhang, Ruipeng, et al.
Published: (2025)

Towards Flash Thinking via Decoupled Advantage Policy Optimization
by: Tan, Zezhong, et al.
Published: (2025)

AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation
by: Zhang, Songming, et al.
Published: (2025)

Prompt Engineering a Prompt Engineer
by: Ye, Qinyuan, et al.
Published: (2023)

An approach of deep reinforcement learning for maximizing the net present value of stochastic projects
by: Xu, Wei, et al.
Published: (2025)

ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation
by: Liang, Kun, et al.
Published: (2026)

OPD+: Rethinking the Advantage Design for On-Policy Distillation
by: Zhao, Hanyang, et al.
Published: (2026)

Dynamic Evidence Decoupling for Trusted Multi-view Learning
by: Liu, Ying, et al.
Published: (2024)

Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts
by: Nguyen, Xuan-Phi, et al.
Published: (2026)

The Illusion of Readiness in Health AI
by: Gu, Yu, et al.
Published: (2025)

Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO
by: Yu, Bowen, et al.
Published: (2026)

PACED: Distillation and On-Policy Self-Distillation at the Frontier of Student Competence
by: Xu, Yuanda, et al.
Published: (2026)

PTDE: Personalized Training with Distilled Execution for Multi-Agent Reinforcement Learning
by: Chen, Yiqun, et al.
Published: (2022)

Agentic Uncertainty Quantification
by: Zhang, Jiaxin, et al.
Published: (2026)

Context Distillation as Latent Memory Management
by: Zheng, Ziyang, et al.
Published: (2026)

Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing
by: Pang, Yujuan, et al.
Published: (2026)

HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation
by: Ding, Ken
Published: (2026)