:: Library Catalog

Bibliographic Details
Main Authors:	Liang, Kun; Bai, Clive; Xu, Xin; Tang, Chenming; Lee, Sanwoo; Liu, Weijie; Yang, Saiyong; Wu, Yunfang
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.08310

Similar Items

ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation
by: Liang, Kun, et al.
Published: (2026)

Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error
by: Tang, Chenming, et al.
Published: (2025)

Think Outside the Policy: In-Context Steered Policy Optimization
by: Huang, Hsiu-Yuan, et al.
Published: (2025)

Composable Cross-prompt Essay Scoring by Merging Models
by: Lee, Sanwoo, et al.
Published: (2025)

Rank-Then-Score: Enhancing Large Language Models for Automated Essay Scoring
by: Cai, Yida, et al.
Published: (2025)

Democratizing Tool Learning with Environments Fully Simulated by a Free 8B Language Model
by: Tang, Chenming, et al.
Published: (2026)

Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models
by: Qu, Yun, et al.
Published: (2026)

CFMS: Towards Explainable and Fine-Grained Chinese Multimodal Sarcasm Detection Benchmark
by: Zhang, Junzhao, et al.
Published: (2026)

EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control
by: Yang, Kai, et al.
Published: (2025)

FPT: Feature Prompt Tuning for Few-shot Readability Assessment
by: Wang, Ziyang, et al.
Published: (2024)

A Survey of Uncertainty Estimation in LLMs: Theory Meets Practice
by: Huang, Hsiu-Yuan, et al.
Published: (2024)

Trait-Aware Policy Optimization for Autoregressive Multi-Trait Essay Scoring
by: Wang, Zhengyang, et al.
Published: (2026)

Ungrammatical-syntax-based In-context Example Selection for Grammatical Error Correction
by: Tang, Chenming, et al.
Published: (2024)

SCOI: Syntax-augmented Coverage-based In-context Example Selection for Machine Translation
by: Tang, Chenming, et al.
Published: (2024)

Going Beyond Word Matching: Syntax Improves In-context Example Selection for Machine Translation
by: Tang, Chenming, et al.
Published: (2024)

Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task
by: Qu, Fanyi, et al.
Published: (2023)

Unleashing Large Language Models' Proficiency in Zero-shot Essay Scoring
by: Lee, Sanwoo, et al.
Published: (2024)

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
by: Yang, Wenkai, et al.
Published: (2026)

Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
by: Chen, Zhipeng, et al.
Published: (2025)

DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management
by: Su, Xuerui, et al.
Published: (2025)

LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
by: Yang, Wenkai, et al.
Published: (2025)

ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget
by: Thakur, Nandan, et al.
Published: (2026)

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
by: Qu, Yun, et al.
Published: (2026)

WESE: Weak Exploration to Strong Exploitation for LLM Agents
by: Huang, Xu, et al.
Published: (2024)

Dynamic Fisher-weighted Model Merging via Bayesian Optimization
by: Lee, Sanwoo, et al.
Published: (2025)

Aligning Language Models with Real-time Knowledge Editing
by: Tang, Chenming, et al.
Published: (2025)

Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
by: Xu, Xin, et al.
Published: (2026)

Exploration and Exploitation Errors Are Measurable for Language Model Agents
by: Park, Jaden, et al.
Published: (2026)

BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens
by: Wen, Hao, et al.
Published: (2025)

MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation
by: Yang, Lu, et al.
Published: (2026)

A Scalable Multi-LLM Collaboration System with Retrieval-based Selection and Exploration-Exploitation-Driven Enhancement
by: Tang, Shengji, et al.
Published: (2025)

Lost in the Passage: Passage-level In-context Learning Does Not Necessarily Need a "Passage"
by: Sun, Hao, et al.
Published: (2025)

Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions
by: Tang, Chenming, et al.
Published: (2024)

Decoupling Exploration and Exploitation for Unsupervised Pre-training with Successor Features
by: Kim, JaeYoon, et al.
Published: (2024)

$ϕ$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
by: Xu, Fangzhi, et al.
Published: (2025)

Controlling Exploration-Exploitation in GFlowNets via Markov Chain Perspectives
by: Chen, Lin, et al.
Published: (2026)

In-context Exploration-Exploitation for Reinforcement Learning
by: Dai, Zhenwen, et al.
Published: (2024)

Exploitation Is All You Need... for Exploration
by: Rentschler, Micah, et al.
Published: (2025)

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
by: Zeng, Weihao, et al.
Published: (2024)

Code Repair with LLMs gives an Exploration-Exploitation Tradeoff
by: Tang, Hao, et al.
Published: (2024)