| Main Authors: | Liang, Kun, Bai, Clive, Xu, Xin, Tang, Chenming, Lee, Sanwoo, Liu, Weijie, Yang, Saiyong, Wu, Yunfang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.08310 |
Similar Items
ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation
by: Liang, Kun, et al.
Published: (2026)
Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error
by: Tang, Chenming, et al.
Published: (2025)
Think Outside the Policy: In-Context Steered Policy Optimization
by: Huang, Hsiu-Yuan, et al.
Published: (2025)
Composable Cross-prompt Essay Scoring by Merging Models
by: Lee, Sanwoo, et al.
Published: (2025)
Rank-Then-Score: Enhancing Large Language Models for Automated Essay Scoring
by: Cai, Yida, et al.
Published: (2025)
Democratizing Tool Learning with Environments Fully Simulated by a Free 8B Language Model
by: Tang, Chenming, et al.
Published: (2026)
Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models
by: Qu, Yun, et al.
Published: (2026)
CFMS: Towards Explainable and Fine-Grained Chinese Multimodal Sarcasm Detection Benchmark
by: Zhang, Junzhao, et al.
Published: (2026)
EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control
by: Yang, Kai, et al.
Published: (2025)
FPT: Feature Prompt Tuning for Few-shot Readability Assessment
by: Wang, Ziyang, et al.
Published: (2024)
A Survey of Uncertainty Estimation in LLMs: Theory Meets Practice
by: Huang, Hsiu-Yuan, et al.
Published: (2024)
Trait-Aware Policy Optimization for Autoregressive Multi-Trait Essay Scoring
by: Wang, Zhengyang, et al.
Published: (2026)
Ungrammatical-syntax-based In-context Example Selection for Grammatical Error Correction
by: Tang, Chenming, et al.
Published: (2024)
SCOI: Syntax-augmented Coverage-based In-context Example Selection for Machine Translation
by: Tang, Chenming, et al.
Published: (2024)
Going Beyond Word Matching: Syntax Improves In-context Example Selection for Machine Translation
by: Tang, Chenming, et al.
Published: (2024)
Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task
by: Qu, Fanyi, et al.
Published: (2023)
Unleashing Large Language Models' Proficiency in Zero-shot Essay Scoring
by: Lee, Sanwoo, et al.
Published: (2024)
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
by: Yang, Wenkai, et al.
Published: (2026)
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models
by: Chen, Zhipeng, et al.
Published: (2025)
DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management
by: Su, Xuerui, et al.
Published: (2025)
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
by: Yang, Wenkai, et al.
Published: (2025)
ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget
by: Thakur, Nandan, et al.
Published: (2026)
Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
by: Qu, Yun, et al.
Published: (2026)
WESE: Weak Exploration to Strong Exploitation for LLM Agents
by: Huang, Xu, et al.
Published: (2024)
Dynamic Fisher-weighted Model Merging via Bayesian Optimization
by: Lee, Sanwoo, et al.
Published: (2025)
Aligning Language Models with Real-time Knowledge Editing
by: Tang, Chenming, et al.
Published: (2025)
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
by: Xu, Xin, et al.
Published: (2026)
Exploration and Exploitation Errors Are Measurable for Language Model Agents
by: Park, Jaden, et al.
Published: (2026)
BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens
by: Wen, Hao, et al.
Published: (2025)
MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation
by: Yang, Lu, et al.
Published: (2026)
A Scalable Multi-LLM Collaboration System with Retrieval-based Selection and Exploration-Exploitation-Driven Enhancement
by: Tang, Shengji, et al.
Published: (2025)
Lost in the Passage: Passage-level In-context Learning Does Not Necessarily Need a "Passage"
by: Sun, Hao, et al.
Published: (2025)
Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions
by: Tang, Chenming, et al.
Published: (2024)
Decoupling Exploration and Exploitation for Unsupervised Pre-training with Successor Features
by: Kim, JaeYoon, et al.
Published: (2024)
$ϕ$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
by: Xu, Fangzhi, et al.
Published: (2025)
Controlling Exploration-Exploitation in GFlowNets via Markov Chain Perspectives
by: Chen, Lin, et al.
Published: (2026)
In-context Exploration-Exploitation for Reinforcement Learning
by: Dai, Zhenwen, et al.
Published: (2024)
Exploitation Is All You Need... for Exploration
by: Rentschler, Micah, et al.
Published: (2025)
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
by: Zeng, Weihao, et al.
Published: (2024)
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff
by: Tang, Hao, et al.
Published: (2024)