| Main Authors: | Zhao, Yu, Jiang, Fan, Liu, Tianle, Zeng, Bo, Liu, Yu, Wang, Longyue, Luo, Weihua |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.06375 |
Similar Items
Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling
by: Jiang, Fan, et al.
Published: (2026)
A State-Transition Framework for Efficient LLM Reasoning
by: Zhang, Liang, et al.
Published: (2026)
ComplexMCP: Evaluation of LLM Agents in Dynamic, Interdependent, and Large-Scale Tool Sandbox
by: Li, Yuanyang, et al.
Published: (2026)
CaptchaMind: Training CAPTCHA Solvers via Reinforcement Learning with Explicit Reasoning Supervision
by: Wang, Pengcheng, et al.
Published: (2026)
From Insight to Action: A Novel Framework for Interpretability-Guided Data Selection in Large Language Models
by: Shi, Ling, et al.
Published: (2026)
VIDA: A dataset for Visually Dependent Ambiguity in Multimodal Machine Translation
by: Pan, Jingheng, et al.
Published: (2026)
Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models
by: Yin, Huifeng, et al.
Published: (2025)
Incentivizing Parametric Knowledge via Reinforcement Learning with Verifiable Rewards for Cross-Cultural Entity Translation
by: Zhou, Jiang, et al.
Published: (2026)
VeriDispatcher: Multi-Model Dispatching through Pre-Inference Difficulty Prediction for RTL Generation Optimization
by: Wang, Zeng, et al.
Published: (2025)
Finding the Translation Switch: Discovering and Exploiting the Task-Initiation Features in LLMs
by: Wu, Xinwei, et al.
Published: (2026)
Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design
by: Zhu, Bin, et al.
Published: (2026)
Building Autonomous GUI Navigation via Agentic-Q Estimation and Step-Wise Policy Optimization
by: Wang, Yibo, et al.
Published: (2026)
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning
by: Feng, Wenfeng, et al.
Published: (2025)
DARE: Difficulty-Adaptive Reinforcement Learning with Co-Evolved Difficulty Estimation
by: Zhou, Yang, et al.
Published: (2026)
Optimizing Reasoning Efficiency through Prompt Difficulty Prediction
by: Zhao, Bo, et al.
Published: (2025)
TaskSense: Cognitive Chain Modeling and Difficulty Estimation for GUI Tasks
by: Yin, Yiwen, et al.
Published: (2025)
Building Decision Making Models Through Language Model Regime
by: Zhang, Yu, et al.
Published: (2024)
HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application
by: Yang, Yiqian, et al.
Published: (2025)
Truncated Proximal Policy Optimization
by: Fan, Tiantian, et al.
Published: (2025)
Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning
by: Zhang, Tianle, et al.
Published: (2024)
CoMelSinger: Discrete Token-Based Zero-Shot Singing Synthesis With Structured Melody Control and Guidance
by: Zhao, Junchuan, et al.
Published: (2025)
Vectra: A New Metric, Dataset, and Model for Visual Quality Assessment in E-Commerce In-Image Machine Translation
by: Wu, Qingyu, et al.
Published: (2026)
Transform then Explore: a Simple and Effective Technique for Exploratory Combinatorial Optimization with Reinforcement Learning
by: Pu, Tianle, et al.
Published: (2024)
Agents on a Tree: Pathwise Coordination for Multi-Objective Molecular Optimization
by: Zhang, Jia, et al.
Published: (2026)
DrugAssist: A Large Language Model for Molecule Optimization
by: Ye, Geyan, et al.
Published: (2023)
MAB Optimizer for Estimating Math Question Difficulty via Inverse CV without NLP
by: Das, Surajit, et al.
Published: (2025)
State Regularized Policy Optimization on Data with Dynamics Shift
by: Xue, Zhenghai, et al.
Published: (2023)
MorphoBench: A Benchmark with Difficulty Adaptive to Model Reasoning
by: Wang, Xukai, et al.
Published: (2025)
An Enhanced Grey Wolf Optimizer with Elite Inheritance and Balance Search Mechanisms
by: Jiang, Jianhua, et al.
Published: (2024)
SimKO: Simple Pass@K Policy Optimization
by: Peng, Ruotian, et al.
Published: (2025)
LFPO: Likelihood-Free Policy Optimization for Masked Diffusion Models
by: Wei, Chenxing, et al.
Published: (2026)
Estimating Difficulty Levels of Programming Problems with Pre-trained Model
by: Wang, Zhiyuan, et al.
Published: (2024)
LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Optimization
by: Zhao, Yang, et al.
Published: (2026)
Group-in-Group Policy Optimization for LLM Agent Training
by: Feng, Lang, et al.
Published: (2025)
Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation
by: Yu, Zhiqi, et al.
Published: (2026)
Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization
by: Liu, Jiacai, et al.
Published: (2024)
Learn to Relax with Large Language Models: Solving Constraint Optimization Problems via Bidirectional Coevolution
by: Liu, Beidan, et al.
Published: (2025)
Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning
by: Yao, Yihang, et al.
Published: (2023)
Tackling the Inherent Difficulty of Noise Filtering in RAG
by: Liu, Jingyu, et al.
Published: (2026)
ESPO: Entropy Importance Sampling Policy Optimization
by: Sheng, Yuepeng, et al.
Published: (2025)