| Main Authors: | Han, Xinchen; Afifi, Hossam; Marot, Michel; Wang, Xilu; Yin, Lu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.10048 |
Similar Items
Automatic Constraint Policy Optimization based on Continuous Constraint Interpolation Framework for Offline Reinforcement Learning
by: Han, Xinchen, et al.
Published: (2026)
PIQL: Projective Implicit Q-Learning with Support Constraint for Offline Reinforcement Learning
by: Han, Xinchen, et al.
Published: (2025)
Activation Steering for Chain-of-Thought Compression
by: Azizi, Seyedarmin, et al.
Published: (2025)
Long-Document QA with Chain-of-Structured-Thought and Fine-Tuned SLMs
by: Liang, Zhuowen, et al.
Published: (2026)
Long Chain-of-Thought Reasoning Across Languages
by: Barua, Josh, et al.
Published: (2025)
ELAS: Efficient Pre-Training of Low-Rank Large Language Models via 2:4 Activation Sparsity
by: Li, Jiaxi, et al.
Published: (2026)
An Amortized Efficiency Threshold for Comparing Neural and Heuristic Solvers in Combinatorial Optimization
by: Afifi, Sohaib
Published: (2026)
LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning
by: Motwani, Sumeet Ramesh, et al.
Published: (2026)
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
by: Huang, Xingyue, et al.
Published: (2025)
Group Orthogonalized Policy Optimization: Group Policy Optimization as Orthogonal Projection in Hilbert Space
by: Zixian, Wang
Published: (2026)
DGPO: Distribution Guided Policy Optimization for Fine Grained Credit Assignment
by: Jin, Hongbo, et al.
Published: (2026)
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
by: He, Shuo, et al.
Published: (2026)
Towards impactful challenges: post-challenge paper, benchmarks and other dissemination actions
by: Marot, Antoine, et al.
Published: (2023)
From Offline to Online Memory-Free and Task-Free Continual Learning via Fine-Grained Hypergradients
by: Michel, Nicolas, et al.
Published: (2025)
Evolving Demonstration Optimization for Chain-of-Thought Feature Transformation
by: Wang, Xinyuan, et al.
Published: (2026)
Training Multimodal Large Reasoning Models Needs Better Thoughts: A Three-Stage Framework for Long Chain-of-Thought Synthesis and Selection
by: Wang, Yizhi, et al.
Published: (2025)
Optimizing Chain-of-Thought Confidence via Topological and Dirichlet Risk Analysis
by: More, Abhishek, et al.
Published: (2025)
Reinforcement Learning for Chain of Thought Compression with One-Domain-to-All Generalization
by: Li, Hanyu, et al.
Published: (2025)
NGRPO: Negative-enhanced Group Relative Policy Optimization
by: Nan, Gongrui, et al.
Published: (2025)
PA3: Policy-Aware Agent Alignment through Chain-of-Thought
by: Dipta, Shubhashis Roy, et al.
Published: (2026)
Scalable Chain of Thoughts via Elastic Reasoning
by: Xu, Yuhui, et al.
Published: (2025)
Enhancing Generalization in Chain of Thought Reasoning for Smaller Models
by: Yin, Maxwell J., et al.
Published: (2025)
Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning
by: Li, Xuan, et al.
Published: (2026)
Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning
by: Luo, Yu, et al.
Published: (2026)
Group-in-Group Policy Optimization for LLM Agent Training
by: Feng, Lang, et al.
Published: (2025)
Linear Chain Transformation: Expanding Optimization Dynamics for Fine-Tuning Large Language Models
by: Wang, Yulong, et al.
Published: (2024)
Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning
by: Wang, Libo
Published: (2025)
Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
by: Li, Gengsheng, et al.
Published: (2026)
Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models
by: Ye, Jiacheng, et al.
Published: (2024)
Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO
by: Yu, Bowen, et al.
Published: (2026)
PPC-GPT: Federated Task-Specific Compression of Large Language Models via Pruning and Chain-of-Thought Distillation
by: Fan, Tao, et al.
Published: (2025)
Output Scaling: YingLong-Delayed Chain of Thought in a Large Pretrained Time Series Forecasting Model
by: Wang, Xue, et al.
Published: (2025)
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
by: Yao, Jiarui, et al.
Published: (2025)
Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning
by: Li, Miao, et al.
Published: (2026)
AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback
by: Hu, Miaobo, et al.
Published: (2026)
Ehrenfeucht-Haussler Rank and Chain of Thought
by: Barceló, Pablo, et al.
Published: (2025)
Improving Chain-of-Thought for Logical Reasoning via Attention-Aware Intervention
by: Phuong, Nguyen Minh, et al.
Published: (2026)
In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought
by: Huang, Sili, et al.
Published: (2024)
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
by: Liu, Shih-Yang, et al.
Published: (2026)
Chain-of-Thought Predictive Control
by: Jia, Zhiwei, et al.
Published: (2023)