| Main Authors: | He, Shuo, Feng, Lang, Cheng, Xin, Feng, Lei, An, Bo |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.10609 |
Similar Items
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
by: He, Shuo, et al.
Published: (2026)
Is Crowdsourcing Breaking Your Bank? Cost-Effective Fine-Tuning of Pre-trained Language Models with Proximal Policy Optimization
by: Yang, Shuo, et al.
Published: (2024)
COPO: Causal-Oriented Policy Optimization for Hallucinations of MLLMs
by: Guo, Peizheng, et al.
Published: (2025)
An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation
by: Zhu, Kun, et al.
Published: (2024)
Causally-Enhanced Reinforcement Policy Optimization
by: Wang, Xiangqi, et al.
Published: (2025)
Unveiling Causal Reasoning in Large Language Models: Reality or Mirage?
by: Chi, Haoang, et al.
Published: (2025)
TrustDataFilter: Leveraging Trusted Knowledge Base Data for More Effective Filtering of Unknown Information
by: Zhang, Jinghong, et al.
Published: (2025)
Causal Discovery and Counterfactual Reasoning to Optimize Persuasive Dialogue Policies
by: Zeng, Donghuo, et al.
Published: (2025)
StableMask: Refining Causal Masking in Decoder-only Transformer
by: Yin, Qingyu, et al.
Published: (2024)
Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models
by: Nguyen, Vy, et al.
Published: (2025)
Optimizing Instruction Synthesis: Effective Exploration of Evolutionary Space with Tree Search
by: Li, Chenglin, et al.
Published: (2024)
Beyond Uniform Credit: Causal Credit Assignment for Policy Optimization
by: Khandoga, Mykola, et al.
Published: (2026)
Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning
by: Bae, Sanghwan, et al.
Published: (2025)
Filter-then-Weight: Online Data Selection and Reweighting for LLM Fine-Tuning
by: Wang, Fangxin, et al.
Published: (2026)
Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering
by: Si, Shuzheng, et al.
Published: (2025)
AT$^2$PO: Agentic Turn-based Policy Optimization via Tree Search
by: Zong, Zefang, et al.
Published: (2026)
VEPO: Variable Entropy Policy Optimization for Low-Resource Language Foundation Models
by: Liu, Chonghan, et al.
Published: (2026)
Nuance Matters: Probing Epistemic Consistency in Causal Reasoning
by: Cui, Shaobo, et al.
Published: (2024)
Fibration Policy Optimization
by: Li, Chang, et al.
Published: (2026)
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
by: Xi, Zhiheng, et al.
Published: (2025)
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
by: Yao, Weiran, et al.
Published: (2023)
CauScientist: Teaching LLMs to Respect Data for Causal Discovery
by: Peng, Bo, et al.
Published: (2026)
keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLM
by: Wang, Chaojie, et al.
Published: (2023)
Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems
by: Feng, Lang, et al.
Published: (2026)
MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning
by: Wang, Hongjun, et al.
Published: (2026)
Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers
by: Chen, Xin, et al.
Published: (2026)
CausalStock: Deep End-to-end Causal Discovery for News-driven Stock Movement Prediction
by: Li, Shuqi, et al.
Published: (2024)
Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System
by: Chen, Weize, et al.
Published: (2024)
FinS-Pilot: A Benchmark for Online Financial RAG System
by: Wang, Feng, et al.
Published: (2025)
LLMs Are Prone to Fallacies in Causal Inference
by: Joshi, Nitish, et al.
Published: (2024)
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
by: Song, Feifan, et al.
Published: (2024)
Evidence-Augmented Policy Optimization with Reward Co-Evolution for Long-Context Reasoning
by: Guan, Xin, et al.
Published: (2026)
AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents
by: Du, Shangheng, et al.
Published: (2025)
SafeSteer: Localized On-Policy Distillation for Efficient Safety Alignment
by: Li, Hao, et al.
Published: (2026)
DCPO: Dynamic Clipping Policy Optimization
by: Yang, Shihui, et al.
Published: (2025)
Fine-Tuning Language Models with Reward Learning on Policy
by: Lang, Hao, et al.
Published: (2024)
Filtered Direct Preference Optimization
by: Morimura, Tetsuro, et al.
Published: (2024)
Beyond Linear LLM Invocation: An Efficient and Effective Semantic Filter Paradigm
by: Hou, Nan, et al.
Published: (2026)
Self-Improvement as Coherence Optimization: A Theoretical Account
by: Qiu, Tianyi, et al.
Published: (2026)
IGOT: Information Gain Optimized Tokenizer on Domain Adaptive Pretraining
by: Feng, Dawei, et al.
Published: (2024)