| Main Authors: | Zhou, Shuyi, Song, Zeen, Qiang, Wenwen, Sun, Jiyan, Zhou, Yao, Liu, Yinlong, Ma, Wei |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.02675 |
Similar Items
Causal Reward Adjustment: Mitigating Reward Hacking in External Reasoning via Backdoor Correction
by: Song, Ruike, et al.
Published: (2025)
Group Causal Policy Optimization for Post-Training Large Language Models
by: Gu, Ziyin, et al.
Published: (2025)
Adaptive Uncertainty-Aware Tree Search for Robust Reasoning
by: Song, Zeen, et al.
Published: (2026)
Beyond All-to-All: Causal-Aligned Transformer with Dynamic Structure Learning for Multivariate Time Series Forecasting
by: Zhang, Xingyu, et al.
Published: (2025)
On the Generalization and Causal Explanation in Self-Supervised Learning
by: Qiang, Wenwen, et al.
Published: (2024)
Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs
by: Zhou, Yao, et al.
Published: (2026)
On the Out-of-Distribution Generalization of Self-Supervised Learning
by: Qiang, Wenwen, et al.
Published: (2025)
Reward Model Generalization for Compute-Aware Test-Time Reasoning
by: Song, Zeen, et al.
Published: (2025)
Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
by: Wang, Jingyao, et al.
Published: (2025)
Closing the Loop: A Control-Theoretic Framework for Provably Stable Time Series Forecasting with LLMs
by: Zhang, Xingyu, et al.
Published: (2026)
Hacking Task Confounder in Meta-Learning
by: Wang, Jingyao, et al.
Published: (2023)
Not All Frequencies Are Created Equal: Towards a Dynamic Fusion of Frequencies in Time-Series Forecasting
by: Zhang, Xingyu, et al.
Published: (2024)
EP-GRPO: Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance
by: Yu, Song, et al.
Published: (2026)
LithoGRPO: Fast Inverse Lithography via GRPO Reinforced Flow Matching
by: Lai, Yao, et al.
Published: (2026)
Towards Generalizable Reasoning: Group Causal Counterfactual Policy Optimization for LLM Reasoning
by: Wang, Jingyao, et al.
Published: (2026)
From Intent to Evidence: A Categorical Approach for Structural Evaluation of Deep Research Agents
by: Liu, Shuoling, et al.
Published: (2026)
dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models
by: Wan, Zhengyan, et al.
Published: (2026)
Towards the Causal Complete Cause of Multi-Modal Representation Learning
by: Wang, Jingyao, et al.
Published: (2024)
When Missing Becomes Structure: Intent-Preserving Policy Completion from Financial KOL Discourse
by: Liu, Yuncong, et al.
Published: (2026)
Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective
by: Zhang, Yanan, et al.
Published: (2024)
Tagged for Direction: Pinning Down Causal Edge Directions with Precision
by: Busch, Florian Peter, et al.
Published: (2025)
Pin-Tuning: Parameter-Efficient In-Context Tuning for Few-Shot Molecular Property Prediction
by: Wang, Liang, et al.
Published: (2024)
Event-CausNet: Unlocking Causal Knowledge from Text with Large Language Models for Reliable Spatio-Temporal Forecasting
by: Niu, Luyao, et al.
Published: (2025)
Learning Polyhedral Conformal Sets for Robust Optimization
by: Chen, Shuyi, et al.
Published: (2026)
Drug Synergy Prediction via Residual Graph Isomorphism Networks and Attention Mechanisms
by: Song, Jiyan, et al.
Published: (2026)
How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning
by: Tian, Minghao, et al.
Published: (2026)
Make Deep Networks Shallow Again
by: Bermeitinger, Bernhard, et al.
Published: (2023)
Deep Minds and Shallow Probes
by: Lee, Su Hyeong, et al.
Published: (2026)
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
by: Zheng, Zhi, et al.
Published: (2025)
Prepare Before You Act: Learning From Humans to Rearrange Initial States
by: Dai, Yinlong, et al.
Published: (2025)
Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting
by: Wang, Cheng, et al.
Published: (2026)
AMIR-GRPO: Inducing Implicit Preference Signals into GRPO
by: Yari, Amir Hossein, et al.
Published: (2026)
Lean Finder: Semantic Search for Mathlib That Understands User Intents
by: Lu, Jialin, et al.
Published: (2025)
A Survey of Deep Causal Models and Their Industrial Applications
by: Li, Zongyu, et al.
Published: (2022)
A Generative Framework for Causal Estimation via Importance-Weighted Diffusion Distillation
by: Song, Xinran, et al.
Published: (2025)
Reducing Action Space for Deep Reinforcement Learning via Causal Effect Estimation
by: Liu, Wenzhang, et al.
Published: (2025)
GRPO-RM: Fine-Tuning Representation Models via GRPO-Driven Reinforcement Learning
by: Xu, Yanchen, et al.
Published: (2025)
Efficient Causal Structure Learning via Modular Subgraph Integration
by: Sun, Haixiang, et al.
Published: (2026)
MMR-GRPO: Accelerating GRPO-Style Training through Diversity-Aware Reward Reweighting
by: Wei, Kangda, et al.
Published: (2026)
Advances in GRPO for Generation Models: A Survey
by: Liu, Zexiang, et al.
Published: (2026)