:: Library Catalog

Bibliographic Details
Main Authors:	Zhou, Shuyi; Song, Zeen; Qiang, Wenwen; Sun, Jiyan; Zhou, Yao; Liu, Yinlong; Ma, Wei
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2603.02675

Similar Items

Causal Reward Adjustment: Mitigating Reward Hacking in External Reasoning via Backdoor Correction
by: Song, Ruike, et al.
Published: (2025)

Group Causal Policy Optimization for Post-Training Large Language Models
by: Gu, Ziyin, et al.
Published: (2025)

Adaptive Uncertainty-Aware Tree Search for Robust Reasoning
by: Song, Zeen, et al.
Published: (2026)

Beyond All-to-All: Causal-Aligned Transformer with Dynamic Structure Learning for Multivariate Time Series Forecasting
by: Zhang, Xingyu, et al.
Published: (2025)

On the Generalization and Causal Explanation in Self-Supervised Learning
by: Qiang, Wenwen, et al.
Published: (2024)

Causal Front-Door Adjustment for Robust Jailbreak Attacks on LLMs
by: Zhou, Yao, et al.
Published: (2026)

On the Out-of-Distribution Generalization of Self-Supervised Learning
by: Qiang, Wenwen, et al.
Published: (2025)

Reward Model Generalization for Compute-Aware Test-Time Reasoning
by: Song, Zeen, et al.
Published: (2025)

Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
by: Wang, Jingyao, et al.
Published: (2025)

Closing the Loop: A Control-Theoretic Framework for Provably Stable Time Series Forecasting with LLMs
by: Zhang, Xingyu, et al.
Published: (2026)

Hacking Task Confounder in Meta-Learning
by: Wang, Jingyao, et al.
Published: (2023)

Not All Frequencies Are Created Equal: Towards a Dynamic Fusion of Frequencies in Time-Series Forecasting
by: Zhang, Xingyu, et al.
Published: (2024)

EP-GRPO: Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance
by: Yu, Song, et al.
Published: (2026)

LithoGRPO: Fast Inverse Lithography via GRPO Reinforced Flow Matching
by: Lai, Yao, et al.
Published: (2026)

Towards Generalizable Reasoning: Group Causal Counterfactual Policy Optimization for LLM Reasoning
by: Wang, Jingyao, et al.
Published: (2026)

From Intent to Evidence: A Categorical Approach for Structural Evaluation of Deep Research Agents
by: Liu, Shuoling, et al.
Published: (2026)

dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models
by: Wan, Zhengyan, et al.
Published: (2026)

Towards the Causal Complete Cause of Multi-Modal Representation Learning
by: Wang, Jingyao, et al.
Published: (2024)

When Missing Becomes Structure: Intent-Preserving Policy Completion from Financial KOL Discourse
by: Liu, Yuncong, et al.
Published: (2026)

Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective
by: Zhang, Yanan, et al.
Published: (2024)

Tagged for Direction: Pinning Down Causal Edge Directions with Precision
by: Busch, Florian Peter, et al.
Published: (2025)

Pin-Tuning: Parameter-Efficient In-Context Tuning for Few-Shot Molecular Property Prediction
by: Wang, Liang, et al.
Published: (2024)

Event-CausNet: Unlocking Causal Knowledge from Text with Large Language Models for Reliable Spatio-Temporal Forecasting
by: Niu, Luyao, et al.
Published: (2025)

Learning Polyhedral Conformal Sets for Robust Optimization
by: Chen, Shuyi, et al.
Published: (2026)

Drug Synergy Prediction via Residual Graph Isomorphism Networks and Attention Mechanisms
by: Song, Jiyan, et al.
Published: (2026)

How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning
by: Tian, Minghao, et al.
Published: (2026)

Make Deep Networks Shallow Again
by: Bermeitinger, Bernhard, et al.
Published: (2023)

Deep Minds and Shallow Probes
by: Lee, Su Hyeong, et al.
Published: (2026)

SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
by: Zheng, Zhi, et al.
Published: (2025)

Prepare Before You Act: Learning From Humans to Rearrange Initial States
by: Dai, Yinlong, et al.
Published: (2025)

Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting
by: Wang, Cheng, et al.
Published: (2026)

AMIR-GRPO: Inducing Implicit Preference Signals into GRPO
by: Yari, Amir Hossein, et al.
Published: (2026)

Lean Finder: Semantic Search for Mathlib That Understands User Intents
by: Lu, Jialin, et al.
Published: (2025)

A Survey of Deep Causal Models and Their Industrial Applications
by: Li, Zongyu, et al.
Published: (2022)

A Generative Framework for Causal Estimation via Importance-Weighted Diffusion Distillation
by: Song, Xinran, et al.
Published: (2025)

Reducing Action Space for Deep Reinforcement Learning via Causal Effect Estimation
by: Liu, Wenzhang, et al.
Published: (2025)

GRPO-RM: Fine-Tuning Representation Models via GRPO-Driven Reinforcement Learning
by: Xu, Yanchen, et al.
Published: (2025)

Efficient Causal Structure Learning via Modular Subgraph Integration
by: Sun, Haixiang, et al.
Published: (2026)

MMR-GRPO: Accelerating GRPO-Style Training through Diversity-Aware Reward Reweighting
by: Wei, Kangda, et al.
Published: (2026)

Advances in GRPO for Generation Models: A Survey
by: Liu, Zexiang, et al.
Published: (2026)