| Main Authors: | Sun, Wangtao, Cheng, Xiang, Yu, Xing, Xu, Haotian, Yang, Zhao, He, Shizhu, Zhao, Jun, Liu, Kang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.22480 |
Similar Items
Shuttle Between the Instructions and the Parameters of Large Language Models
by: Sun, Wangtao, et al.
Published: (2025)
From Chain to Tree: Refining Chain-like Rules into Tree-like Rules on Knowledge Graphs
by: Sun, Wangtao, et al.
Published: (2024)
ItD: Large Language Models Can Teach Themselves Induction through Deduction
by: Sun, Wangtao, et al.
Published: (2024)
ExpNote: Black-box Large Language Models are Better Task Solvers with Experience Notebook
by: Sun, Wangtao, et al.
Published: (2023)
Towards Agentic Self-Learning LLMs in Search Environment
by: Sun, Wangtao, et al.
Published: (2025)
Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention
by: Liao, Huanxuan, et al.
Published: (2025)
DATA: Decomposed Attention-based Task Adaptation for Rehearsal-Free Continual Learning
by: Liao, Huanxuan, et al.
Published: (2025)
Ask a Strong LLM Judge when Your Reward Model is Uncertain
by: Xu, Zhenghao, et al.
Published: (2025)
LLaSA: Large Language and Structured Data Assistant
by: Xu, Yao, et al.
Published: (2024)
Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models
by: Sun, Wangtao, et al.
Published: (2024)
From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space
by: Tan, Yuqiao, et al.
Published: (2026)
SparK: Query-Aware Unstructured Sparsity with Recoverable KV Cache Channel Pruning
by: Liao, Huanxuan, et al.
Published: (2025)
A Probabilistic Perspective on Model Collapse
by: Xu, Shirong, et al.
Published: (2025)
Probabilistic Federated Learning on Uncertain and Heterogeneous Data with Model Personalization
by: Rahman, Ratun, et al.
Published: (2026)
CycloneMAE: A Scalable Multi-Task Learning Model for Global Tropical Cyclone Probabilistic Forecasting
by: Hang, Renlong, et al.
Published: (2026)
Learning Personalized Ad Impact via Contextual Reinforcement Learning under Delayed Rewards
by: Cheng, Yuwei, et al.
Published: (2025)
Contractive Diffusion Probabilistic Models
by: Tang, Wenpin, et al.
Published: (2024)
Provably Efficient Online RLHF with One-Pass Reward Modeling
by: Li, Long-Fei, et al.
Published: (2025)
The Power of the Pareto Front: Balancing Uncertain Rewards for Adaptive Experimentation in scanning probe microscopy
by: Liu, Yu, et al.
Published: (2025)
GDRO: Group-level Reward Post-training Suitable for Diffusion Models
by: Wang, Yiyang, et al.
Published: (2026)
Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance
by: Li, Zhuo, et al.
Published: (2025)
Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning
by: Mo, Shentong
Published: (2026)
Semi-Supervised Reward Modeling via Iterative Self-Training
by: He, Yifei, et al.
Published: (2024)
Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward
by: Jia, Zhiwei, et al.
Published: (2024)
Data-driven Probabilistic Trajectory Learning with High Temporal Resolution in Terminal Airspace
by: Xiang, Jun, et al.
Published: (2024)
Amplify Adjacent Token Differences: Enhancing Long Chain-of-Thought Reasoning with Shift-FFN
by: Xu, Yao, et al.
Published: (2025)
Human Cognition Inspired RAG with Knowledge Graph for Complex Problem Solving
by: Cheng, Yao, et al.
Published: (2025)
Adversarial Training of Reward Models
by: Bukharin, Alexander, et al.
Published: (2025)
Text2Reward: Reward Shaping with Language Models for Reinforcement Learning
by: Xie, Tianbao, et al.
Published: (2023)
PRCL: Probabilistic Representation Contrastive Learning for Semi-Supervised Semantic Segmentation
by: Xie, Haoyu, et al.
Published: (2024)
Boosting Graph Foundation Model from Structural Perspective
by: Cheng, Yao, et al.
Published: (2024)
Beyond Simple Sum of Delayed Rewards: Non-Markovian Reward Modeling for Reinforcement Learning
by: Tang, Yuting, et al.
Published: (2024)
Learning a Pessimistic Reward Model in RLHF
by: Xu, Yinglun, et al.
Published: (2025)
DavIR: Data Selection via Implicit Reward for Large Language Models
by: Zhou, Haotian, et al.
Published: (2023)
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
by: Ji, Shengpeng, et al.
Published: (2025)
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
by: Shen, Guobin, et al.
Published: (2026)
Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns
by: Li, Xiang, et al.
Published: (2025)
Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
by: Ma, Qiyao, et al.
Published: (2026)
Probabilistic Rank and Reward: A Scalable Model for Slate Recommendation
by: Aouali, Imad, et al.
Published: (2022)
Learning to Reason without External Rewards
by: Zhao, Xuandong, et al.
Published: (2025)