| Main Authors: | Wang, Siqi; Yang, Hailong; Zhu, Junjie; Wang, Xuezhu; Xu, Yufan; Qian, Depei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.04752 |
Similar Items
RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation
by: Cheng, Zelei, et al.
Published: (2024)
SplaXBERT: Leveraging Mixed Precision Training and Context Splitting for Question Answering
by: Yufan, Zhu, et al.
Published: (2024)
Beyond Window-Based Detection: A Graph-Centric Framework for Discrete Log Anomaly Detection
by: Qi, Jiaxing, et al.
Published: (2025)
AuroRA: Breaking Low-Rank Bottleneck of LoRA with Nonlinear Mapping
by: Dong, Haonan, et al.
Published: (2025)
An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training
by: Xiao, Youshao, et al.
Published: (2023)
Breaking the Attention Bottleneck
by: Hilsenbek, Kalle
Published: (2024)
Quantum Machine Learning in Log-based Anomaly Detection: Challenges and Opportunities
by: Qi, Jiaxing, et al.
Published: (2024)
Breaking Symmetry Bottlenecks in GNN Readouts
by: Talhi, Mouad, et al.
Published: (2026)
BigMac: Breaking the Pareto Frontier of Compute and Memory in Multimodal LLM Training
by: Zhang, Zili, et al.
Published: (2026)
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization
by: Dai, Juntao, et al.
Published: (2025)
Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models
by: Li, Zongqian, et al.
Published: (2026)
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
by: Hu, Jian, et al.
Published: (2024)
Accelerating RLHF Training with Reward Variance Increase
by: Yang, Zonglin, et al.
Published: (2025)
Adaptive Margin RLHF via Preference over Preferences
by: Chittepu, Yaswanth, et al.
Published: (2025)
Refining the Information Bottleneck via Adversarial Information Separation
by: Ning, Shuai, et al.
Published: (2026)
Breaking Memorization Barriers in LLM Code Fine-Tuning via Information Bottleneck for Improved Generalization
by: Wang, Changsheng, et al.
Published: (2025)
Mjolnir: Breaking the Shield of Perturbation-Protected Gradients via Adaptive Diffusion
by: Liu, Xuan, et al.
Published: (2024)
Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models
by: Wang, Haoyu, et al.
Published: (2025)
Efficient Federated RLHF via Zeroth-Order Policy Optimization
by: Wang, Deyi, et al.
Published: (2026)
Exploring Dynamic Properties of Backdoor Training Through Information Bottleneck
by: Liu, Xinyu, et al.
Published: (2025)
Breaking the Bottlenecks: Scalable Diffusion Models for 3D Molecular Generation
by: Das, Adrita, et al.
Published: (2026)
Provably Efficient Online RLHF with One-Pass Reward Modeling
by: Li, Long-Fei, et al.
Published: (2025)
Policy Optimization in RLHF: The Impact of Out-of-preference Data
by: Li, Ziniu, et al.
Published: (2023)
RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)
Breaking the Context Bottleneck on Long Time Series Forecasting
by: Ma, Chao, et al.
Published: (2024)
Robust Post-Training for Generative Recommenders: Why Exponential Reward-Weighted SFT Outperforms RLHF
by: Chidambaram, Keertana, et al.
Published: (2026)
Mitigating the Alignment Tax of RLHF
by: Lin, Yong, et al.
Published: (2023)
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
by: Zhou, Yang, et al.
Published: (2025)
Draft, Verify, and Improve: Toward Training-Aware Speculative Decoding
by: Bhansali, Shrenik, et al.
Published: (2025)
Breaking AR's Sampling Bottleneck: Provable Acceleration via Diffusion Language Models
by: Li, Gen, et al.
Published: (2025)
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
by: Xiao, Da, et al.
Published: (2025)
Factored Causal Representation Learning for Robust Reward Modeling in RLHF
by: Yang, Yupei, et al.
Published: (2026)
Towards a Theoretical Understanding to the Generalization of RLHF
by: Li, Zhaochun, et al.
Published: (2026)
Optimizing RLHF Training for Large Language Models with Stage Fusion
by: Zhong, Yinmin, et al.
Published: (2024)
Understanding Sampler Stochasticity in Training Diffusion Models for RLHF
by: Sheng, Jiayuan, et al.
Published: (2025)
P-EAGLE: Parallel-Drafting EAGLE with Scalable Training
by: Hui, Mude, et al.
Published: (2026)
Breaking the Simplification Bottleneck in Amortized Neural Symbolic Regression
by: Saegert, Paul, et al.
Published: (2026)
SharedRep-RLHF: A Shared Representation Approach to RLHF with Diverse Preferences
by: Mukherjee, Arpan, et al.
Published: (2025)
Unifying Stable Optimization and Reference Regularization in RLHF
by: He, Li, et al.
Published: (2026)
Learning a Pessimistic Reward Model in RLHF
by: Xu, Yinglun, et al.
Published: (2025)