:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Siqi, Yang, Hailong, Zhu, Junjie, Wang, Xuezhu, Xu, Yufan, Qian, Depei
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2512.04752

Similar Items

RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation
by: Cheng, Zelei, et al.
Published: (2024)

SplaXBERT: Leveraging Mixed Precision Training and Context Splitting for Question Answering
by: Yufan, Zhu, et al.
Published: (2024)

Beyond Window-Based Detection: A Graph-Centric Framework for Discrete Log Anomaly Detection
by: Qi, Jiaxing, et al.
Published: (2025)

AuroRA: Breaking Low-Rank Bottleneck of LoRA with Nonlinear Mapping
by: Dong, Haonan, et al.
Published: (2025)

An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training
by: Xiao, Youshao, et al.
Published: (2023)

Breaking the Attention Bottleneck
by: Hilsenbek, Kalle
Published: (2024)

Quantum Machine Learning in Log-based Anomaly Detection: Challenges and Opportunities
by: Qi, Jiaxing, et al.
Published: (2024)

Breaking Symmetry Bottlenecks in GNN Readouts
by: Talhi, Mouad, et al.
Published: (2026)

BigMac: Breaking the Pareto Frontier of Compute and Memory in Multimodal LLM Training
by: Zhang, Zili, et al.
Published: (2026)

Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization
by: Dai, Juntao, et al.
Published: (2025)

Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models
by: Li, Zongqian, et al.
Published: (2026)

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
by: Hu, Jian, et al.
Published: (2024)

Accelerating RLHF Training with Reward Variance Increase
by: Yang, Zonglin, et al.
Published: (2025)

Adaptive Margin RLHF via Preference over Preferences
by: Chittepu, Yaswanth, et al.
Published: (2025)

Refining the Information Bottleneck via Adversarial Information Separation
by: Ning, Shuai, et al.
Published: (2026)

Breaking Memorization Barriers in LLM Code Fine-Tuning via Information Bottleneck for Improved Generalization
by: Wang, Changsheng, et al.
Published: (2025)

Mjolnir: Breaking the Shield of Perturbation-Protected Gradients via Adaptive Diffusion
by: Liu, Xuan, et al.
Published: (2024)

Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models
by: Wang, Haoyu, et al.
Published: (2025)

Efficient Federated RLHF via Zeroth-Order Policy Optimization
by: Wang, Deyi, et al.
Published: (2026)

Exploring Dynamic Properties of Backdoor Training Through Information Bottleneck
by: Liu, Xinyu, et al.
Published: (2025)

Breaking the Bottlenecks: Scalable Diffusion Models for 3D Molecular Generation
by: Das, Adrita, et al.
Published: (2026)

Provably Efficient Online RLHF with One-Pass Reward Modeling
by: Li, Long-Fei, et al.
Published: (2025)

Policy Optimization in RLHF: The Impact of Out-of-preference Data
by: Li, Ziniu, et al.
Published: (2023)

RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)

Breaking the Context Bottleneck on Long Time Series Forecasting
by: Ma, Chao, et al.
Published: (2024)

Robust Post-Training for Generative Recommenders: Why Exponential Reward-Weighted SFT Outperforms RLHF
by: Chidambaram, Keertana, et al.
Published: (2026)

Mitigating the Alignment Tax of RLHF
by: Lin, Yong, et al.
Published: (2023)

Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning
by: Zhou, Yang, et al.
Published: (2025)

Draft, Verify, and Improve: Toward Training-Aware Speculative Decoding
by: Bhansali, Shrenik, et al.
Published: (2025)

Breaking AR's Sampling Bottleneck: Provable Acceleration via Diffusion Language Models
by: Li, Gen, et al.
Published: (2025)

MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
by: Xiao, Da, et al.
Published: (2025)

Factored Causal Representation Learning for Robust Reward Modeling in RLHF
by: Yang, Yupei, et al.
Published: (2026)

Towards a Theoretical Understanding to the Generalization of RLHF
by: Li, Zhaochun, et al.
Published: (2026)

Optimizing RLHF Training for Large Language Models with Stage Fusion
by: Zhong, Yinmin, et al.
Published: (2024)

Understanding Sampler Stochasticity in Training Diffusion Models for RLHF
by: Sheng, Jiayuan, et al.
Published: (2025)

P-EAGLE: Parallel-Drafting EAGLE with Scalable Training
by: Hui, Mude, et al.
Published: (2026)

Breaking the Simplification Bottleneck in Amortized Neural Symbolic Regression
by: Saegert, Paul, et al.
Published: (2026)

SharedRep-RLHF: A Shared Representation Approach to RLHF with Diverse Preferences
by: Mukherjee, Arpan, et al.
Published: (2025)

Unifying Stable Optimization and Reference Regularization in RLHF
by: He, Li, et al.
Published: (2026)

Learning a Pessimistic Reward Model in RLHF
by: Xu, Yinglun, et al.
Published: (2025)