| Main Author: | Nguyen, Khanh |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2305.17760 |
Similar Items
Understanding Understanding: A Pragmatic Framework Motivated by Large Language Models
by: Leyton-Brown, Kevin, et al.
Published: (2024)
RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)
Language-Guided World Models: A Model-Based Approach to AI Control
by: Zhang, Alex, et al.
Published: (2024)
Automatic Prompt Selection for Large Language Models
by: Do, Viet-Tung, et al.
Published: (2024)
Understanding the Effects of RLHF on LLM Generalisation and Diversity
by: Kirk, Robert, et al.
Published: (2023)
Optimizing RLHF Training for Large Language Models with Stage Fusion
by: Zhong, Yinmin, et al.
Published: (2024)
Reward Model Overoptimisation in Iterated RLHF
by: Wolf, Lorenz, et al.
Published: (2025)
How to Evaluate Reward Models for RLHF
by: Frick, Evan, et al.
Published: (2024)
Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
by: Shi, Ruizhe, et al.
Published: (2025)
Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy
by: Zhu, Yu, et al.
Published: (2024)
Quantile Regression for Distributional Reward Models in RLHF
by: Dorka, Nicolai
Published: (2024)
Derailing Non-Answers via Logit Suppression at Output Subspace Boundaries in RLHF-Aligned Language Models
by: Dam, Harvey, et al.
Published: (2025)
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
by: Noukhovitch, Michael, et al.
Published: (2024)
Does RLHF Scale? Exploring the Impacts From Data, Model, and Method
by: Hou, Zhenyu, et al.
Published: (2024)
Understanding Emergent Abilities of Language Models from the Loss Perspective
by: Du, Zhengxiao, et al.
Published: (2024)
Reward Generalization in RLHF: A Topological Perspective
by: Qiu, Tianyi, et al.
Published: (2024)
DocMIA: Document-Level Membership Inference Attacks against DocVQA Models
by: Nguyen, Khanh, et al.
Published: (2025)
Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution
by: Xu, Nuo, et al.
Published: (2024)
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
by: Lu, Taiming, et al.
Published: (2024)
Failure Modes of Maximum Entropy RLHF
by: Çağatan, Ömer Veysel, et al.
Published: (2025)
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
by: Hu, Jian, et al.
Published: (2024)
ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation
by: Mei, Zhiyu, et al.
Published: (2024)
Evaluating Defences against Unsafe Feedback in RLHF
by: Rosati, Domenic, et al.
Published: (2024)
Solving the Inverse Alignment Problem for Efficient RLHF
by: Krishna, Shambhavi, et al.
Published: (2024)
CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
by: Wang, Hao, et al.
Published: (2026)
Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation
by: Wang, Xinyi, et al.
Published: (2024)
Bayesian Mixture of Experts For Large Language Models
by: Dialameh, Maryam, et al.
Published: (2025)
RLHF and IIA: Perverse Incentives
by: Xu, Wanqiao, et al.
Published: (2023)
Reward-Robust RLHF in LLMs
by: Yan, Yuzi, et al.
Published: (2024)
Federated Document Visual Question Answering: A Pilot Study
by: Nguyen, Khanh, et al.
Published: (2024)
Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization
by: Nath, Swaroop, et al.
Published: (2024)
Group Robust Preference Optimization in Reward-free RLHF
by: Ramesh, Shyam Sundhar, et al.
Published: (2024)
Why Is RLHF Alignment Shallow? A Gradient Analysis
by: Young, Robin
Published: (2026)
Deep Bayesian Active Learning for Preference Modeling in Large Language Models
by: Melo, Luckeciano C., et al.
Published: (2024)
Can Authorship Attribution Models Distinguish Speakers in Speech Transcripts?
by: Aggazzotti, Cristina, et al.
Published: (2023)
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective
by: Liu, Qi, et al.
Published: (2025)
Distributional Surgery for Language Model Activations
by: Nguyen, Bao, et al.
Published: (2025)
Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time
by: Zhang, Zhenyu, et al.
Published: (2025)
Understanding the Language Model to Solve the Symbolic Multi-Step Reasoning Problem from the Perspective of Buffer Mechanism
by: Wang, Zhiwei, et al.
Published: (2024)
A Long Way to Go: Investigating Length Correlations in RLHF
by: Singhal, Prasann, et al.
Published: (2023)