| Main Authors: | Shen, Wei, Zhang, Xiaoying, Yao, Yuanshun, Zheng, Rui, Guo, Hongyi, Liu, Yang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.07708 |
Similar Items
Human-Instruction-Free LLM Self-Alignment with Limited Samples
by: Guo, Hongyi, et al.
Published: (2024)
Toward Optimal LLM Alignments Using Two-Player Games
by: Zheng, Rui, et al.
Published: (2024)
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble
by: Zhang, Shun, et al.
Published: (2024)
Measuring and Reducing LLM Hallucination without Gold-Standard Answers
by: Wei, Jiaheng, et al.
Published: (2024)
Learning Personalized Agents from Human Feedback
by: Liang, Kaiqu, et al.
Published: (2026)
Large Language Model Unlearning
by: Yao, Yuanshun, et al.
Published: (2023)
Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model
by: He, Zhiwei, et al.
Published: (2024)
ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration
by: Estornell, Andrew, et al.
Published: (2024)
Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
by: Ackermann, Johannes, et al.
Published: (2026)
Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
by: Ackermann, Johannes, et al.
Published: (2025)
CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG Models
by: Tan, Zhehao, et al.
Published: (2026)
Tournament-GRPO: Group-Wise Tournament Rewards for Reinforcement Learning in Open-Ended Long-Form Generation
by: Yang, Zixuan, et al.
Published: (2026)
RLMR: Reinforcement Learning with Mixed Rewards for Creative Writing
by: Liao, Jianxing, et al.
Published: (2025)
Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback
by: Zhou, Jiayi, et al.
Published: (2024)
Auditing Data Membership in Reinforcement Learning With Verifiable Rewards
by: Liu, Yule, et al.
Published: (2025)
WildReward: Learning Reward Models from In-the-Wild Human Interactions
by: Peng, Hao, et al.
Published: (2026)
Parameter Efficient Reinforcement Learning from Human Feedback
by: Sidahmed, Hakim, et al.
Published: (2024)
RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models
by: Wang, Jiongxiao, et al.
Published: (2023)
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
by: Zhang, Kongcheng, et al.
Published: (2025)
Rewarding Creativity: A Human-Aligned Generative Reward Model for Reinforcement Learning in Storytelling
by: Li, Zhaoyan, et al.
Published: (2026)
Distributionally Robust Reinforcement Learning with Human Feedback
by: Mandal, Debmalya, et al.
Published: (2025)
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
by: Wen, Xumeng, et al.
Published: (2025)
Optimizing RAG Rerankers with LLM Feedback via Reinforcement Learning
by: Wu, Yuhang, et al.
Published: (2026)
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
by: Yang, Wenkai, et al.
Published: (2025)
Language Models Can Learn from Verbal Feedback Without Scalar Rewards
by: Luo, Renjie, et al.
Published: (2025)
SSR-Zero: Simple Self-Rewarding Reinforcement Learning for Machine Translation
by: Yang, Wenjie, et al.
Published: (2025)
On the Cause of Unfairness: A Training Sample Perspective
by: Yao, Yuanshun, et al.
Published: (2023)
RED: Unleashing Token-Level Rewards from Holistic Feedback via Reward Redistribution
by: Li, Jiahui, et al.
Published: (2024)
RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
by: Lee, Harrison, et al.
Published: (2023)
RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards
by: Wang, Zhilin, et al.
Published: (2025)
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy
by: Liu, Chris Yuhao, et al.
Published: (2025)
Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning
by: Huang, Lei, et al.
Published: (2026)
Rethinking Diverse Human Preference Learning through Principal Component Analysis
by: Luo, Feng, et al.
Published: (2025)
Reinforcement Learning with Token-level Feedback for Controllable Text Generation
by: Li, Wendi, et al.
Published: (2024)
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
by: Wang, Peisong, et al.
Published: (2025)
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
by: Yang, Rui, et al.
Published: (2024)
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models
by: Hong, Haitao, et al.
Published: (2025)
PLHF: Prompt Optimization with Few-Shot Human Feedback
by: Yang, Chun-Pai, et al.
Published: (2025)
Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback
by: Zhang, Xiaoying, et al.
Published: (2025)
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
by: Yin, Yueqin, et al.
Published: (2025)