| Main Authors: | Metz, Yannick, Geiszl, András, Baur, Raphaël, El-Assady, Mennatallah |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.21038 |
Similar Items
MAVRL: Learning Reward Functions from Multiple Feedback Types with Amortized Variational Inference
by: Baur, Raphaël, et al.
Published: (2026)
Mapping out the Space of Human Feedback for Reinforcement Learning: A Conceptual Framework
by: Metz, Yannick, et al.
Published: (2024)
TopoAlign: Topology-Aware Visual Representation Alignment
by: Yan, Xinyuan, et al.
Published: (2026)
Deconstructing Human-AI Collaboration: Agency, Interaction, and Adaptation
by: Holter, Steffen, et al.
Published: (2024)
Concept-Level Explainability for Auditing & Steering LLM Responses
by: Amara, Kenza, et al.
Published: (2025)
SyntaxShap: Syntax-aware Explainability Method for Text Generation
by: Amara, Kenza, et al.
Published: (2024)
Challenges and Opportunities in Text Generation Explainability
by: Amara, Kenza, et al.
Published: (2024)
DxHF: Providing High-Quality Human Feedback for LLM Alignment via Interactive Decomposition
by: Shi, Danqing, et al.
Published: (2025)
iNNspector: Visual, Interactive Deep Model Debugging
by: Spinner, Thilo, et al.
Published: (2024)
Understanding Large Language Model Behaviors through Interactive Counterfactual Generation and Analysis
by: Cheng, Furui, et al.
Published: (2024)
Which Rewards Matter? Reward Selection for Reinforcement Learning under Limited Feedback
by: Chaudhari, Shreyas, et al.
Published: (2025)
Cross-Cultural Simulation of Citizen Emotional Responses to Bureaucratic Red Tape Using LLM Agents
by: Ni, Wanchun, et al.
Published: (2026)
Adaptive Querying for Reward Learning from Human Feedback
by: Anand, Yashwanthi, et al.
Published: (2024)
Repairing Reward Functions with Feedback to Mitigate Reward Hacking
by: Hatgis-Kessell, Stephane, et al.
Published: (2025)
Causally Robust Reward Learning from Reason-Augmented Preference Feedback
by: Hwang, Minjune, et al.
Published: (2026)
PleaSQLarify: Visual Pragmatic Repair for Natural Language Database Querying
by: Chan, Robin Shing Moon, et al.
Published: (2026)
Can Differentiable Decision Trees Enable Interpretable Reward Learning from Human Feedback?
by: Kalra, Akansha, et al.
Published: (2023)
Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
by: Ackermann, Johannes, et al.
Published: (2026)
Fusing Reward and Dueling Feedback in Stochastic Bandits
by: Wang, Xuchuang, et al.
Published: (2025)
A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback
by: Kim, Kihyun, et al.
Published: (2024)
Language Models Can Learn from Verbal Feedback Without Scalar Rewards
by: Luo, Renjie, et al.
Published: (2025)
Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
by: Ackermann, Johannes, et al.
Published: (2025)
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble
by: Zhang, Shun, et al.
Published: (2024)
Online Learning with Multiple Fairness Regularizers via Graph-Structured Feedback
by: Zhou, Quan, et al.
Published: (2025)
Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
by: Liu, Shang, et al.
Published: (2024)
Phase-Type Variational Autoencoders for Heavy-Tailed Data
by: Ziani, Abdelhakim, et al.
Published: (2026)
Noise-corrected GRPO: From Noisy Rewards to Unbiased Gradients
by: Mansouri, Omar El, et al.
Published: (2025)
RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards
by: Wang, Zhilin, et al.
Published: (2025)
Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback
by: Afsharrad, Amirhossein, et al.
Published: (2026)
Laser Learning Environment: A new environment for coordination-critical multi-agent tasks
by: Molinghen, Yannick, et al.
Published: (2024)
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities
by: Amara, Kenza, et al.
Published: (2024)
PowerGraph: A power grid benchmark dataset for graph neural networks
by: Varbella, Anna, et al.
Published: (2024)
Zero-Shot LLMs in Human-in-the-Loop RL: Replacing Human Feedback for Reward Shaping
by: Nazir, Mohammad Saif, et al.
Published: (2025)
Using Large Language Models to Automate and Expedite Reinforcement Learning with Reward Machine
by: Alsadat, Shayan Meshkat, et al.
Published: (2024)
CRoSS: A Continual Robotic Simulation Suite for Scalable Reinforcement Learning with High Task Diversity and Realistic Physics Simulation
by: Denker, Yannick, et al.
Published: (2026)
Feedback Loops With Language Models Drive In-Context Reward Hacking
by: Pan, Alexander, et al.
Published: (2024)
Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence
by: György, András, et al.
Published: (2025)
SemiReward: A General Reward Model for Semi-supervised Learning
by: Li, Siyuan, et al.
Published: (2023)
Tiered Reward: Designing Rewards for Specification and Fast Learning of Desired Behavior
by: Zhou, Zhiyuan, et al.
Published: (2022)
RLSR: Reinforcement Learning from Self Reward
by: Simonds, Toby, et al.
Published: (2025)