Saved in:
| Main Authors: | , , , |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.15011 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866908372931969024 |
|---|---|
| author | Varys, Kryspin Cerutti, Federico Sobey, Adam Norman, Timothy J. |
| author_facet | Varys, Kryspin Cerutti, Federico Sobey, Adam Norman, Timothy J. |
| contents | Our society is governed by a set of norms which together bring about the values we cherish such as safety, fairness or trustworthiness. The goal of value-alignment is to create agents that not only do their tasks but through their behaviours also promote these values. Many of the norms are written as laws or rules (legal / safety norms) but even more remain unwritten (social norms). Furthermore, the techniques used to represent these norms also differ. Safety / legal norms are often represented explicitly, for example, in some logical language while social norms are typically learned and remain hidden in the parameter space of a neural network. There is a lack of approaches in the literature that could combine these various norm representations into a single algorithm. We propose a novel method that integrates these norms into the reinforcement learning process. Our method monitors the agent's compliance with the given norms and summarizes it in a quantity we call the agent's reputation. This quantity is used to weigh the received rewards to motivate the agent to become value-aligned. We carry out a series of experiments including a continuous state space traffic problem to demonstrate the importance of the written and unwritten norms and show how our method can find the value-aligned policies. Furthermore, we carry out ablations to demonstrate why it is better to combine these two groups of norms rather than using either separately. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2505_15011 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | HAVA: Hybrid Approach to Value-Alignment through Reward Weighing for Reinforcement Learning Varys, Kryspin Cerutti, Federico Sobey, Adam Norman, Timothy J. Artificial Intelligence Our society is governed by a set of norms which together bring about the values we cherish such as safety, fairness or trustworthiness. The goal of value-alignment is to create agents that not only do their tasks but through their behaviours also promote these values. Many of the norms are written as laws or rules (legal / safety norms) but even more remain unwritten (social norms). Furthermore, the techniques used to represent these norms also differ. Safety / legal norms are often represented explicitly, for example, in some logical language while social norms are typically learned and remain hidden in the parameter space of a neural network. There is a lack of approaches in the literature that could combine these various norm representations into a single algorithm. We propose a novel method that integrates these norms into the reinforcement learning process. Our method monitors the agent's compliance with the given norms and summarizes it in a quantity we call the agent's reputation. This quantity is used to weigh the received rewards to motivate the agent to become value-aligned. We carry out a series of experiments including a continuous state space traffic problem to demonstrate the importance of the written and unwritten norms and show how our method can find the value-aligned policies. Furthermore, we carry out ablations to demonstrate why it is better to combine these two groups of norms rather than using either separately. |
| title | HAVA: Hybrid Approach to Value-Alignment through Reward Weighing for Reinforcement Learning |
| topic | Artificial Intelligence |
| url | https://arxiv.org/abs/2505.15011 |