| Main Authors: | Lau, Elaine; Lu, Stephen Zhewen; Pan, Ling; Precup, Doina; Bengio, Emmanuel |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Machine Learning |
| Online Access: | https://arxiv.org/abs/2402.05234 |
| _version_ | 1866916463840854016 |
|---|---|
| author | Lau, Elaine; Lu, Stephen Zhewen; Pan, Ling; Precup, Doina; Bengio, Emmanuel |
| author_facet | Lau, Elaine; Lu, Stephen Zhewen; Pan, Ling; Precup, Doina; Bengio, Emmanuel |
| contents | Generative Flow Networks (GFlowNets; GFNs) are a family of energy-based generative methods for combinatorial objects, capable of generating diverse and high-utility samples. However, consistently biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value estimate, $Q$, to create greedier sampling policies which can be controlled by a mixing parameter. We show that several variants of the proposed method, QGFN, are able to improve on the number of high-reward samples generated in a variety of tasks without sacrificing diversity. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2402_05234 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | QGFN: Controllable Greediness with Action Values |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2402.05234 |