| Main Authors: | He, Ruimin; Lin, Shaowei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning |
| Online Access: | https://arxiv.org/abs/2510.26672 |
| _version_ | 1866917050557923328 |
|---|---|
| author | He, Ruimin; Lin, Shaowei |
| author_facet | He, Ruimin; Lin, Shaowei |
| contents | At the heart of reinforcement learning are actions -- decisions made in response to observations of the environment. Actions are equally fundamental in the modeling of stochastic processes, as they trigger discontinuous state transitions and enable the flow of information through large, complex systems. In this paper, we unify the perspectives of stochastic processes and reinforcement learning through action-driven processes, and illustrate their application to spiking neural networks. Leveraging ideas from control-as-inference, we show that minimizing the Kullback-Leibler divergence between a policy-driven true distribution and a reward-driven model distribution for a suitably defined action-driven process is equivalent to maximum entropy reinforcement learning. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2510_26672 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Action-Driven Processes for Continuous-Time Control |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2510.26672 |