Bibliographic Details
Main Authors: He, Ruimin; Lin, Shaowei
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2510.26672
_version_ 1866917050557923328
author He, Ruimin
Lin, Shaowei
author_facet He, Ruimin
Lin, Shaowei
contents At the heart of reinforcement learning are actions -- decisions made in response to observations of the environment. Actions are equally fundamental in the modeling of stochastic processes, as they trigger discontinuous state transitions and enable the flow of information through large, complex systems. In this paper, we unify the perspectives of stochastic processes and reinforcement learning through action-driven processes, and illustrate their application to spiking neural networks. Leveraging ideas from control-as-inference, we show that minimizing the Kullback-Leibler divergence between a policy-driven true distribution and a reward-driven model distribution for a suitably defined action-driven process is equivalent to maximum entropy reinforcement learning.
format Preprint
id arxiv_https___arxiv_org_abs_2510_26672
institution arXiv
publishDate 2025
record_format arxiv
title Action-Driven Processes for Continuous-Time Control
topic Machine Learning
url https://arxiv.org/abs/2510.26672