Bibliographic Details
Main Authors: Usatenko, O. V., Melnyk, S. S., Pritula, G. M.
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2603.04412
author Usatenko, O. V.
Melnyk, S. S.
Pritula, G. M.
contents Large-scale language models (LLMs) operate in extremely high-dimensional state spaces, where token embeddings and their hidden representations create complex dependencies that do not reduce easily to classical Markov structures. In this paper, we explore a theoretically feasible approximation of LLM dynamics using additive Markov chains of order N. Such models decompose the conditional probability of the next token into a superposition of contributions from different depths of the history, reducing the combinatorial explosion typically associated with high-order Markov processes. The main result of the work is a correspondence between an additive multi-step chain and a chain with a step-wise memory function. This equivalence makes it possible to introduce the concept of information temperature not only for step-wise but also for additive N-order Markov chains.
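The idea summarized in the abstract, that the next-symbol probability is a sum of contributions from each past step, can be illustrated with a minimal Python sketch. It assumes a binary alphabet and the additive form P(a_i = 1 | a_{i-N}, ..., a_{i-1}) = ā + Σ_{r=1}^{N} F(r)(a_{i-r} − ā) used in earlier work on additive Markov chains, where ā is the mean fraction of ones and F(r) is the memory function. The function names, the hypothetical `strength` parameterization of the step-wise memory function F(r) = strength/N, and the V-symbol parameter count N·V² are illustrative assumptions, not the paper's own method or notation.

```python
import random


def additive_conditional_prob(history, mean, memory):
    """P(next symbol = 1 | last N symbols) for a binary additive chain.

    Assumed form: P = mean + sum_{r=1}^{N} F(r) * (a_{i-r} - mean),
    where history[-r] is a_{i-r} and memory[r - 1] is F(r).
    """
    n = len(memory)
    if len(history) < n:
        raise ValueError("history must contain at least N symbols")
    p = mean + sum(memory[r - 1] * (history[-r] - mean) for r in range(1, n + 1))
    # Clip to [0, 1] in case the memory function is too strong.
    return min(1.0, max(0.0, p))


def step_memory(n, strength):
    """Hypothetical step-wise memory function: F(r) = strength / N for r = 1..N.

    With a constant F(r), the probability depends only on how many ones
    the window contains, not on where they sit.
    """
    return [strength / n] * n


def generate(length, mean, memory, seed=0):
    """Sample a binary sequence from the additive chain (illustrative only)."""
    rng = random.Random(seed)
    n = len(memory)
    # Seed the first N symbols independently with probability `mean`.
    seq = [1 if rng.random() < mean else 0 for _ in range(n)]
    while len(seq) < length:
        p = additive_conditional_prob(seq, mean, memory)
        seq.append(1 if rng.random() < p else 0)
    return seq[:length]


def parameter_counts(vocab_size, order):
    """Rough free-parameter counts for a V-symbol chain of order N (assumed).

    Full chain: V**N contexts, each with V - 1 free probabilities.
    Additive chain: one V x V matrix F_ab(r) per step r, i.e. N * V**2.
    """
    full = vocab_size ** order * (vocab_size - 1)
    additive = order * vocab_size ** 2
    return full, additive


if __name__ == "__main__":
    mem = step_memory(3, 0.6)
    print(additive_conditional_prob([1, 1, 1], 0.5, mem))
    print(parameter_counts(50_000, 3))
```

For example, with N = 3, ā = 0.5 and strength 0.6, an all-ones window raises the probability of a 1 to about 0.8 and an all-zeros window lowers it to about 0.2; for a 50,000-token vocabulary at order 3, the full chain's parameter count exceeds the additive one by many orders of magnitude.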
format Preprint
id arxiv_https___arxiv_org_abs_2603_04412
institution arXiv
publishDate 2026
record_format arxiv
title Additive Multi-Step Markov Chains and the Curse of Dimensionality in Large Language Models
topic Computation and Language
url https://arxiv.org/abs/2603.04412