Bibliographic Details
Main Author: Racioppo, Peter
Format: Preprint
Published: 2025
Subjects: Machine Learning; Artificial Intelligence
Online Access: https://arxiv.org/abs/2509.04154
_version_ 1866918518911401984
author Racioppo, Peter
author_facet Racioppo, Peter
contents We introduce Robust Filter Attention (RFA), a formulation of self-attention as a robust state estimator. Each token is treated as a noisy observation of a latent trajectory governed by a linear stochastic differential equation (SDE), and attention weights are determined by consistency under this model rather than static feature similarity. Under isotropic noise and decay assumptions, RFA matches the computational complexity of standard attention. On language modeling benchmarks, RFA achieves lower perplexity than RoPE within the training window while remaining stable under zero-shot extrapolation to longer contexts. The framework also provides a dynamical interpretation of standard positional mechanisms, connecting rotational embeddings and recency biases to transport and uncertainty propagation induced by stochastic dynamics.
format Preprint
id arxiv_https___arxiv_org_abs_2509_04154
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle Robust Filter Attention: Self-Attention as Precision-Weighted State Estimation
Racioppo, Peter
Machine Learning
Artificial Intelligence
title Robust Filter Attention: Self-Attention as Precision-Weighted State Estimation
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2509.04154