Bibliographic Details
Main Authors: Ito, Takuya; Cocchi, Luca; Klinger, Tim; Ram, Parikshit; Campbell, Murray; Hearne, Luke
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2406.08272
author Ito, Takuya
Cocchi, Luca
Klinger, Tim
Ram, Parikshit
Campbell, Murray
Hearne, Luke
contents In transformers, the positional encoding (PE) provides essential information that distinguishes the position and order among tokens in a sequence. Most prior investigations of PE effects on generalization were tailored to 1D input sequences, such as those presented in natural language, where adjacent tokens (e.g., words) are highly related. In contrast, many real-world tasks involve datasets with highly non-trivial positional arrangements, such as datasets organized in multiple spatial dimensions, or datasets for which ground-truth positions are not known. Here we find that the choice of initialization of a learnable PE greatly influences its ability to learn interpretable PEs that lead to enhanced generalization. We empirically demonstrate our findings in three experiments: 1) a 2D relational reasoning task; 2) a nonlinear stochastic network simulation; 3) a real-world 3D neuroscience dataset, applying interpretability analyses to verify the learning of accurate PEs. Overall, we find that a learned PE initialized from a small-norm distribution can 1) uncover interpretable PEs that mirror ground-truth positions in multiple dimensions, and 2) lead to improved generalization. These results illustrate the feasibility of learning identifiable and interpretable PEs for enhanced generalization.
format Preprint
id arxiv_https___arxiv_org_abs_2406_08272
institution arXiv
publishDate 2024
record_format arxiv
title Learning interpretable positional encodings in transformers depends on initialization
topic Machine Learning
url https://arxiv.org/abs/2406.08272
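The abstract describes a learnable PE "initialized from a small-norm distribution". As a rough illustration of what such an initialization could look like, the sketch below samples a PE table from a zero-mean normal distribution with a small standard deviation and adds it to token embeddings. It is not the authors' code: the helper names, the table shape, and the values of sigma are assumptions chosen only for illustration, and the training that would subsequently update the table is not shown.

```python
import numpy as np


def init_learnable_pe(n_positions, d_model, sigma, seed=0):
    """Sample a (n_positions, d_model) PE table from N(0, sigma^2).

    Hypothetical helper: the record does not specify the distribution
    or the scale used in the paper; sigma is an assumed parameter.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=sigma, size=(n_positions, d_model))


def add_positional_encoding(token_embeddings, pe):
    """Add one PE row to each token embedding (row i gets pe[i])."""
    return token_embeddings + pe[: token_embeddings.shape[0]]


# Assumed sizes: 9 positions (e.g. a flattened 3x3 grid), 16 features.
n_positions, d_model = 9, 16

# Small-norm versus large-norm initialization (both sigmas are illustrative).
pe_small = init_learnable_pe(n_positions, d_model, sigma=1e-3)
pe_large = init_learnable_pe(n_positions, d_model, sigma=1.0)

# Placeholder token embeddings; a real model would learn or compute these.
tokens = np.ones((n_positions, d_model))
x = add_positional_encoding(tokens, pe_small)

# Average per-position L2 norm of each table.
mean_norm_small = np.linalg.norm(pe_small, axis=1).mean()
mean_norm_large = np.linalg.norm(pe_large, axis=1).mean()
print(mean_norm_small, mean_norm_large)
```

With a small sigma, the PE starts as a minor perturbation of the token embeddings, so any positional structure in the table has to emerge during training instead of being inherited from the random starting point.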