Staff View: Learning interpretable positional encodings in transformers depends on initialization

Bibliographic Details
Main Authors:	Ito, Takuya; Cocchi, Luca; Klinger, Tim; Ram, Parikshit; Campbell, Murray; Hearne, Luke
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2406.08272

_version_	1866909657043304448
author	Ito, Takuya; Cocchi, Luca; Klinger, Tim; Ram, Parikshit; Campbell, Murray; Hearne, Luke
author_facet	Ito, Takuya; Cocchi, Luca; Klinger, Tim; Ram, Parikshit; Campbell, Murray; Hearne, Luke
contents	In transformers, the positional encoding (PE) provides essential information that distinguishes the position and order amongst tokens in a sequence. Most prior investigations of PE effects on generalization were tailored to 1D input sequences, such as those presented in natural language, where adjacent tokens (e.g., words) are highly related. In contrast, many real-world tasks involve datasets with highly non-trivial positional arrangements, such as datasets organized in multiple spatial dimensions, or datasets for which ground truth positions are not known. Here we find that the choice of initialization of a learnable PE greatly influences its ability to learn interpretable PEs that lead to enhanced generalization. We empirically demonstrate our findings in three experiments: 1) A 2D relational reasoning task; 2) A nonlinear stochastic network simulation; 3) A real-world 3D neuroscience dataset, applying interpretability analyses to verify the learning of accurate PEs. Overall, we find that a learned PE initialized from a small-norm distribution can 1) uncover interpretable PEs that mirror ground truth positions in multiple dimensions, and 2) lead to improved generalization. These results illustrate the feasibility of learning identifiable and interpretable PEs for enhanced generalization.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_08272
institution	arXiv
publishDate	2024
record_format	arxiv
title	Learning interpretable positional encodings in transformers depends on initialization
topic	Machine Learning
url	https://arxiv.org/abs/2406.08272

Similar Items