Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Bonino, Matteo, Ghione, Giorgia, Cirrincione, Giansalvo
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.12033
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866916618228989952
author	Bonino, Matteo Ghione, Giorgia Cirrincione, Giansalvo
author_facet	Bonino, Matteo Ghione, Giorgia Cirrincione, Giansalvo
contents	Transformer neural networks, particularly Bidirectional Encoder Representations from Transformers (BERT), have shown remarkable performance across various tasks such as classification, text summarization, and question answering. However, their internal mechanisms remain mathematically obscure, highlighting the need for greater explainability and interpretability. In this direction, this paper investigates the internal mechanisms of BERT proposing a novel perspective on the attention mechanism of BERT from a theoretical perspective. The analysis encompasses both local and global network behavior. At the local level, the concept of directionality of subspace selection as well as a comprehensive study of the patterns emerging from the self-attention matrix are presented. Additionally, this work explores the semantic content of the information stream through data distribution analysis and global statistical measures including the novel concept of cone index. A case study on the classification of SARS-CoV-2 variants using RNA which resulted in a very high accuracy has been selected in order to observe these concepts in an application. The insights gained from this analysis contribute to a deeper understanding of BERT's classification process, offering potential avenues for future architectural improvements in Transformer models and further analysis in the training process.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_12033
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	The geometry of BERT Bonino, Matteo Ghione, Giorgia Cirrincione, Giansalvo Machine Learning Transformer neural networks, particularly Bidirectional Encoder Representations from Transformers (BERT), have shown remarkable performance across various tasks such as classification, text summarization, and question answering. However, their internal mechanisms remain mathematically obscure, highlighting the need for greater explainability and interpretability. In this direction, this paper investigates the internal mechanisms of BERT proposing a novel perspective on the attention mechanism of BERT from a theoretical perspective. The analysis encompasses both local and global network behavior. At the local level, the concept of directionality of subspace selection as well as a comprehensive study of the patterns emerging from the self-attention matrix are presented. Additionally, this work explores the semantic content of the information stream through data distribution analysis and global statistical measures including the novel concept of cone index. A case study on the classification of SARS-CoV-2 variants using RNA which resulted in a very high accuracy has been selected in order to observe these concepts in an application. The insights gained from this analysis contribute to a deeper understanding of BERT's classification process, offering potential avenues for future architectural improvements in Transformer models and further analysis in the training process.
title	The geometry of BERT
topic	Machine Learning
url	https://arxiv.org/abs/2502.12033

Similar Items