| Main Authors: | , |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2025 |
| Online Access: | https://doi.org/10.5281/zenodo.17818930 |
Table of Contents:
- Transformer models have revolutionized natural language processing and computer vision because the attention mechanism lets them capture long-range dependencies. However, the cost of attention grows quadratically with sequence length, which is a significant bottleneck for processing long sequences. This paper introduces a novel approach that reduces the computational complexity of attention to sublinear by leveraging learnable orthogonal projections. Our method projects the query, key, and value matrices into a lower-dimensional subspace using orthogonal projection matrices that are learned during training. By enforcing orthogonality, we ensure that the information captured in the lower-dimensional space is maximally preserved, while the reduced dimensionality lowers the computational cost. We present a detailed theoretical analysis of the proposed method, showing that it approximates the original attention mechanism with provable guarantees. Empirical results on various benchmark datasets show that our approach achieves significant speedups over standard attention while maintaining comparable or even improved accuracy. This work opens up new possibilities for scaling transformer models to extremely long sequences, paving the way for more efficient and powerful language models.
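
The projection idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions. The abstract does not say which axis is projected, so the sketch compresses the keys and values along the sequence axis (a Linformer-style choice) and leaves the queries unprojected. The projection matrix `P` is a fixed random matrix with orthonormal columns obtained by QR decomposition, standing in for a matrix that would be learned under an orthogonality constraint. The names `orthonormal_projection` and `projected_attention` and the sizes `n`, `d`, `k` are hypothetical.

```python
import numpy as np


def orthonormal_projection(n, k, rng):
    """Return an (n, k) matrix with orthonormal columns.

    Assumption: a random stand-in for a learned orthogonal projection.
    """
    a = rng.standard_normal((n, k))
    q, _ = np.linalg.qr(a)  # reduced QR: q is (n, k) and q.T @ q = I_k
    return q


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def projected_attention(Q, K, V, P):
    """Attention with keys and values compressed along the sequence axis.

    Q, K, V have shape (n, d); P has shape (n, k) with k < n.
    Assumption: only K and V are projected in this sketch.
    """
    d = Q.shape[-1]
    K_low = P.T @ K                    # (k, d) compressed keys
    V_low = P.T @ V                    # (k, d) compressed values
    scores = Q @ K_low.T / np.sqrt(d)  # (n, k) instead of (n, n)
    return softmax(scores, axis=-1) @ V_low  # (n, d)


rng = np.random.default_rng(0)
n, d, k = 512, 64, 32                  # hypothetical sizes
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
P = orthonormal_projection(n, k, rng)
out = projected_attention(Q, K, V, P)
```

In this sketch the score matrix has shape `(n, k)` rather than `(n, n)`, so the matrix products cost on the order of `n * k * d` operations instead of `n * n * d`. The sketch does not reproduce the sublinear complexity claimed in the abstract, which depends on details of the method that the record does not give.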