| Main Authors: | Hayakawa, Daichi, Sato, Issei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.12413 |
Similar Items
Length Generalization of Causal Transformers without Position Encoding
by: Wang, Jie, et al.
Published: (2024)
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
by: Xu, Kevin, et al.
Published: (2024)
Rethinking Associative Memory Mechanism in Induction Head
by: Wang, Shuo, et al.
Published: (2024)
Max-pooling Network Revisited: Analyzing the Role of Semantic Probability in Multiple Instance Learning for Hallucination Detection
by: Fujikawa, Shota, et al.
Published: (2026)
A Formal Comparison Between Chain of Thought and Latent Thought
by: Xu, Kevin, et al.
Published: (2025)
On the Geometry of Positional Encodings in Transformers
by: Cirrincione, Giansalvo
Published: (2026)
ExPe: Exact Positional Encodings for Generative Transformer Models with Extrapolating Capabilities
by: Datseris, Aleksis, et al.
Published: (2025)
Fusion Matters: Length-Aware Analysis of Positional-Encoding Fusion in Transformers
by: Hallam, Mohamed Amine, et al.
Published: (2026)
Theoretical Analysis of Byte-Pair Encoding
by: Kozma, László, et al.
Published: (2024)
SeqPE: Transformer with Sequential Position Encoding
by: Li, Huayang, et al.
Published: (2025)
Length Extrapolation of Transformers: A Survey from the Perspective of Positional Encoding
by: Zhao, Liang, et al.
Published: (2023)
Position Information Emerges in Causal Transformers Without Positional Encodings via Similarity of Nearby Embeddings
by: Zuo, Chunsheng, et al.
Published: (2024)
PaTH Attention: Position Encoding via Accumulating Householder Transformations
by: Yang, Songlin, et al.
Published: (2025)
Why Are Positional Encodings Nonessential for Deep Autoregressive Transformers? Revisiting a Petroglyph
by: Irie, Kazuki
Published: (2024)
Beyond Sinusoids: A Morlet Wavelet Framework for Transformer Positional Encoding
by: Zeris, Athanasios
Published: (2026)
Understanding Transformer Optimization via Gradient Heterogeneity
by: Tomihari, Akiyoshi, et al.
Published: (2025)
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
by: Huang, Ruiquan, et al.
Published: (2025)
Speech Swin-Transformer: Exploring a Hierarchical Transformer with Shifted Windows for Speech Emotion Recognition
by: Wang, Yong, et al.
Published: (2024)
Hierarchical Bracketing Encodings Work for Dependency Graphs
by: Ezquerro, Ana, et al.
Published: (2025)
Hierarchical Bracketing Encodings for Dependency Parsing as Tagging
by: Ezquerro, Ana, et al.
Published: (2025)
A Morphology-Based Investigation of Positional Encodings
by: Ghosh, Poulami, et al.
Published: (2024)
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
by: Li, Hongkang, et al.
Published: (2024)
Group Representational Position Encoding
by: Zhang, Yifan, et al.
Published: (2025)
DAPE: Data-Adaptive Positional Encoding for Length Extrapolation
by: Zheng, Chuanyang, et al.
Published: (2024)
Energy-Gated Attention and Wavelet Positional Encoding: Complementary Inductive Biases for Transformer Attention
by: Zeris, Athanasios
Published: (2026)
2D-TPE: Two-Dimensional Positional Encoding Enhances Table Understanding for Large Language Models
by: Li, Jia-Nan, et al.
Published: (2024)
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models
by: Bai, Yuyang, et al.
Published: (2023)
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
by: Zhu, Yongxin, et al.
Published: (2024)
On the Encoding of Gender in Transformer-based ASR Representations
by: Krishnan, Aravind, et al.
Published: (2024)
Encoding Hierarchical Schema via Concept Flow for Multifaceted Ideology Detection
by: Liu, Songtao, et al.
Published: (2024)
Contextual Position Encoding: Learning to Count What's Important
by: Golovneva, Olga, et al.
Published: (2024)
Positional Encoding via Token-Aware Phase Attention
by: Wang, Yu, et al.
Published: (2025)
On the Interplay between Positional Encodings, Morphological Complexity, and Word Order Flexibility
by: Tatariya, Kushal, et al.
Published: (2025)
Layer-Specific Scaling of Positional Encodings for Superior Long-Context Modeling
by: Wang, Zhenghua, et al.
Published: (2025)
Theoretical Analysis of Weak-to-Strong Generalization
by: Lang, Hunter, et al.
Published: (2024)
CMATH: Cross-Modality Augmented Transformer with Hierarchical Variational Distillation for Multimodal Emotion Recognition in Conversation
by: Zhu, Xiaofei, et al.
Published: (2024)
Do Language Models Encode Semantic Relations? Probing and Sparse Feature Analysis
by: Diera, Andor, et al.
Published: (2026)
Text-Based Correlation Matrix in Multi-Asset Allocation
by: Nakayama, Yasuhiro, et al.
Published: (2024)
PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models
by: Aggarwal, Arpit
Published: (2024)
Advancing Neural Encoding of Portuguese with Transformer Albertina PT-*
by: Rodrigues, João, et al.
Published: (2023)