| Main authors: | Liu, Jingwen; Yu, Hantao; Sanford, Clayton; Andoni, Alexandr; Hsu, Daniel |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2509.09001 |
Similar items
Fixed Universal Transformers
by: Liu, Jingwen, et al.
Published: (2026)
Transformers, parallel computation, and logarithmic depth
by: Sanford, Clayton, et al.
Published: (2024)
One-layer transformers fail to solve the induction heads task
by: Sanford, Clayton, et al.
Published: (2024)
When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective
by: Mousavi-Hosseini, Alireza, et al.
Published: (2025)
A multi-source data power load forecasting method using attention mechanism-based parallel cnn-gru
by: Min, Chao, et al.
Published: (2024)
Group-wise oracle-efficient algorithms for online multi-group learning
by: Deng, Samuel, et al.
Published: (2024)
Group-realizable multi-group learning by minimizing empirical risk
by: Ardeshir, Navid, et al.
Published: (2026)
Lost in Tokenization: Fundamental Trade-offs in Graph Tokenization for Transformers
by: Bechler-Speicher, Maya, et al.
Published: (2026)
Easy attention: A simple attention mechanism for temporal predictions with transformers
by: Sanchis-Agudo, Marcial, et al.
Published: (2023)
Next-Token Prediction and Regret Minimization
by: Mohri, Mehryar, et al.
Published: (2026)
Approximation of relation functions and attention mechanisms
by: Altabaa, Awni, et al.
Published: (2024)
Flow Straight and Fast in Hilbert Space: Functional Rectified Flow
by: Zhang, Jianxin, et al.
Published: (2025)
Best of Both Worlds: Advantages of Hybrid Graph Sequence Models
by: Behrouz, Ali, et al.
Published: (2024)
Two Heads Are Better than One: Simulating Large Transformers with Small Ones
by: Yu, Hantao, et al.
Published: (2025)
Reservoir observer enhanced with residual calibration and attention mechanism
by: Liu, Yichen, et al.
Published: (2026)
FMamba: Mamba based on Fast-attention for Multivariate Time-series Forecasting
by: Ma, Shusen, et al.
Published: (2024)
A foundation model with multi-variate parallel attention to generate neuronal activity
by: Carzaniga, Francesco, et al.
Published: (2025)
Relational inductive biases on attention mechanisms
by: Mijangos, Víctor, et al.
Published: (2025)
A First Guess is Rarely the Final Answer: Learning to Search in the Traveling Salesperson Problem
by: Garmendia, Andoni Irazusta
Published: (2026)
Fundamental Limitations on Subquadratic Alternatives to Transformers
by: Alman, Josh, et al.
Published: (2024)
Statistical-Computational Trade-offs for Density Estimation
by: Aamand, Anders, et al.
Published: (2024)
Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers
by: Yehudai, Gilad, et al.
Published: (2025)
Parent-Guided Semantic Reward Model (PGSRM): Embedding-Based Reward Functions for Reinforcement Learning of Transformer Language Models
by: Plashchinsky, Alexandr
Published: (2025)
Fast parallel sampling under isoperimetry
by: Anari, Nima, et al.
Published: (2024)
Understanding Transformer Reasoning Capabilities via Graph Algorithms
by: Sanford, Clayton, et al.
Published: (2024)
Tucker Attention: A generalization of approximate attention mechanisms
by: Klein, Timon, et al.
Published: (2026)
Redundant feature screening method for human activity recognition based on attention purification mechanism
by: Li, Xiaoyang, et al.
Published: (2025)
Implicit Bias and Fast Convergence Rates for Self-attention
by: Vasudeva, Bhavya, et al.
Published: (2024)
Hierarchical Motion Captioning Utilizing External Text Data Source
by: Leite, Clayton, et al.
Published: (2025)
Enhancing Motion Variation in Text-to-Motion Models via Pose and Video Conditioned Editing
by: Leite, Clayton, et al.
Published: (2024)
Two tales for a geometric Jensen–Shannon divergence
by: Nielsen, Frank
Published: (2025)
Inexact calculus of variations on the hyperspherical tangent bundle and its connections to the attention mechanism
by: Gracyk, Andrew
Published: (2025)
Scaling Federated Linear Contextual Bandits via Sketching
by: Yang, Hantao, et al.
Published: (2026)
Sink vs. diagonal patterns as mechanisms for attention switch and oversmoothing prevention
by: Súkeník, Peter, et al.
Published: (2026)
On the Computational Hardness of Transformers
by: Saha, Barna, et al.
Published: (2026)
Mapping of attention mechanisms to a generalized Potts model
by: Rende, Riccardo, et al.
Published: (2023)
Invertible Memory Flow Networks
by: Zerihun, Liyu, et al.
Published: (2026)
Locality Preserving Markovian Transition for Instance Retrieval
by: Luo, Jifei, et al.
Published: (2025)
Lower bounds for one-layer transformers that compute parity
by: Hsu, Daniel
Published: (2026)
Self-attentive Transformer for Fast and Accurate Postprocessing of Temperature and Wind Speed Forecasts
by: Van Poecke, Aaron, et al.
Published: (2024)