:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Jingwen; Yu, Hantao; Sanford, Clayton; Andoni, Alexandr; Hsu, Daniel
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2509.09001

Similar Items

Fixed Universal Transformers
by: Liu, Jingwen, et al.
Published: (2026)

Transformers, parallel computation, and logarithmic depth
by: Sanford, Clayton, et al.
Published: (2024)

One-layer transformers fail to solve the induction heads task
by: Sanford, Clayton, et al.
Published: (2024)

When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective
by: Mousavi-Hosseini, Alireza, et al.
Published: (2025)

A multi-source data power load forecasting method using attention mechanism-based parallel cnn-gru
by: Min, Chao, et al.
Published: (2024)

Group-wise oracle-efficient algorithms for online multi-group learning
by: Deng, Samuel, et al.
Published: (2024)

Group-realizable multi-group learning by minimizing empirical risk
by: Ardeshir, Navid, et al.
Published: (2026)

Lost in Tokenization: Fundamental Trade-offs in Graph Tokenization for Transformers
by: Bechler-Speicher, Maya, et al.
Published: (2026)

Easy attention: A simple attention mechanism for temporal predictions with transformers
by: Sanchis-Agudo, Marcial, et al.
Published: (2023)

Next-Token Prediction and Regret Minimization
by: Mohri, Mehryar, et al.
Published: (2026)

Approximation of relation functions and attention mechanisms
by: Altabaa, Awni, et al.
Published: (2024)

Flow Straight and Fast in Hilbert Space: Functional Rectified Flow
by: Zhang, Jianxin, et al.
Published: (2025)

Best of Both Worlds: Advantages of Hybrid Graph Sequence Models
by: Behrouz, Ali, et al.
Published: (2024)

Two Heads Are Better than One: Simulating Large Transformers with Small Ones
by: Yu, Hantao, et al.
Published: (2025)

Reservoir observer enhanced with residual calibration and attention mechanism
by: Liu, Yichen, et al.
Published: (2026)

FMamba: Mamba based on Fast-attention for Multivariate Time-series Forecasting
by: Ma, Shusen, et al.
Published: (2024)

A foundation model with multi-variate parallel attention to generate neuronal activity
by: Carzaniga, Francesco, et al.
Published: (2025)

Relational inductive biases on attention mechanisms
by: Mijangos, Víctor, et al.
Published: (2025)

A First Guess is Rarely the Final Answer: Learning to Search in the Traveling Salesperson Problem
by: Garmendia, Andoni Irazusta
Published: (2026)

Fundamental Limitations on Subquadratic Alternatives to Transformers
by: Alman, Josh, et al.
Published: (2024)

Statistical-Computational Trade-offs for Density Estimation
by: Aamand, Anders, et al.
Published: (2024)

Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers
by: Yehudai, Gilad, et al.
Published: (2025)

Parent-Guided Semantic Reward Model (PGSRM): Embedding-Based Reward Functions for Reinforcement Learning of Transformer Language Models
by: Plashchinsky, Alexandr
Published: (2025)

Fast parallel sampling under isoperimetry
by: Anari, Nima, et al.
Published: (2024)

Understanding Transformer Reasoning Capabilities via Graph Algorithms
by: Sanford, Clayton, et al.
Published: (2024)

Tucker Attention: A generalization of approximate attention mechanisms
by: Klein, Timon, et al.
Published: (2026)

Redundant feature screening method for human activity recognition based on attention purification mechanism
by: Li, Xiaoyang, et al.
Published: (2025)

Implicit Bias and Fast Convergence Rates for Self-attention
by: Vasudeva, Bhavya, et al.
Published: (2024)

Hierarchical Motion Captioning Utilizing External Text Data Source
by: Leite, Clayton, et al.
Published: (2025)

Enhancing Motion Variation in Text-to-Motion Models via Pose and Video Conditioned Editing
by: Leite, Clayton, et al.
Published: (2024)

Two tales for a geometric Jensen–Shannon divergence
by: Nielsen, Frank
Published: (2025)

Inexact calculus of variations on the hyperspherical tangent bundle and its connections to the attention mechanism
by: Gracyk, Andrew
Published: (2025)

Scaling Federated Linear Contextual Bandits via Sketching
by: Yang, Hantao, et al.
Published: (2026)

Sink vs. diagonal patterns as mechanisms for attention switch and oversmoothing prevention
by: Súkeník, Peter, et al.
Published: (2026)

On the Computational Hardness of Transformers
by: Saha, Barna, et al.
Published: (2026)

Mapping of attention mechanisms to a generalized Potts model
by: Rende, Riccardo, et al.
Published: (2023)

Invertible Memory Flow Networks
by: Zerihun, Liyu, et al.
Published: (2026)

Locality Preserving Markovian Transition for Instance Retrieval
by: Luo, Jifei, et al.
Published: (2025)

Lower bounds for one-layer transformers that compute parity
by: Hsu, Daniel
Published: (2026)

Self-attentive Transformer for Fast and Accurate Postprocessing of Temperature and Wind Speed Forecasts
by: Van Poecke, Aaron, et al.
Published: (2024)