:: Library Catalog

Bibliographic Details
Main Authors:	Barnfield, Nicholas; Cui, Hugo; Lu, Yue M.
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2509.25153

Similar Items

Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval
by: Barnfield, Nicholas, et al.
Published: (2026)

Multi-layer Cross-attention is Provably Optimal for Multi-modal In-context Learning
by: Barnfield, Nicholas, et al.
Published: (2026)

Asymptotic Theory of Iterated Empirical Risk Minimization, with Applications to Active Learning
by: Cui, Hugo, et al.
Published: (2026)

High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates
by: Lu, Fred, et al.
Published: (2024)

Interpreting Attention Layer Outputs with Sparse Autoencoders
by: Kissane, Connor, et al.
Published: (2024)

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection
by: Jo, Dongwon, et al.
Published: (2026)

STS: Efficient Sparse Attention with Speculative Token Sparsity
by: Xu, Ceyu, et al.
Published: (2026)

Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction
by: He, Mutian, et al.
Published: (2025)

STILL: Selecting Tokens for Intra-Layer Hybrid Attention to Linearize LLMs
by: Meng, Weikang, et al.
Published: (2026)

A solvable model of learning generative diffusion: theory and insights
by: Cui, Hugo, et al.
Published: (2025)

Infinite-Width Limit of a Single Attention Layer: Analysis via Tensor Programs
by: Sakai, Mana, et al.
Published: (2025)

Bridging the Dimensional Chasm: Uncover Layer-wise Dimensional Reduction in Transformers through Token Correlation
by: Song, Zhuo-Yang, et al.
Published: (2025)

Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks
by: Rangriz, Parsa
Published: (2025)

Classifier Language Models: Unifying Sparse Finetuning and Adaptive Tokenization for Specialized Classification Tasks
by: Krishnan, Adit, et al.
Published: (2025)

Neighbor Embedding for High-Dimensional Sparse Poisson Data
by: Mudrik, Noga, et al.
Published: (2026)

Learning to Predict, Discover, and Reason in High-Dimensional Event Sequences
by: Math, Hugo
Published: (2026)

Gradient Boosting within a Single Attention Layer
by: Sargolzaei, Saleh
Published: (2026)

Sparse Autoencoder Features for Classifications and Transferability
by: Gallifant, Jack, et al.
Published: (2025)

DELTA: Dynamic Layer-Aware Token Attention for Efficient Long-Context Reasoning
by: Zarch, Hossein Entezari, et al.
Published: (2025)

AdaSplash-2: Faster Differentiable Sparse Attention
by: Gonçalves, Nuno, et al.
Published: (2026)

High-dimensional learning of narrow neural networks
by: Cui, Hugo
Published: (2024)

Hierarchical Sparse Representation Clustering for High-Dimensional Data Streams
by: Chen, Jie, et al.
Published: (2024)

High-Layer Attention Pruning with Rescaling
by: Liu, Songtao, et al.
Published: (2025)

Virtual Node Generation for Node Classification in Sparsely-Labeled Graphs
by: Cui, Hang, et al.
Published: (2024)

Dimensional Collapse in Transformer Attention Outputs: A Challenge for Sparse Dictionary Learning
by: Wang, Junxuan, et al.
Published: (2025)

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse
by: Bai, Yushi, et al.
Published: (2026)

Exploring the Precise Dynamics of Single-Layer GAN Models: Leveraging Multi-Feature Discriminators for High-Dimensional Subspace Learning
by: Bond, Andrew, et al.
Published: (2024)

Hardness of High-Dimensional Linear Classification
by: Munteanu, Alexander, et al.
Published: (2026)

Geometric Analysis of Token Selection in Multi-Head Attention
by: Mudarisov, Timur, et al.
Published: (2026)

Token Sample Complexity of Attention
by: Bohbot, Léa, et al.
Published: (2025)

ToMA: Token Merge with Attention for Diffusion Models
by: Lu, Wenbo, et al.
Published: (2025)

MaxPoolBERT: Enhancing BERT Classification via Layer- and Token-Wise Aggregation
by: Behrendt, Maike, et al.
Published: (2025)

SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
by: Wang, Hanrui, et al.
Published: (2020)

Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks
by: Arnaboldi, Luca, et al.
Published: (2025)

Solving Sparse & High-Dimensional-Output Regression via Compression
by: Li, Renyuan, et al.
Published: (2024)

A Convergence Analysis of Approximate Message Passing with Non-Separable Functions and Applications to Multi-Class Classification
by: Çakmak, Burak, et al.
Published: (2024)

vAttention: Verified Sparse Attention
by: Desai, Aditya, et al.
Published: (2025)

Sparse Modelling for Feature Learning in High Dimensional Data
by: Neelam, Harish, et al.
Published: (2024)

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
by: Nawrot, Piotr, et al.
Published: (2025)

High-dimensional Asymptotics of Denoising Autoencoders
by: Cui, Hugo, et al.
Published: (2023)