:: Library Catalog

Bibliographic Details
Main Authors:	Bombari, Simone; Mondelli, Marco
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2402.02969

Similar Items

How Spurious Features Are Memorized: Precise Analysis for Random and NTK Features
by: Bombari, Simone, et al.
Published: (2023)

A Law of Data Reconstruction for Random Features (and Beyond)
by: Iurada, Leonardo, et al.
Published: (2025)

Spurious Correlations in High Dimensional Regression: The Roles of Regularization, Simplicity Bias and Over-Parameterization
by: Bombari, Simone, et al.
Published: (2025)

Privacy for Free in the Overparameterized Regime
by: Bombari, Simone, et al.
Published: (2024)

High-Dimensional Private Linear Regression with Optimal Rates
by: Bombari, Simone, et al.
Published: (2025)

Attention with Trained Embeddings Provably Selects Important Tokens
by: Wu, Diyuan, et al.
Published: (2025)

Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
by: He, Zhengfu, et al.
Published: (2025)

Improving Rare Word Translation With Dictionaries and Attention Masking
by: Sible, Kenneth J., et al.
Published: (2024)

Improved Scaling Laws via Weak-to-Strong Generalization in Random Feature Ridge Regression
by: Wu, Diyuan, et al.
Published: (2026)

Feature Resemblance: Towards a Theoretical Understanding of Analogical Reasoning in Transformers
by: Xu, Ruichen, et al.
Published: (2026)

LayerBoost: Layer-Aware Attention Reduction for Efficient LLMs
by: Souibgui, Mohamed Ali, et al.
Published: (2026)

Paying Attention to Facts: Quantifying the Knowledge Capacity of Attention Layers
by: Wong, Liang Ze
Published: (2025)

High-Layer Attention Pruning with Rescaling
by: Liu, Songtao, et al.
Published: (2025)

Guided Perturbation Sensitivity (GPS): Detecting Adversarial Text via Embedding Stability and Word Importance
by: Tuck, Bryan E., et al.
Published: (2025)

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse
by: Bai, Yushi, et al.
Published: (2026)

Pay Less Attention to Function Words for Free Robustness of Vision-Language Models
by: Tian, Qiwei, et al.
Published: (2025)

Multi-Layer Attention is the Amplifier of Demonstration Effectiveness
by: Wang, Dingzirui, et al.
Published: (2025)

Multi-Relational Hyperbolic Word Embeddings from Natural Language Definitions
by: Valentino, Marco, et al.
Published: (2023)

Nectar: Neural Estimation of Cached-Token Attention via Regression
by: Monteiro, João, et al.
Published: (2026)

Neural Collapse Beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime
by: Wu, Diyuan, et al.
Published: (2025)

Stochastic Attention: Connectome-Inspired Randomized Routing for Expressive Linear-Time Attention
by: Jin, Zehao, et al.
Published: (2026)

Studying the Korean Word-Chain Game with RLVR: Mitigating Reward Conflicts via Curriculum Learning
by: Rho, Donghwan
Published: (2025)

Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers
by: Chen, Lei, et al.
Published: (2024)

Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
by: Bozic, Vukasin, et al.
Published: (2023)

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
by: Zhu, Hanlin, et al.
Published: (2024)

Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers
by: Musat, Tiberiu
Published: (2024)

CLAA: Cross-Layer Attention Aggregation for Accelerating LLM Prefill
by: McDanel, Bradley, et al.
Published: (2026)

Extracting Rule-based Descriptions of Attention Features in Transformers
by: Friedman, Dan, et al.
Published: (2025)

SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
by: Wang, Zihao, et al.
Published: (2024)

Towards Understanding Steering Strength
by: Taimeskhanov, Magamed, et al.
Published: (2026)

Decomposing Attention To Find Context-Sensitive Neurons
by: Gibson, Alex
Published: (2025)

Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse
by: Fu, Zizhuo, et al.
Published: (2026)

Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification
by: ElNokrashy, Muhammad, et al.
Published: (2022)

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
by: Brandon, William, et al.
Published: (2024)

Wave-PDE Nets: Trainable Wave-Equation Layers as an Alternative to Attention
by: Vejendla, Harshil
Published: (2025)

Closed-Form Training Dynamics Reveal Learned Features and Linear Structure in Word2Vec-like Models
by: Karkada, Dhruva, et al.
Published: (2025)

Routing Absorption in Sparse Attention: Why Random Gates Are Hard to Beat
by: Aquino-Michaels, Keston
Published: (2026)

Is Random Attention Sufficient for Sequence Modeling? Disentangling Trainable Components in the Transformer
by: Dong, Yihe, et al.
Published: (2025)

Neural Attention Search Linear: Towards Adaptive Token-Level Hybrid Attention Models
by: Deng, Difan, et al.
Published: (2026)

CAST: Compositional Analysis via Spectral Tracking for Understanding Transformer Layer Functions
by: Fu, Zihao, et al.
Published: (2025)