:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Michael; Bhatia, Kush; Kumbong, Hermann; Ré, Christopher
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2402.04347

Similar Items

Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates
by: Narayan, Avanika, et al.
Published: (2024)

LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits
by: Zhou, Zikai, et al.
Published: (2025)

Why Softmax Attention Outperforms Linear Attention
by: Deng, Yichuan, et al.
Published: (2023)

Automated Rewards via LLM-Generated Progress Functions
by: Sarukkai, Vishnu, et al.
Published: (2024)

Kimi Linear: An Expressive, Efficient Attention Architecture
by: Kimi Team, et al.
Published: (2025)

The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought
by: Brösamle, Moritz, et al.
Published: (2026)

Stochastic Attention: Connectome-Inspired Randomized Routing for Expressive Linear-Time Attention
by: Jin, Zehao, et al.
Published: (2026)

RAM-Net: Expressive Linear Attention with Selectively Addressable Memory
by: Xiao, Kaicheng, et al.
Published: (2026)

Scalable-Softmax Is Superior for Attention
by: Nakanishi, Ken M.
Published: (2025)

Softmax Attention with Constant Cost per Token
by: Heinsen, Franz A.
Published: (2024)

Forgetting Transformer: Softmax Attention with a Forget Gate
by: Lin, Zhixuan, et al.
Published: (2025)

Evaluating the fairness of task-adaptive pretraining on unlabeled test data before few-shot text classification
by: Dubey, Kush
Published: (2024)

In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness
by: Collins, Liam, et al.
Published: (2024)

LoLCATs: On Low-Rank Linearizing of Large Language Models
by: Zhang, Michael, et al.
Published: (2024)

More Expressive Attention with Negative Weights
by: Lv, Ang, et al.
Published: (2024)

To Softmax, or not to Softmax: that is the question when applying Active Learning for Transformer Models
by: Gonsior, Julius, et al.
Published: (2022)

Softmax Transformers are Turing-Complete
by: Jiang, Hongjian, et al.
Published: (2025)

Taipan: Efficient and Expressive State Space Language Models with Selective Attention
by: Van Nguyen, Chien, et al.
Published: (2024)

The Information Geometry of Softmax: Probing and Steering
by: Park, Kiho, et al.
Published: (2026)

GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling
by: Dadgarnia, Alireza, et al.
Published: (2026)

Beyond Mimicry to Contextual Guidance: Knowledge Distillation for Interactive AI
by: Wang, Tong, et al.
Published: (2024)

SEA: Sparse Linear Attention with Estimated Attention Mask
by: Lee, Heejun, et al.
Published: (2023)

Linear Attention Sequence Parallelism
by: Sun, Weigao, et al.
Published: (2024)

Higher-order Linear Attention
by: Zhang, Yifan, et al.
Published: (2025)

On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
by: Mongaras, Gabriel, et al.
Published: (2025)

Trading Complexity for Expressivity Through Structured Generalized Linear Token Mixing
by: Fagnou, Erwan, et al.
Published: (2026)

Scaling Linear Attention with Sparse State Expansion
by: Pan, Yuqi, et al.
Published: (2025)

HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation
by: Kumbong, Hermann, et al.
Published: (2025)

Simple linear attention language models balance the recall-throughput tradeoff
by: Arora, Simran, et al.
Published: (2024)

Aioli: A Unified Optimization Framework for Language Model Data Mixing
by: Chen, Mayee F., et al.
Published: (2024)

Theoretical Constraints on the Expressive Power of $\mathsf{RoPE}$-based Tensor Attention Transformers
by: Li, Xiaoyu, et al.
Published: (2024)

Alleviating Forgetfulness of Linear Attention by Hybrid Sparse Attention and Contextualized Learnable Token Eviction
by: He, Mutian, et al.
Published: (2025)

Neural Attention Search Linear: Towards Adaptive Token-Level Hybrid Attention Models
by: Deng, Difan, et al.
Published: (2026)

Gated Linear Attention Transformers with Hardware-Efficient Training
by: Yang, Songlin, et al.
Published: (2023)

Counting Like Transformers: Compiling Temporal Counting Logic Into Softmax Transformers
by: Yang, Andy, et al.
Published: (2024)

The Expressive Capacity of State Space Models: A Formal Language Perspective
by: Sarrof, Yash, et al.
Published: (2024)

Unifying Linear-Time Attention via Latent Probabilistic Modelling
by: Dolga, Rares, et al.
Published: (2024)

Transformer Based Linear Attention with Optimized GPU Kernel Implementation
by: Gerami, Armin, et al.
Published: (2025)

LoLA: Low-Rank Linear Attention With Sparse Caching
by: McDermott, Luke, et al.
Published: (2025)

Probing Geometry of Next Token Prediction Using Cumulant Expansion of the Softmax Entropy
by: Viswanathan, Karthik, et al.
Published: (2025)