| Main Authors: | Furuya, Takashi; de Hoop, Maarten V.; Peyré, Gabriel |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.01367 |
Similar Items
Transformers through the lens of support-preserving maps between measures
by: Furuya, Takashi, et al.
Published: (2025)
Training Infinitely Deep and Wide Transformers
by: Barboni, Raphaël, et al.
Published: (2026)
Can neural operators always be continuously discretized?
by: Furuya, Takashi, et al.
Published: (2024)
Function graph transformers universally approximate operators between function spaces
by: Furuya, Takashi, et al.
Published: (2026)
Optimal Transport for Machine Learners
by: Peyré, Gabriel
Published: (2025)
In-context Continual Learning Assisted by an External Continual Learner
by: Momeni, Saleh, et al.
Published: (2024)
Towards Understanding the Universality of Transformers for Next-Token Prediction
by: Sander, Michael E., et al.
Published: (2024)
Mixture of Experts Softens the Curse of Dimensionality in Operator Learning
by: Kratsios, Anastasis, et al.
Published: (2024)
Plain Transformers Can be Powerful Graph Learners
by: Ma, Liheng, et al.
Published: (2025)
Auto-Regressive Next-Token Predictors are Universal Learners
by: Malach, Eran
Published: (2023)
Out-of-distributional risk bounds for neural operators with applications to the Helmholtz equation
by: Benitez, J. Antonio Lara, et al.
Published: (2023)
Semialgebraic Neural Networks: From roots to representations
by: Mis, S. David, et al.
Published: (2025)
Is In-Context Universality Enough? MLPs are Also Universal In-Context
by: Kratsios, Anastasis, et al.
Published: (2025)
RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval
by: Wen, Kaiyue, et al.
Published: (2024)
Linking In-context Learning in Transformers to Human Episodic Memory
by: Ji-An, Li, et al.
Published: (2024)
Curse of High Dimensionality Issue in Transformer for Long-context Modeling
by: Zhang, Shuhai, et al.
Published: (2025)
Energy-Based Transformers are Scalable Learners and Thinkers
by: Gladstone, Alexi, et al.
Published: (2025)
Transformative or Conservative? Conservation laws for ResNets and Transformers
by: Marcotte, Sibylle, et al.
Published: (2025)
Breaking through the learning plateaus of in-context learning in Transformer
by: Fu, Jingwen, et al.
Published: (2023)
Language Models are Symbolic Learners in Arithmetic
by: Deng, Chunyuan, et al.
Published: (2024)
On the Fragility of Active Learners for Text Classification
by: Ghose, Abhishek, et al.
Published: (2024)
Robust Sublinear Convergence Rates for Iterative Bregman Projections
by: Peyré, Gabriel
Published: (2026)
Mixtures of In-Context Learners
by: Hong, Giwon, et al.
Published: (2024)
The broader spectrum of in-context learning
by: Lampinen, Andrew Kyle, et al.
Published: (2024)
Are Large Language Models Good Temporal Graph Learners?
by: Huang, Shenyang, et al.
Published: (2025)
Barriers to Universal Reasoning With Transformers (And How to Overcome Them)
by: Kraus, Oliver, et al.
Published: (2026)
Boosting Masked ECG-Text Auto-Encoders as Discriminative Learners
by: Pham, Hung Manh, et al.
Published: (2024)
In-context Learning in Presence of Spurious Correlations
by: Harutyunyan, Hrayr, et al.
Published: (2024)
CausalLM is not optimal for in-context learning
by: Ding, Nan, et al.
Published: (2023)
Guideline Learning for In-context Information Extraction
by: Pang, Chaoxu, et al.
Published: (2023)
In-context Learning and Gradient Descent Revisited
by: Deutch, Gilad, et al.
Published: (2023)
Towards Modeling Learner Performance with Large Language Models
by: Neshaei, Seyed Parsa, et al.
Published: (2024)
An Evolved Universal Transformer Memory
by: Cetin, Edoardo, et al.
Published: (2024)
Looking Beyond The Top-1: Transformers Determine Top Tokens In Order
by: Lioubashevski, Daria, et al.
Published: (2024)
Muon Dynamics as a Spectral Wasserstein Flow
by: Peyré, Gabriel
Published: (2026)
Optimal and Diffusion Transports in Machine Learning
by: Peyré, Gabriel
Published: (2025)
Re-examining learning linear functions in context
by: Naim, Omar, et al.
Published: (2024)
Associative-State Universal Transformers: Sparse Retrieval Meets Structured Recurrence
by: Xiao, Liu
Published: (2026)
LLMs Are In-Context Bandit Reinforcement Learners
by: Monea, Giovanni, et al.
Published: (2024)
Active Learners as Efficient PRP Rerankers
by: Paschmann, Jeremías Figueiredo, et al.
Published: (2026)