| Main Authors: | Cheng, Xiang; Chen, Yuxin; Sra, Suvrit |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2312.06528 |
Similar Items
Graph Transformers Dream of Electric Flow
by: Cheng, Xiang, et al.
Published: (2024)
Linearly Convergent Algorithms for Nonsmooth Problems with Unknown Smooth Pieces
by: Zhang, Zhe, et al.
Published: (2025)
Toward generalizable learning of all (linear) first-order methods via memory augmented Transformers
by: Dutta, Sanchayan, et al.
Published: (2024)
Efficient Sampling on Riemannian Manifolds via Langevin MCMC
by: Cheng, Xiang, et al.
Published: (2024)
Riemannian Bilevel Optimization
by: Dutta, Sanchayan, et al.
Published: (2024)
Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part I
by: Tian, Yi, et al.
Published: (2022)
Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part II
by: Tian, Yi, et al.
Published: (2026)
Linear attention is (maybe) all you need (to understand transformer optimization)
by: Ahn, Kwangjun, et al.
Published: (2023)
Trees to Flows and Back: Unifying Decision Trees and Diffusion Models
by: Ramachandran, Sai Niranjan, et al.
Published: (2026)
Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture
by: Hou, Yikun, et al.
Published: (2025)
First-Order Methods for Linearly Constrained Bilevel Optimization
by: Kornowski, Guy, et al.
Published: (2024)
How to escape sharp minima with random perturbations
by: Ahn, Kwangjun, et al.
Published: (2023)
A projection-based framework for gradient-free and parallel learning
by: Bergmeister, Andreas, et al.
Published: (2025)
Cross-fluctuation phase transitions reveal sampling dynamics in diffusion models
by: Ramachandran, Sai Niranjan, et al.
Published: (2025)
Distributed Gradient Descent for Functional Learning
by: Yu, Zhan, et al.
Published: (2023)
Revisiting Frank-Wolfe for Structured Nonconvex Optimization
by: Maskan, Hoomaan, et al.
Published: (2025)
Tight Generalization Bounds for Noiseless Inverse Optimization
by: Fatemi, Pouria, et al.
Published: (2026)
Continuum Transformers Perform In-Context Learning by Operator Gradient Descent
by: Mishra, Abhiti, et al.
Published: (2025)
Do pretrained Transformers Learn In-Context by Gradient Descent?
by: Shen, Lingfeng, et al.
Published: (2023)
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
by: Huang, Jianhao, et al.
Published: (2025)
Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression
by: Jiang, Jiarui, et al.
Published: (2025)
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
by: Gatmiry, Khashayar, et al.
Published: (2024)
Gradient Descent Fails to Learn High-frequency Functions and Modular Arithmetic
by: Takhanov, Rustem, et al.
Published: (2023)
Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning
by: Xie, Zixuan, et al.
Published: (2026)
Anytime Acceleration of Gradient Descent
by: Zhang, Zihan, et al.
Published: (2024)
Geodesic Gradient Descent: A Generic and Learning-rate-free Optimizer on Objective Function-induced Manifolds
by: Hu, Liwei, et al.
Published: (2026)
Learning High-Dimensional Parity Functions with Product Networks using Gradient Descent
by: Larue, Guillaume, et al.
Published: (2026)
The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent
by: Dandi, Yatin, et al.
Published: (2025)
One for All: A Non-Linear Transformer can Enable Cross-Domain Generalization for In-Context Reinforcement Learning
by: He, Bowen, et al.
Published: (2026)
Efficient Search for Customized Activation Functions with Gradient Descent
by: Strack, Lukas, et al.
Published: (2024)
Functional Central Limit Theorem for Stochastic Gradient Descent
by: Flamand, Kessang, et al.
Published: (2026)
Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent
by: Zhang, Chenyang, et al.
Published: (2026)
The Initialization Determines Whether In-Context Learning Is Gradient Descent
by: Xie, Shifeng, et al.
Published: (2025)
Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations
by: Dong, Yuxin, et al.
Published: (2025)
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
by: Aksenov, Yaroslav, et al.
Published: (2024)
How Transformers Learn Causal Structure with Gradient Descent
by: Nichani, Eshaan, et al.
Published: (2024)
On the Convergence of Gradient Descent on Learning Transformers with Residual Connections
by: Qin, Zhen, et al.
Published: (2025)
The Multi-Block DC Function Class: Theory, Algorithms, and Applications
by: Fatemi, Pouria, et al.
Published: (2026)
Unraveling the Gradient Descent Dynamics of Transformers
by: Song, Bingqing, et al.
Published: (2024)
Computing Brascamp-Lieb Constants through the lens of Thompson Geometry
by: Weber, Melanie, et al.
Published: (2022)