Library Catalog

Bibliographic Details
Main Authors:	Cheng, Xiang; Chen, Yuxin; Sra, Suvrit
Format:	Preprint
Published:	2023
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2312.06528

Similar Items

Graph Transformers Dream of Electric Flow
by: Cheng, Xiang, et al.
Published: (2024)

Linearly Convergent Algorithms for Nonsmooth Problems with Unknown Smooth Pieces
by: Zhang, Zhe, et al.
Published: (2025)

Toward generalizable learning of all (linear) first-order methods via memory augmented Transformers
by: Dutta, Sanchayan, et al.
Published: (2024)

Efficient Sampling on Riemannian Manifolds via Langevin MCMC
by: Cheng, Xiang, et al.
Published: (2024)

Riemannian Bilevel Optimization
by: Dutta, Sanchayan, et al.
Published: (2024)

Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part I
by: Tian, Yi, et al.
Published: (2022)

Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part II
by: Tian, Yi, et al.
Published: (2026)

Linear attention is (maybe) all you need (to understand transformer optimization)
by: Ahn, Kwangjun, et al.
Published: (2023)

Trees to Flows and Back: Unifying Decision Trees and Diffusion Models
by: Ramachandran, Sai Niranjan, et al.
Published: (2026)

Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture
by: Hou, Yikun, et al.
Published: (2025)

First-Order Methods for Linearly Constrained Bilevel Optimization
by: Kornowski, Guy, et al.
Published: (2024)

How to escape sharp minima with random perturbations
by: Ahn, Kwangjun, et al.
Published: (2023)

A projection-based framework for gradient-free and parallel learning
by: Bergmeister, Andreas, et al.
Published: (2025)

Cross-fluctuation phase transitions reveal sampling dynamics in diffusion models
by: Ramachandran, Sai Niranjan, et al.
Published: (2025)

Distributed Gradient Descent for Functional Learning
by: Yu, Zhan, et al.
Published: (2023)

Revisiting Frank-Wolfe for Structured Nonconvex Optimization
by: Maskan, Hoomaan, et al.
Published: (2025)

Tight Generalization Bounds for Noiseless Inverse Optimization
by: Fatemi, Pouria, et al.
Published: (2026)

Continuum Transformers Perform In-Context Learning by Operator Gradient Descent
by: Mishra, Abhiti, et al.
Published: (2025)

Do pretrained Transformers Learn In-Context by Gradient Descent?
by: Shen, Lingfeng, et al.
Published: (2023)

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
by: Huang, Jianhao, et al.
Published: (2025)

Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression
by: Jiang, Jiarui, et al.
Published: (2025)

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
by: Gatmiry, Khashayar, et al.
Published: (2024)

Gradient Descent Fails to Learn High-frequency Functions and Modular Arithmetic
by: Takhanov, Rustem, et al.
Published: (2023)

Beyond Linear Attention: Softmax Transformers Implement In-Context Reinforcement Learning
by: Xie, Zixuan, et al.
Published: (2026)

Anytime Acceleration of Gradient Descent
by: Zhang, Zihan, et al.
Published: (2024)

Geodesic Gradient Descent: A Generic and Learning-rate-free Optimizer on Objective Function-induced Manifolds
by: Hu, Liwei, et al.
Published: (2026)

Learning High-Dimensional Parity Functions with Product Networks using Gradient Descent
by: Larue, Guillaume, et al.
Published: (2026)

The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent
by: Dandi, Yatin, et al.
Published: (2025)

One for All: A Non-Linear Transformer can Enable Cross-Domain Generalization for In-Context Reinforcement Learning
by: He, Bowen, et al.
Published: (2026)

Efficient Search for Customized Activation Functions with Gradient Descent
by: Strack, Lukas, et al.
Published: (2024)

Functional Central Limit Theorem for Stochastic Gradient Descent
by: Flamand, Kessang, et al.
Published: (2026)

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent
by: Zhang, Chenyang, et al.
Published: (2026)

The Initialization Determines Whether In-Context Learning Is Gradient Descent
by: Xie, Shifeng, et al.
Published: (2025)

Understanding Task Vectors in In-Context Learning: Emergence, Functionality, and Limitations
by: Dong, Yuxin, et al.
Published: (2025)

Linear Transformers with Learnable Kernel Functions are Better In-Context Models
by: Aksenov, Yaroslav, et al.
Published: (2024)

How Transformers Learn Causal Structure with Gradient Descent
by: Nichani, Eshaan, et al.
Published: (2024)

On the Convergence of Gradient Descent on Learning Transformers with Residual Connections
by: Qin, Zhen, et al.
Published: (2025)

The Multi-Block DC Function Class: Theory, Algorithms, and Applications
by: Fatemi, Pouria, et al.
Published: (2026)

Unraveling the Gradient Descent Dynamics of Transformers
by: Song, Bingqing, et al.
Published: (2024)

Computing Brascamp-Lieb Constants through the lens of Thompson Geometry
by: Weber, Melanie, et al.
Published: (2022)