| Main Authors: | Xie, Shuo, Mohamadi, Mohamad Amin, Li, Zhiyuan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.08198 |
Similar Items
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
by: Xie, Shuo, et al.
Published: (2024)
Honesty over Accuracy: Trustworthy Language Models through Reinforced Hesitation
by: Mohamadi, Mohamad Amin, et al.
Published: (2025)
Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition
by: Mohamadi, Mohamad Amin, et al.
Published: (2024)
Adaptive Preconditioners Trigger Loss Spikes in Adam
by: Bai, Zhiwei, et al.
Published: (2025)
A Tale of Two Geometries: Adaptive Optimizers and Non-Euclidean Descent
by: Xie, Shuo, et al.
Published: (2025)
Asymptotic Behavior of Adversarial Training Estimator under $\ell_\infty$-Perturbation
by: Xie, Yiling, et al.
Published: (2024)
Improved Distribution Estimation in $\ell_\infty$
by: Cohen, Doron, et al.
Published: (2026)
Structured Preconditioners in Adaptive Optimization: A Unified Analysis
by: Xie, Shuo, et al.
Published: (2025)
Patch-wise Structural Loss for Time Series Forecasting
by: Kudrat, Dilfira, et al.
Published: (2025)
Dynamic Regret via Discounted-to-Dynamic Reduction with Applications to Curved Losses and Adam Optimizer
by: Xie, Yan-Feng, et al.
Published: (2026)
Beyond $\ell_2$-norm and $\ell_\infty$-norm: A Curvature-Inspired $\ell_p$-Norm Scheme for Deep Neural Networks
by: Xu, Jianhao, et al.
Published: (2026)
A Determinantal Approach to a Sharp $\ell^1-\ell^\infty-\ell^2$ Norm Inequality
by: Benitez, Jose Antonio Lara
Published: (2026)
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective
by: Wen, Kaiyue, et al.
Published: (2024)
Curvature in the Looking-Glass: Optimal Methods to Exploit Curvature of Expectation in the Loss Landscape
by: Duersch, Jed A., et al.
Published: (2024)
LossLens: Diagnostics for Machine Learning through Loss Landscape Visual Analytics
by: Xie, Tiankai, et al.
Published: (2024)
Adaptive Refinement Protocols for Distributed Distribution Estimation under $\ell^p$-Losses
by: Yuan, Deheng, et al.
Published: (2024)
Landscaper: Understanding Loss Landscapes Through Multi-Dimensional Topological Analysis
by: Chen, Jiaqing, et al.
Published: (2026)
Sensitivity Analysis On Loss Landscape
by: Faroz, Salman
Published: (2024)
OLion: Approaching the Hadamard Ideal by Intersecting Spectral and $\ell_{\infty}$ Implicit Biases
by: Wang, Zixiao, et al.
Published: (2026)
A Tale of Two Symmetries: Exploring the Loss Landscape of Equivariant Models
by: Xie, YuQing, et al.
Published: (2025)
Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling
by: Wang, Fei, et al.
Published: (2026)
Estimating Higher-Order Mixed Memberships via the $\ell_{2,\infty}$ Tensor Perturbation Bound
by: Agterberg, Joshua, et al.
Published: (2022)
Visualizing Loss Functions as Topological Landscape Profiles
by: Geniesse, Caleb, et al.
Published: (2024)
CP Loss: Channel-wise Perceptual Loss for Time Series Forecasting
by: Zha, Yaohua, et al.
Published: (2026)
Provable Benefit of Sign Descent: A Minimal Model Under Heavy-Tailed Class Imbalance
by: Yadav, Robin, et al.
Published: (2025)
On the $O(\frac{\sqrt{d}}{K^{1/4}})$ Convergence Rate of AdamW Measured by $\ell_1$ Norm
by: Li, Huan, et al.
Published: (2025)
Evaluating Loss Landscapes from a Topology Perspective
by: Xie, Tiankai, et al.
Published: (2024)
Exploiting Preferences in Loss Functions for Sequential Recommendation via Weak Transitivity
by: Chung, Hyunsoo, et al.
Published: (2024)
Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks
by: Xu, Yichu, et al.
Published: (2024)
There is a Singularity in the Loss Landscape
by: Lowell, Mark
Published: (2022)
On the Hyperparameter Loss Landscapes of Machine Learning Models: An Exploratory Study
by: Huang, Mingyu, et al.
Published: (2023)
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
by: Kim, Jinuk, et al.
Published: (2025)
On the Computational Landscape of Replicable Learning
by: Kalavasis, Alkis, et al.
Published: (2024)
Early-Warning Signals of Grokking via Loss-Landscape Geometry
by: Xu, Yongzhong
Published: (2026)
Near-Linear Time Projection onto the $\ell_{1,\infty}$ Ball; Application to Sparse Autoencoders
by: Perez, Guillaume, et al.
Published: (2023)
Anon: Extrapolating Adaptivity Beyond SGD and Adam
by: Zhang, Yiheng, et al.
Published: (2026)
Adapprox: Adaptive Approximation in Adam Optimization via Randomized Low-Rank Matrices
by: Zhao, Pengxiang, et al.
Published: (2024)
Adaptively Coordinating with Novel Partners via Learned Latent Strategies
by: Li, Benjamin, et al.
Published: (2025)
Stable Coresets via Posterior Sampling: Aligning Induced and Full Loss Landscapes
by: Chang, Wei-Kai, et al.
Published: (2025)
Using Degeneracy in the Loss Landscape for Mechanistic Interpretability
by: Bushnaq, Lucius, et al.
Published: (2024)