| Main Authors: | Zeno, Chen; Ongie, Greg; Blumenfeld, Yaniv; Weinberger, Nir; Soudry, Daniel |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2311.06748 |
Similar Items
When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets
by: Zeno, Chen, et al.
Published: (2025)
Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators
by: Blumenfeld, Yaniv, et al.
Published: (2024)
The Joint Effect of Task Similarity and Overparameterization on Catastrophic Forgetting -- An Analytical Model
by: Goldfarb, Daniel, et al.
Published: (2024)
Depth Separation in Norm-Bounded Infinite-Width Neural Networks
by: Parkinson, Suzanna, et al.
Published: (2024)
ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models
by: Parkinson, Suzanna, et al.
Published: (2023)
Quantum Maximum Likelihood Prediction via Hilbert Space Embeddings
by: Sreekumar, Sreejith, et al.
Published: (2026)
Minimum Variance Unbiased N:M Sparsity for the Neural Gradients
by: Chmiel, Brian, et al.
Published: (2022)
A representation-learning game for classes of prediction tasks
by: Uzan, Neria, et al.
Published: (2024)
Exploration-Exploitation Tradeoff in Universal Lossy Compression
by: Weinberger, Nir, et al.
Published: (2025)
PLUMAGE: Probabilistic Low rank Unbiased Min Variance Gradient Estimator for Efficient Large Model Training
by: Haroush, Matan, et al.
Published: (2025)
Statistical curriculum learning: An elimination algorithm achieving an oracle risk
by: Cohen, Omer, et al.
Published: (2024)
Workspace Optimization: How to Train Your Agent
by: Sarafian, Elad, et al.
Published: (2026)
Minimum-Norm Interpolation Under Covariate Shift
by: Mallinar, Neil, et al.
Published: (2024)
Characterization of the Distortion-Perception Tradeoff for Finite Channels with Arbitrary Metrics
by: Freirich, Dror, et al.
Published: (2024)
On Bits and Bandits: Quantifying the Regret-Information Trade-off
by: Shufaro, Itai, et al.
Published: (2024)
HO-FMN: Hyperparameter Optimization for Fast Minimum-Norm Attacks
by: Mura, Raffaele, et al.
Published: (2024)
Explore to Generalize in Zero-Shot RL
by: Zisselman, Ev, et al.
Published: (2023)
The Implicit Bias of Gradient Descent on Separable Multiclass Data
by: Ravi, Hrithik, et al.
Published: (2024)
Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks
by: Kinderman, Edan, et al.
Published: (2024)
How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers
by: Buzaglo, Gon, et al.
Published: (2024)
How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation
by: Feldman, Shai, et al.
Published: (2026)
FP4 All the Way: Fully Quantized Training of LLMs
by: Chmiel, Brian, et al.
Published: (2025)
Scaling FP8 training to trillion-token LLMs
by: Fishman, Maxim, et al.
Published: (2024)
Tensor-Parallelism with Partially Synchronized Activations
by: Lamprecht, Itay, et al.
Published: (2025)
Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes
by: Harel, Itamar, et al.
Published: (2025)
Minimum Norm Interpolation via The Local Theory of Banach Spaces: The Role of $2$-Uniform Convexity
by: Kur, Gil, et al.
Published: (2026)
Maximal-Capacity Discrete Memoryless Channel Identification
by: Egger, Maximilian, et al.
Published: (2024)
Semi-Supervised Hypothesis Testing by Betting on Predictions
by: Tenzer, Yaniv, et al.
Published: (2026)
Valid Best-Model Identification for LLM Evaluation via Low-Rank Factorization
by: Tolochinsky, Elad, et al.
Published: (2026)
Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
by: Chmiel, Brian, et al.
Published: (2021)
The Implicit Bias of Gradient Descent on Separable Data
by: Soudry, Daniel, et al.
Published: (2017)
Sufficient Conditions for Stability of Minimum-Norm Interpolating Deep ReLU Networks
by: Harzli, Ouns El, et al.
Published: (2026)
Exponential Quantum Communication Advantage in Distributed Inference and Learning
by: Gilboa, Dar, et al.
Published: (2023)
Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space
by: Bebchuk, Alon, et al.
Published: (2026)
Functional Mean Flow in Hilbert Space
by: Li, Zhiqi, et al.
Published: (2025)
Normalized Architectures are Natively 4-Bit
by: Fishman, Maxim, et al.
Published: (2026)
Approximation Rates of Shallow Neural Networks: Barron Spaces, Activation Functions and Optimality Analysis
by: Lu, Jian, et al.
Published: (2025)
Bayesian Modeling and Estimation of Linear Time-Varying Systems using Neural Networks and Gaussian Processes
by: Shulman, Yaniv
Published: (2025)
Are Greedy Task Orderings Better Than Random in Continual Linear Regression?
by: Tsipory, Matan, et al.
Published: (2025)
Optimal L2 Regularization in High-dimensional Continual Linear Regression
by: Karpel, Gilad, et al.
Published: (2026)