Bibliographic Details
Main Authors:	Zeno, Chen; Ongie, Greg; Blumenfeld, Yaniv; Weinberger, Nir; Soudry, Daniel
Format:	Preprint
Published:	2023
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2311.06748

Similar Items

When Diffusion Models Memorize: Inductive Biases in Probability Flow of Minimum-Norm Shallow Neural Nets
by: Zeno, Chen, et al.
Published: (2025)

Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators
by: Blumenfeld, Yaniv, et al.
Published: (2024)

The Joint Effect of Task Similarity and Overparameterization on Catastrophic Forgetting -- An Analytical Model
by: Goldfarb, Daniel, et al.
Published: (2024)

Depth Separation in Norm-Bounded Infinite-Width Neural Networks
by: Parkinson, Suzanna, et al.
Published: (2024)

ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models
by: Parkinson, Suzanna, et al.
Published: (2023)

Quantum Maximum Likelihood Prediction via Hilbert Space Embeddings
by: Sreekumar, Sreejith, et al.
Published: (2026)

Minimum Variance Unbiased N:M Sparsity for the Neural Gradients
by: Chmiel, Brian, et al.
Published: (2022)

A representation-learning game for classes of prediction tasks
by: Uzan, Neria, et al.
Published: (2024)

Exploration-Exploitation Tradeoff in Universal Lossy Compression
by: Weinberger, Nir, et al.
Published: (2025)

PLUMAGE: Probabilistic Low rank Unbiased Min Variance Gradient Estimator for Efficient Large Model Training
by: Haroush, Matan, et al.
Published: (2025)

Statistical curriculum learning: An elimination algorithm achieving an oracle risk
by: Cohen, Omer, et al.
Published: (2024)

Workspace Optimization: How to Train Your Agent
by: Sarafian, Elad, et al.
Published: (2026)

Minimum-Norm Interpolation Under Covariate Shift
by: Mallinar, Neil, et al.
Published: (2024)

Characterization of the Distortion-Perception Tradeoff for Finite Channels with Arbitrary Metrics
by: Freirich, Dror, et al.
Published: (2024)

On Bits and Bandits: Quantifying the Regret-Information Trade-off
by: Shufaro, Itai, et al.
Published: (2024)

HO-FMN: Hyperparameter Optimization for Fast Minimum-Norm Attacks
by: Mura, Raffaele, et al.
Published: (2024)

Explore to Generalize in Zero-Shot RL
by: Zisselman, Ev, et al.
Published: (2023)

The Implicit Bias of Gradient Descent on Separable Multiclass Data
by: Ravi, Hrithik, et al.
Published: (2024)

Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks
by: Kinderman, Edan, et al.
Published: (2024)

How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers
by: Buzaglo, Gon, et al.
Published: (2024)

How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation
by: Feldman, Shai, et al.
Published: (2026)

FP4 All the Way: Fully Quantized Training of LLMs
by: Chmiel, Brian, et al.
Published: (2025)

Scaling FP8 training to trillion-token LLMs
by: Fishman, Maxim, et al.
Published: (2024)

Tensor-Parallelism with Partially Synchronized Activations
by: Lamprecht, Itay, et al.
Published: (2025)

Temperature is All You Need for Generalization in Langevin Dynamics and other Markov Processes
by: Harel, Itamar, et al.
Published: (2025)

Minimum Norm Interpolation via The Local Theory of Banach Spaces: The Role of $2$-Uniform Convexity
by: Kur, Gil, et al.
Published: (2026)

Maximal-Capacity Discrete Memoryless Channel Identification
by: Egger, Maximilian, et al.
Published: (2024)

Semi-Supervised Hypothesis Testing by Betting on Predictions
by: Tenzer, Yaniv, et al.
Published: (2026)

Valid Best-Model Identification for LLM Evaluation via Low-Rank Factorization
by: Tolochinsky, Elad, et al.
Published: (2026)

Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
by: Chmiel, Brian, et al.
Published: (2021)

The Implicit Bias of Gradient Descent on Separable Data
by: Soudry, Daniel, et al.
Published: (2017)

Sufficient Conditions for Stability of Minimum-Norm Interpolating Deep ReLU Networks
by: Harzli, Ouns El, et al.
Published: (2026)

Exponential Quantum Communication Advantage in Distributed Inference and Learning
by: Gilboa, Dar, et al.
Published: (2023)

Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space
by: Bebchuk, Alon, et al.
Published: (2026)

Functional Mean Flow in Hilbert Space
by: Li, Zhiqi, et al.
Published: (2025)

Normalized Architectures are Natively 4-Bit
by: Fishman, Maxim, et al.
Published: (2026)

Approximation Rates of Shallow Neural Networks: Barron Spaces, Activation Functions and Optimality Analysis
by: Lu, Jian, et al.
Published: (2025)

Bayesian Modeling and Estimation of Linear Time-Varying Systems using Neural Networks and Gaussian Processes
by: Shulman, Yaniv
Published: (2025)

Are Greedy Task Orderings Better Than Random in Continual Linear Regression?
by: Tsipory, Matan, et al.
Published: (2025)

Optimal L2 Regularization in High-dimensional Continual Linear Regression
by: Karpel, Gilad, et al.
Published: (2026)