Saved in:
Bibliographic Details
Main Authors: Fan, Raymond, Sandlund, Bryce, Ko, Lin Myat
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2511.04808
Tags: Add Tag
No Tags, Be the first to tag this record!
Table of Contents:
  • The volume hypothesis suggests deep learning is effective because it is likely to find flat minima due to their large volumes, and flat minima generalize well. This picture does not explain the role of large datasets in generalization. Measuring minima volumes under varying amounts of training data reveals sharp minima which generalize well exist, but are unlikely to be found due to their small volumes. Increasing data changes the loss landscape, such that previously small generalizing minima become (relatively) large.