Bibliographic Details
Main Author: Schioppa, Andrea
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2402.03994
_version_ 1866910664543436800
author Schioppa, Andrea
author_facet Schioppa, Andrea
contents The study of modern machine learning models often necessitates storing vast quantities of gradients or Hessian vector products (HVPs). Traditional sketching methods struggle to scale under these memory constraints. We present a novel framework for scalable gradient and HVP sketching, tailored for modern hardware. We provide theoretical guarantees and demonstrate the power of our methods in applications like training data attribution, Hessian spectrum analysis, and intrinsic dimension computation for pre-trained language models. Our work sheds new light on the behavior of pre-trained language models, challenging assumptions about their intrinsic dimensionality and Hessian properties.
format Preprint
id arxiv_https___arxiv_org_abs_2402_03994
institution arXiv
publishDate 2024
record_format arxiv
title Efficient Sketches for Training Data Attribution and Studying the Loss Landscape
topic Machine Learning
url https://arxiv.org/abs/2402.03994