Staff View :: Library Catalog


Bibliographic Details
Main Author:	Khilar, Snigdha Chandan
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Differential Geometry
Online Access:	https://arxiv.org/abs/2605.30836

_version_	1866917547254743040
author	Khilar, Snigdha Chandan
author_facet	Khilar, Snigdha Chandan
contents	Recent SVD-based compression methods for large language models, such as SVD-LLM and Basis Sharing, can be unified under a single optimization problem. Mathematical proofs and experiments on Pythia models show that this unified approach reduces weight reconstruction error by up to 46%, yet it fails on practical tasks: downstream metrics such as perplexity and accuracy degrade severely compared with standard per-layer SVD-LLM. The paper explains this failure mechanistically. Although the bundle method couples adjacent layers mathematically, the transformer residual stream effectively decouples them during the forward pass, so per-layer optimality matters more than joint cross-layer optimization. The paper concludes that weight-space reconstruction is a flawed objective for cross-layer compression and that future methods must focus on per-layer activation reconstruction instead.
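The distinction between a weight-space and an activation-space reconstruction objective can be illustrated with a minimal NumPy sketch. It is not the paper's method and does not model cross-layer coupling: the function names, matrix sizes, and random calibration activations are hypothetical, and the activation-aware variant uses Cholesky whitening of X X^T as an assumed stand-in for per-layer activation reconstruction. Plain truncated SVD minimizes ||W - W_k||_F, while the whitened variant minimizes ||(W - W_k) X||_F, because ||A X||_F = ||A S||_F whenever X X^T = S S^T.

```python
import numpy as np

def truncated_svd(W, k):
    """Rank-k approximation minimizing ||W - W_k||_F (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def activation_aware_svd(W, X, k):
    """Rank-k approximation minimizing ||(W - W_k) X||_F.

    Assumed whitening step (illustration only): factor X X^T = S S^T with
    Cholesky, truncate the SVD of W S, then map back with S^{-1}.
    Requires X X^T to be positive definite.
    """
    S = np.linalg.cholesky(X @ X.T)           # lower-triangular, X X^T = S S^T
    U, s, Vt = np.linalg.svd(W @ S, full_matrices=False)
    WS_k = (U[:, :k] * s[:k]) @ Vt[:k, :]     # best rank-k approximation of W S
    return np.linalg.solve(S.T, WS_k.T).T     # WS_k @ inv(S) without forming inv(S)

rng = np.random.default_rng(0)
d_out, d_in, n_tokens, k = 64, 64, 512, 16
W = rng.standard_normal((d_out, d_in))
# Hypothetical anisotropic calibration activations: a few input directions dominate.
scales = np.logspace(1, -1, d_in)
X = rng.standard_normal((d_in, n_tokens)) * scales[:, None]

def weight_err(W_k):
    return np.linalg.norm(W - W_k)            # Frobenius norm for 2-D input

def act_err(W_k):
    return np.linalg.norm((W - W_k) @ X)      # error on the layer's outputs

W_plain = truncated_svd(W, k)
W_aware = activation_aware_svd(W, X, k)
w_plain, w_aware = weight_err(W_plain), weight_err(W_aware)
a_plain, a_aware = act_err(W_plain), act_err(W_aware)
print(f"weight-space error:     plain {w_plain:.2f}  activation-aware {w_aware:.2f}")
print(f"activation-space error: plain {a_plain:.2f}  activation-aware {a_aware:.2f}")
```

By construction each approximation wins on its own objective, so a lower weight-space error says nothing about which one better preserves the layer's outputs on the calibration data.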
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_30836
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Cross-Layer Subspace Coupling for LLM Compression: A Unifying Framework and Its Empirical Limits Khilar, Snigdha Chandan Machine Learning Differential Geometry Recent SVD-based compression methods for large language models, such as SVD-LLM and Basis Sharing, can be unified under a single optimization problem. Mathematical proofs and experiments on Pythia models show that this unified approach reduces weight reconstruction error by up to 46%, yet it fails on practical tasks: downstream metrics such as perplexity and accuracy degrade severely compared with standard per-layer SVD-LLM. The paper explains this failure mechanistically. Although the bundle method couples adjacent layers mathematically, the transformer residual stream effectively decouples them during the forward pass, so per-layer optimality matters more than joint cross-layer optimization. The paper concludes that weight-space reconstruction is a flawed objective for cross-layer compression and that future methods must focus on per-layer activation reconstruction instead.
title	Cross-Layer Subspace Coupling for LLM Compression: A Unifying Framework and Its Empirical Limits
topic	Machine Learning; Differential Geometry
url	https://arxiv.org/abs/2605.30836
