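Staff View: GRIT -- Geometry-Aware PEFT with K-FAC Preconditioning, Fisher-Guided Reprojection, and Dynamic Rank Adaptation :: Library Catalog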

Bibliographic Details
Main Authors:	Saha, Pritish; Rajbangshi, Chandrav; Goyal, Rudra; Goyal, Mohit; Deo, Anurag; Roy, Biswajit; Singh, Ningthoujam Dhanachandra; Goswami, Raxit; Das, Amitava
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2601.00231

_version_	1866917179859927040
author	Saha, Pritish; Rajbangshi, Chandrav; Goyal, Rudra; Goyal, Mohit; Deo, Anurag; Roy, Biswajit; Singh, Ningthoujam Dhanachandra; Goswami, Raxit; Das, Amitava
author_facet	Saha, Pritish; Rajbangshi, Chandrav; Goyal, Rudra; Goyal, Mohit; Deo, Anurag; Roy, Biswajit; Singh, Ningthoujam Dhanachandra; Goswami, Raxit; Das, Amitava
contents	Parameter-efficient fine-tuning (PEFT) is the default way to adapt LLMs, but widely used LoRA and QLoRA are largely geometry-agnostic: they optimize in fixed, randomly oriented low-rank subspaces with first-order descent, mostly ignoring local loss curvature. This can inflate the effective update budget and amplify drift along weakly constrained directions. We introduce GRIT, a dynamic, curvature-aware LoRA procedure that preserves the LoRA parameterization but: (1) preconditions gradients in rank space using K-FAC as a natural-gradient proxy; (2) periodically reprojects the low-rank basis onto dominant Fisher eigendirections to suppress drift; and (3) adapts the effective rank from the spectrum so capacity concentrates where signal resides. Across instruction-following, comprehension, and reasoning benchmarks on LLaMA backbones, GRIT matches or surpasses LoRA and QLoRA while reducing trainable parameters by 46% on average (25–80% across tasks), without practical quality loss across prompt styles and data mixes. To model forgetting, we fit a curvature-modulated power law. Empirically, GRIT yields lower drift and a better updates-vs-retention frontier than strong PEFT-optimizer baselines (Orthogonal-LoRA, IA3, DoRA, Eff-FT, Shampoo).
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_00231
institution	arXiv
publishDate	2026
record_format	arxiv
title	GRIT -- Geometry-Aware PEFT with K-FAC Preconditioning, Fisher-Guided Reprojection, and Dynamic Rank Adaptation
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2601.00231

Similar Items