Bibliographic Details
Main Authors: Tian, Zhen; Zhao, Wayne Xin; Wen, Ji-Rong
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2501.12896
_version_ 1866929684116144128
author Tian, Zhen
Zhao, Wayne Xin
Wen, Ji-Rong
author_facet Tian, Zhen
Zhao, Wayne Xin
Wen, Ji-Rong
contents In this paper, we propose a novel optimizer state compression algorithm, $π$-Quant, which leverages the properties of irrational numbers (e.g., $π$) for memory-efficient training. The core idea rests on our mathematical finding that a pair of parameters can be represented by a single rotation angle under a complex rotation scheme. Building on this insight, we map the parameters into a complex space and quantize the corresponding rotation angles. To integrate this into the optimization process efficiently, we develop a system of geometric equations that computes the precise rotation angles with linear complexity. We evaluate $π$-Quant on a wide range of tasks. Our experiments show that it can reduce the bit-width of parameters to 3.32 bits, achieving a 75% reduction in parameter scale and a 40% decrease in GPU memory usage, all while maintaining full accuracy.
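The abstract's claim that one rotation angle can stand in for a pair of values can be illustrated with a small sketch. It assumes, purely for illustration and not as the paper's confirmed formulation, that a pair (x, y) inside the unit disk is approximated by the point (e^{iθ} + e^{iπθ})/2; because π is irrational, this curve comes arbitrarily close to every point of the disk as θ grows. The function names (`rotation_point`, `encode_pair`, `decode_pair`) and the search parameters are hypothetical, and the brute-force grid search below is a stand-in for the linear-complexity geometric solver the abstract mentions.

```python
import numpy as np


def rotation_point(theta):
    """Average of two unit rotations whose speeds differ by a factor of pi.

    Assumed illustrative form, not taken from the paper.
    """
    return 0.5 * (np.exp(1j * theta) + np.exp(1j * np.pi * theta))


def encode_pair(x, y, theta_max=720.0, num_candidates=400_000):
    """Find one angle whose rotation point lies close to (x, y).

    Brute-force search over a grid of candidate angles (hypothetical
    stand-in for a closed-form solver). Requires x**2 + y**2 <= 1.
    """
    target = complex(x, y)
    if abs(target) > 1.0:
        raise ValueError("the pair must lie inside the unit disk")
    thetas = np.linspace(0.0, theta_max, num_candidates)
    errors = np.abs(rotation_point(thetas) - target)
    return float(thetas[int(np.argmin(errors))])


def decode_pair(theta):
    """Recover the approximate pair from a single stored angle."""
    z = rotation_point(theta)
    return float(z.real), float(z.imag)


if __name__ == "__main__":
    theta = encode_pair(0.3, -0.5)
    print(theta, decode_pair(theta))
```

In this sketch the stored state is the single number `theta`; how finely that angle is then quantized determines the effective bit-width, which the sketch does not model.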
format Preprint
id arxiv_https___arxiv_org_abs_2501_12896
institution arXiv
publishDate 2025
record_format arxiv
title Irrational Complex Rotations Empower Low-bit Optimizers
topic Machine Learning
url https://arxiv.org/abs/2501.12896