Internal format :: Library Catalog


Bibliographic Details
Main authors:	Büchel, Julian; Chalas, Iason; Acampa, Giovanni; Chen, An; Fagbohungbe, Omobayode; Tsai, Sidney; El Maghraoui, Kaoutar; Le Gallo, Manuel; Rahimi, Abbas; Sebastian, Abu
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online access:	https://arxiv.org/abs/2505.09663

_version_	1866917043927777280
author	Büchel, Julian; Chalas, Iason; Acampa, Giovanni; Chen, An; Fagbohungbe, Omobayode; Tsai, Sidney; El Maghraoui, Kaoutar; Le Gallo, Manuel; Rahimi, Abbas; Sebastian, Abu
contents	Analog in-memory computing (AIMC) is a promising compute paradigm to improve speed and power efficiency of neural network inference beyond the limits of conventional von Neumann-based architectures. However, AIMC introduces fundamental challenges such as noisy computations and strict constraints on input and output quantization. Because of these constraints and imprecisions, off-the-shelf LLMs are not able to achieve 4-bit-level performance when deployed on AIMC-based hardware. While researchers previously investigated recovering this accuracy gap on small, mostly vision-based models, a generic method applicable to LLMs pre-trained on trillions of tokens does not yet exist. In this work, we introduce a general and scalable method to robustly adapt LLMs for execution on noisy, low-precision analog hardware. Our approach enables state-of-the-art models (including Phi-3-mini-4k-instruct and Llama-3.2-1B-Instruct) to retain performance comparable to 4-bit weight, 8-bit activation baselines, despite the presence of analog noise and quantization constraints. Additionally, we show that as a byproduct of our training methodology, analog foundation models can be quantized for inference on low-precision digital hardware. Finally, we show that our models also benefit from test-time compute scaling and scale better than models trained with 4-bit weight and 8-bit static input quantization. Our work bridges the gap between high-capacity LLMs and efficient analog hardware, offering a path toward energy-efficient foundation models. Code is available at https://github.com/IBM/analog-foundation-models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_09663
institution	arXiv
publishDate	2025
record_format	arxiv
title	Analog Foundation Models
topic	Machine Learning
url	https://arxiv.org/abs/2505.09663
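The abstract names two hardware constraints of AIMC: noisy computation and strict quantization of inputs and outputs. The following is a minimal NumPy sketch of how one analog matrix-vector multiply is often simulated in software. It assumes uniform symmetric quantization with static clipping ranges and additive Gaussian output noise proportional to the largest weight magnitude. The function names (`quantize_symmetric`, `analog_matvec`), bit widths, clipping ranges and noise model are hypothetical illustrations, not the method or code from this paper or the linked repository.

```python
import numpy as np


def quantize_symmetric(x, n_bits, clip):
    """Uniform symmetric quantization of x to n_bits over the static range [-clip, clip]."""
    levels = 2 ** (n_bits - 1) - 1  # e.g. 127 positive levels for 8 bits
    scale = clip / levels
    # Values outside [-clip, clip] saturate at the range boundary.
    return np.clip(np.round(x / scale), -levels, levels) * scale


def analog_matvec(W, x, in_bits=8, in_clip=1.0, out_bits=8, out_clip=10.0,
                  noise_std=0.02, rng=None):
    """Hypothetical model of one analog in-memory matrix-vector multiply.

    Assumptions (illustrative only): static input range, additive Gaussian
    output noise scaled by max|W|, and a static output (ADC) range.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_q = quantize_symmetric(x, in_bits, in_clip)  # input conversion (DAC)
    y = W @ x_q                                    # ideal crossbar product
    noise = rng.normal(0.0, noise_std * np.abs(W).max(), size=y.shape)
    return quantize_symmetric(y + noise, out_bits, out_clip)  # output conversion (ADC)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 8))
    x = rng.normal(size=8)
    print("ideal :", W @ x)
    print("analog:", analog_matvec(W, x, rng=rng))
```

Running a forward pass like this inside training is one common way to make weights tolerant to such noise; the abstract does not state which noise or quantization model the authors use.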

Similar Items