Gespeichert in:
| Hauptverfasser: | , , , , , , , , , |
|---|---|
| Format: | Preprint |
| Veröffentlicht: |
2025
|
| Schlagworte: | |
| Online-Zugang: | https://arxiv.org/abs/2505.09663 |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
| _version_ | 1866917043927777280 |
|---|---|
| author | Büchel, Julian Chalas, Iason Acampa, Giovanni Chen, An Fagbohungbe, Omobayode Tsai, Sidney Maghraoui, Kaoutar El Gallo, Manuel Le Rahimi, Abbas Sebastian, Abu |
| author_facet | Büchel, Julian Chalas, Iason Acampa, Giovanni Chen, An Fagbohungbe, Omobayode Tsai, Sidney Maghraoui, Kaoutar El Gallo, Manuel Le Rahimi, Abbas Sebastian, Abu |
| contents | Analog in-memory computing (AIMC) is a promising compute paradigm to improve speed and power efficiency of neural network inference beyond the limits of conventional von Neumann-based architectures. However, AIMC introduces fundamental challenges such as noisy computations and strict constraints on input and output quantization. Because of these constraints and imprecisions, off-the-shelf LLMs are not able to achieve 4-bit-level performance when deployed on AIMC-based hardware. While researchers previously investigated recovering this accuracy gap on small, mostly vision-based models, a generic method applicable to LLMs pre-trained on trillions of tokens does not yet exist. In this work, we introduce a general and scalable method to robustly adapt LLMs for execution on noisy, low-precision analog hardware. Our approach enables state-of-the-art models $\unicode{x2013}$ including Phi-3-mini-4k-instruct and Llama-3.2-1B-Instruct $\unicode{x2013}$ to retain performance comparable to 4-bit weight, 8-bit activation baselines, despite the presence of analog noise and quantization constraints. Additionally, we show that as a byproduct of our training methodology, analog foundation models can be quantized for inference on low-precision digital hardware. Finally, we show that our models also benefit from test-time compute scaling, showing better scaling behavior than models trained with 4-bit weight and 8-bit static input quantization. Our work bridges the gap between high-capacity LLMs and efficient analog hardware, offering a path toward energy-efficient foundation models. Code is available at https://github.com/IBM/analog-foundation-models. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2505_09663 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | Analog Foundation Models Büchel, Julian Chalas, Iason Acampa, Giovanni Chen, An Fagbohungbe, Omobayode Tsai, Sidney Maghraoui, Kaoutar El Gallo, Manuel Le Rahimi, Abbas Sebastian, Abu Machine Learning Analog in-memory computing (AIMC) is a promising compute paradigm to improve speed and power efficiency of neural network inference beyond the limits of conventional von Neumann-based architectures. However, AIMC introduces fundamental challenges such as noisy computations and strict constraints on input and output quantization. Because of these constraints and imprecisions, off-the-shelf LLMs are not able to achieve 4-bit-level performance when deployed on AIMC-based hardware. While researchers previously investigated recovering this accuracy gap on small, mostly vision-based models, a generic method applicable to LLMs pre-trained on trillions of tokens does not yet exist. In this work, we introduce a general and scalable method to robustly adapt LLMs for execution on noisy, low-precision analog hardware. Our approach enables state-of-the-art models $\unicode{x2013}$ including Phi-3-mini-4k-instruct and Llama-3.2-1B-Instruct $\unicode{x2013}$ to retain performance comparable to 4-bit weight, 8-bit activation baselines, despite the presence of analog noise and quantization constraints. Additionally, we show that as a byproduct of our training methodology, analog foundation models can be quantized for inference on low-precision digital hardware. Finally, we show that our models also benefit from test-time compute scaling, showing better scaling behavior than models trained with 4-bit weight and 8-bit static input quantization. Our work bridges the gap between high-capacity LLMs and efficient analog hardware, offering a path toward energy-efficient foundation models. Code is available at https://github.com/IBM/analog-foundation-models. |
| title | Analog Foundation Models |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2505.09663 |