Bibliographic Details
Main Authors: Jansen, David; Rausch, Roman; Montero, David; Orus, Roman
Format: Preprint
Published: 2026
Subjects: Machine Learning; Artificial Intelligence; Computation and Language; Quantum Physics
Online Access: https://arxiv.org/abs/2602.00161
author Jansen, David
Rausch, Roman
Montero, David
Orus, Roman
contents Compressing resource-intensive large language models by removing whole transformer blocks is a seemingly simple idea, but identifying which blocks to remove constitutes an exponentially difficult combinatorial problem. In this paper, we formulate block removal as a constrained binary optimization problem that can be mapped to a physical system (Ising model), whose energies are a strong proxy for downstream model performance. This formulation enables an efficient ranking of a large number of candidate block-removal configurations and yields many high-quality, non-trivial solutions beyond consecutive regions. We demonstrate that our approach outperforms state-of-the-art block-removal methods across several benchmarks, with performance gains persisting after short retraining, and reaching improvements of up to 6 points on the MMLU benchmark. Our method requires only forward and backward passes for a few active parameters, together with an (at least approximate) Ising solver, and can be readily applied to any architecture. We illustrate this generality on the recent NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 model, which exhibits a highly inhomogeneous and challenging block structure.
format Preprint
id arxiv_https___arxiv_org_abs_2602_00161
institution arXiv
publishDate 2026
record_format arxiv
title Block removal for large language models through constrained binary optimization
topic Machine Learning
Artificial Intelligence
Computation and Language
Quantum Physics
url https://arxiv.org/abs/2602.00161