Staff View: When Do Early-Exit Networks Generalize? A PAC-Bayesian Theory of Adaptive Depth :: Library Catalog

Bibliographic Details
Main Authors:	Guo, Dongxin; Wu, Jikun; Yiu, Siu Ming
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; MSC 68T07, 62C12; ACM I.2.6, F.2.2
Online Access:	https://arxiv.org/abs/2604.15764

_version_	1866910138963591168
author	Guo, Dongxin; Wu, Jikun; Yiu, Siu Ming
author_facet	Guo, Dongxin; Wu, Jikun; Yiu, Siu Ming
contents	Early-exit neural networks enable adaptive computation by allowing confident predictions to exit at intermediate layers, achieving 2-8$\times$ inference speedup. Despite widespread deployment, their generalization properties lack theoretical understanding -- a gap explicitly identified in recent surveys. This paper establishes a unified PAC-Bayesian framework for adaptive-depth networks. (1) Novel Entropy-Based Bounds: We prove the first generalization bounds depending on exit-depth entropy $H(D)$ and expected depth $\mathbb{E}[D]$ rather than maximum depth $K$, with sample complexity $\mathcal{O}((\mathbb{E}[D] \cdot d + H(D))/ε^2)$. (2) Explicit Constructive Constants: Our analysis yields the leading coefficient $\sqrt{2\ln 2} \approx 1.177$ with complete derivation. (3) Provable Early-Exit Advantages: We establish sufficient conditions under which adaptive-depth networks strictly outperform fixed-depth counterparts. (4) Extension to Approximate Label Independence: We relax the label-independence assumption to $ε$-approximate policies, broadening applicability to learned routing. (5) Comprehensive Validation: Experiments across 6 architectures on 7 benchmarks demonstrate tightness ratios of 1.52-3.87$\times$ (all $p < 0.001$) versus $>$100$\times$ for classical bounds. Bound-guided threshold selection matches validation-tuned performance within 0.1-0.3%.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_15764
institution	arXiv
publishDate	2026
record_format	arxiv
title	When Do Early-Exit Networks Generalize? A PAC-Bayesian Theory of Adaptive Depth
topic	Machine Learning; Artificial Intelligence; MSC 68T07, 62C12; ACM I.2.6, F.2.2
url	https://arxiv.org/abs/2604.15764
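
Illustrative note on the contents field: the abstract states a sample complexity of order $(\mathbb{E}[D] \cdot d + H(D))/ε^2$. The sketch below only shows how the two exit-depth quantities could be computed from an empirical exit distribution and combined into that leading-order term. It is not the authors' code. The function names, the use of natural-log entropy (nats), and the reading of d as a per-layer complexity parameter are assumptions, since the abstract defines none of them, and the constant hidden by the O-notation is not reproduced.

```python
import math


def exit_depth_stats(exit_probs):
    """Return (E[D], H(D)) for a distribution over exit depths 1..K.

    exit_probs[k-1] is the fraction of inputs exiting at depth k.
    Entropy is in nats (natural log); this unit is an assumption.
    """
    if any(p < 0 for p in exit_probs):
        raise ValueError("exit probabilities must be non-negative")
    if abs(sum(exit_probs) - 1.0) > 1e-9:
        raise ValueError("exit probabilities must sum to 1")
    expected_depth = sum(k * p for k, p in enumerate(exit_probs, start=1))
    entropy = -sum(p * math.log(p) for p in exit_probs if p > 0)
    return expected_depth, entropy


def leading_order_term(expected_depth, entropy, d, eps):
    """Evaluate (E[D] * d + H(D)) / eps^2, ignoring the hidden constant.

    d is a hypothetical per-layer complexity parameter.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return (expected_depth * d + entropy) / eps ** 2


# Hypothetical 3-exit network: half the inputs leave at the first exit.
probs = [0.5, 0.25, 0.25]
e_depth, h_depth = exit_depth_stats(probs)
adaptive = leading_order_term(e_depth, h_depth, d=10, eps=0.1)

# Same expression with every input forced to the maximum depth K = 3,
# so E[D] = K and H(D) = 0.
fixed = leading_order_term(len(probs), 0.0, d=10, eps=0.1)

print(f"E[D] = {e_depth:.3f}, H(D) = {h_depth:.3f} nats")
print(f"adaptive term = {adaptive:.1f}, fixed-depth term = {fixed:.1f}")
```

Under these assumptions the adaptive term is smaller whenever the saving in $\mathbb{E}[D] \cdot d$ outweighs the added entropy $H(D)$, which matches the abstract's contrast between expected depth and maximum depth $K$.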

Similar Items