| Main Author: | |
|---|---|
| Format: | Digital resource |
| Language: | English |
| Published: | Zenodo, 2026 |
| Subjects: | |
| Online Access: | https://doi.org/10.5281/zenodo.19232218 |
Table of Contents:
- We introduce Steklov activations, a family of piecewise-polynomial activation functions derived from B-spline antiderivatives. The family is parameterized by an order r, which controls smoothness, and a scale α, which controls the transition width; outside a compact support, both the output and the gradient are exactly zero. At α=2 the activation approximates GELU (sup-norm error < 0.0091); at α=6 it is exactly HardSwish. On image classification (MNIST, CIFAR-10, and CIFAR-100, using LeNet-5, ResNet-18, and WideResNet-28-10), Steklov achieves the highest accuracy on all benchmarks. On language modeling (GPT-2 124M/354M and a LLaMA-style 105M model), it matches GELU and improves over SiLU. The compact support induces a tunable level of neuron inactivity (3–83%) that is stable across data splits and distributions. Pruning the inactive neurons removes 7–11% of parameters with negligible quality loss, and a Triton kernel then delivers 3–6% faster inference than unpruned GELU.
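The record does not give the activation's formula, so the following is only a minimal sketch of the simplest case that is consistent with the abstract's α=6 claim. It assumes (this is an assumption, not the authors' stated definition) that the activation has the form x · G(x), where G is the antiderivative of a unit-mass box function (an order-1 B-spline) of width α centred at zero. The names `box_spline_cdf`, `gated_activation`, and `inactive_fraction` are hypothetical. Under this assumption the output and gradient are exactly zero for x ≤ −α/2, and α=6 reproduces the standard HardSwish, x · ReLU6(x + 3) / 6.

```python
def box_spline_cdf(x: float, alpha: float) -> float:
    """Antiderivative of a unit-mass box of width alpha centred at 0.

    Assumed gate for illustration only: rises linearly from 0 at
    x = -alpha/2 to 1 at x = +alpha/2 and is constant outside.
    """
    return min(1.0, max(0.0, x / alpha + 0.5))


def gated_activation(x: float, alpha: float) -> float:
    """Hypothetical order-1 member of the family: x times the gate."""
    return x * box_spline_cdf(x, alpha)


def hard_swish(x: float) -> float:
    """Standard HardSwish: x * ReLU6(x + 3) / 6."""
    return x * min(6.0, max(0.0, x + 3.0)) / 6.0


def inactive_fraction(pre_activations, alpha: float) -> float:
    """Share of inputs in the exact-zero region x <= -alpha/2."""
    values = list(pre_activations)
    dead = sum(1 for v in values if v <= -alpha / 2.0)
    return dead / len(values)


if __name__ == "__main__":
    # Compare the assumed alpha=6 gate against HardSwish on a grid.
    grid = [i / 10.0 for i in range(-80, 81)]
    worst = max(abs(gated_activation(x, 6.0) - hard_swish(x)) for x in grid)
    print("max |gated - hard_swish| on grid:", worst)
```

A smaller α narrows the transition and moves the zero region closer to the origin, so more pre-activations land in it; this is one plausible reading of how the scale makes inactivity tunable. Higher orders r would replace the box with a smoother B-spline, which this sketch does not attempt.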