Bibliographic Details
Main Author: Masalskikh, Aleksandr
Format: Digital resource
Language: English
Published: Zenodo 2026
Online Access: https://doi.org/10.5281/zenodo.19232218
Table of Contents:
  • We introduce Steklov activations, a piecewise-polynomial activation family derived from B-spline antiderivatives. Parameterized by order r (smoothness) and scale α (transition width), they produce exactly zero output and gradient outside a compact support. At α = 2 the activation approximates GELU (sup error < 0.0091); at α = 6 it is exactly HardSwish. On image classification (MNIST, CIFAR-10, and CIFAR-100 across LeNet-5, ResNet-18, and WideResNet-28-10), Steklov achieves the highest accuracy on all benchmarks. On language modeling (GPT-2 124M/354M, LLaMA-style 105M), it matches GELU and improves over SiLU. The compact support induces a tunable fraction of inactive neurons (3–83%) that is stable across data splits and distributions. Pruning inactive neurons removes 7–11% of parameters with negligible quality loss; a Triton kernel then delivers 3–6% faster inference than unpruned GELU.
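Illustrative note: the record does not give the formula for the activation family, so the following is only a minimal sketch under an assumed construction. It assumes the lowest-order member is x multiplied by the antiderivative of a unit-mass box (order-1 B-spline) of width α centred at zero. The names `box_gate`, `gated_activation`, and `hardswish` are hypothetical and are not taken from the work. Under that assumption, α = 6 reproduces the standard HardSwish formula, which is consistent with the abstract's claim, and the output is exactly zero for x ≤ −α/2.

```python
import numpy as np

def box_gate(x, alpha):
    # Assumed gate: antiderivative of a unit-mass box of width alpha centred
    # at 0. It is 0 for x <= -alpha/2, 1 for x >= alpha/2, linear in between.
    return np.clip(x / alpha + 0.5, 0.0, 1.0)

def gated_activation(x, alpha):
    # Assumed lowest-order activation: the input multiplied by the gate.
    # This is an illustration, not the author's published definition.
    return x * box_gate(x, alpha)

def hardswish(x):
    # Standard HardSwish: x * ReLU6(x + 3) / 6.
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

x = np.linspace(-8.0, 8.0, 1601)

# With alpha = 6 the assumed form agrees with HardSwish up to rounding.
matches_hardswish = np.allclose(gated_activation(x, 6.0), hardswish(x))

# Left of -alpha/2 the gate is clipped to 0, so the output is exactly zero.
left_tail = gated_activation(x[x <= -3.0], 6.0)
print(matches_hardswish, np.all(left_tail == 0.0))
```

Higher orders r would replace the box with a smoother B-spline, giving a smoother gate; this sketch does not attempt the GELU approximation reported for α = 2.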