Bibliographic Details
Main Authors: Gomez, Camilo, Wang, Pengyang, Tang, Liansheng
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2602.23336
_version_ 1866917297254301696
author Gomez, Camilo
Wang, Pengyang
Tang, Liansheng
author_facet Gomez, Camilo
Wang, Pengyang
Tang, Liansheng
contents Recent advances in machine learning have emphasized the integration of structured optimization components into end-to-end differentiable models, enabling richer inductive biases and tighter alignment with task-specific objectives. In this work, we introduce a novel differentiable approximation to the zero-one loss, which has long been considered the gold standard for classification performance yet is incompatible with gradient-based optimization due to its non-differentiability. Our method constructs a smooth, order-preserving projection onto the (n,k)-dimensional hypersimplex through a constrained optimization framework, leading to a new operator we term Soft-Binary-Argmax. After deriving its mathematical properties, we show how its Jacobian can be efficiently computed and integrated into binary and multiclass learning systems. Empirically, our approach significantly improves generalization under large-batch training by imposing geometric consistency constraints on the output logits, thereby narrowing the performance gap traditionally observed in that regime.
format Preprint
id arxiv_https___arxiv_org_abs_2602_23336
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle Differentiable Zero-One Loss via Hypersimplex Projections
Gomez, Camilo
Wang, Pengyang
Tang, Liansheng
Machine Learning
title Differentiable Zero-One Loss via Hypersimplex Projections
topic Machine Learning
url https://arxiv.org/abs/2602.23336