Staff View: SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space :: Library Catalog

Bibliographic Details
Main Authors:	Huang, Zitong; Montazerin, Mansooreh; Srivastava, Ajitesh
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2506.08270

_version_	1866911272636776448
author	Huang, Zitong; Montazerin, Mansooreh; Srivastava, Ajitesh
author_facet	Huang, Zitong; Montazerin, Mansooreh; Srivastava, Ajitesh
contents	Designing neural networks typically relies on manual trial and error or a neural architecture search (NAS) followed by weight training. The former is time-consuming and labor-intensive, while the latter often discretizes architecture search and weight optimization. In this paper, we propose a fundamentally different approach that simultaneously optimizes both the architecture and the weights of a neural network. Our framework first trains a universal multi-scale autoencoder that embeds both architectural and parametric information into a continuous latent space, where functionally similar neural networks are mapped closer together. Given a dataset, we then randomly initialize a point in the embedding space and update it via gradient descent to obtain the optimal neural network, jointly optimizing its structure and weights. The optimization process incorporates sparsity and compactness penalties to promote efficient models. Experiments on synthetic regression tasks demonstrate that our method effectively discovers sparse and compact neural networks with strong performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_08270
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space Huang, Zitong Montazerin, Mansooreh Srivastava, Ajitesh Machine Learning Designing neural networks typically relies on manual trial and error or a neural architecture search (NAS) followed by weight training. The former is time-consuming and labor-intensive, while the latter often discretizes architecture search and weight optimization. In this paper, we propose a fundamentally different approach that simultaneously optimizes both the architecture and the weights of a neural network. Our framework first trains a universal multi-scale autoencoder that embeds both architectural and parametric information into a continuous latent space, where functionally similar neural networks are mapped closer together. Given a dataset, we then randomly initialize a point in the embedding space and update it via gradient descent to obtain the optimal neural network, jointly optimizing its structure and weights. The optimization process incorporates sparsity and compactness penalties to promote efficient models. Experiments on synthetic regression tasks demonstrate that our method effectively discovers sparse and compact neural networks with strong performance.
title	SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space
topic	Machine Learning
url	https://arxiv.org/abs/2506.08270
