Bibliographic Details
Main Authors: Huang, Zitong, Montazerin, Mansooreh, Srivastava, Ajitesh
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2506.08270
author Huang, Zitong
Montazerin, Mansooreh
Srivastava, Ajitesh
contents Designing neural networks typically relies on manual trial and error or a neural architecture search (NAS) followed by weight training. The former is time-consuming and labor-intensive, while the latter often discretizes architecture search and weight optimization. In this paper, we propose a fundamentally different approach that simultaneously optimizes both the architecture and the weights of a neural network. Our framework first trains a universal multi-scale autoencoder that embeds both architectural and parametric information into a continuous latent space, where functionally similar neural networks are mapped closer together. Given a dataset, we then randomly initialize a point in the embedding space and update it via gradient descent to obtain the optimal neural network, jointly optimizing its structure and weights. The optimization process incorporates sparsity and compactness penalties to promote efficient models. Experiments on synthetic regression tasks demonstrate that our method effectively discovers sparse and compact neural networks with strong performance.
format Preprint
id arxiv_https___arxiv_org_abs_2506_08270
institution arXiv
publishDate 2025
record_format arxiv
title SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space
topic Machine Learning
url https://arxiv.org/abs/2506.08270
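
The abstract describes initializing a random point in a latent space and updating it by gradient descent, with a sparsity penalty, so that the decoded network fits a dataset. The sketch below illustrates only that optimization loop and is not the authors' implementation. It assumes a fixed random linear map `D` as a stand-in for the trained multi-scale autoencoder's decoder, a fixed one-hidden-layer tanh network (so the architecture itself is not searched here), an L1 penalty as the sparsity term, and finite-difference gradients for simplicity. All names (`decode`, `objective`, `optimize_latent`) and hyperparameters are hypothetical.

```python
import numpy as np

def decode(z, D):
    """Hypothetical stand-in decoder: a fixed linear map from the latent point z
    to the weights of a 1-input, 1-output MLP with one hidden tanh layer."""
    w = D @ z
    hidden = D.shape[0] // 2
    W1 = w[:hidden].reshape(hidden, 1)   # input -> hidden weights
    W2 = w[hidden:].reshape(1, hidden)   # hidden -> output weights
    return W1, W2

def objective(z, D, X, y, lam):
    """Mean squared error of the decoded network plus an L1 sparsity penalty."""
    W1, W2 = decode(z, D)
    pred = (W2 @ np.tanh(W1 @ X.T)).ravel()
    mse = np.mean((pred - y) ** 2)
    sparsity = lam * (np.abs(W1).sum() + np.abs(W2).sum())
    return mse + sparsity

def numerical_grad(f, z, eps=1e-5):
    """Central-difference gradient of f at z (adequate for a tiny latent space)."""
    g = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        g[i] = (f(z + step) - f(z - step)) / (2 * eps)
    return g

def optimize_latent(X, y, latent_dim=6, hidden=4, lam=1e-3,
                    lr=0.02, steps=500, seed=0):
    """Randomly initialize a latent point and update it by gradient descent."""
    rng = np.random.default_rng(seed)
    # Assumption: a random, untrained decoder replaces the learned one.
    D = rng.normal(size=(2 * hidden, latent_dim)) / np.sqrt(latent_dim)
    z = 0.1 * rng.normal(size=latent_dim)
    f = lambda v: objective(v, D, X, y, lam)
    history = [f(z)]
    for _ in range(steps):
        z = z - lr * numerical_grad(f, z)
        history.append(f(z))
    return z, D, history

if __name__ == "__main__":
    # Synthetic regression task, chosen here purely for illustration.
    X = np.linspace(-1.0, 1.0, 64).reshape(-1, 1)
    y = np.sin(2.0 * X).ravel()
    z, D, history = optimize_latent(X, y)
    print("initial objective:", history[0])
    print("final objective:", history[-1])
```

In this toy version only the latent vector `z` is trained; the decoder stays frozen, which mirrors the two-stage structure the abstract outlines (embed first, then search in the embedding space).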