Bibliographic Details
Main Authors:	Parker, Julian D.; Evans, Zach; Carr, CJ; Zukowski, Zachary; Taylor, Josiah; Rice, Matthew; Pons, Jordi
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.18613

Table of Contents:

Latent representations are at the heart of the majority of modern generative models. In the audio domain they are typically produced by a neural-audio-codec autoencoder. In this work we introduce SAME (Semantically-Aligned Music autoEncoder), an autoencoder for stereo music and general audio that reaches a 4096$\times$ temporal compression ratio while maintaining reconstruction quality and downstream generative performance. We achieve this by combining a transformer-based backbone with a set of semantic regularisation approaches, phase-aware reconstruction losses and improved discriminator designs. The architecture delivers substantial computational cost benefits through both its high compression ratio and its reliance on well-optimised transformer primitives. Two variants (a large SAME-L and a CPU-deployable SAME-S) are released in open-weights form.