:: Library Catalog

Bibliographic Details
Main Authors:	Ziv, Alon; Chen, Sanyuan; Tjandra, Andros; Adi, Yossi; Hsu, Wei-Ning; Shi, Bowen
Format:	Preprint
Published:	2025
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2512.10264

Similar Items

MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
by: Prajwal, K R, et al.
Published: (2024)

Generative Pre-training for Speech with Flow Matching
by: Liu, Alexander H., et al.
Published: (2023)

Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation
by: Yang, Mu, et al.
Published: (2024)

Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation
by: Tal, Or, et al.
Published: (2024)

Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation
by: Tal, Or, et al.
Published: (2025)

Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound
by: Tjandra, Andros, et al.
Published: (2025)

High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching
by: Le Lan, Gael, et al.
Published: (2024)

MusicGen-Stem: Multi-stem music generation and edition through autoregressive modeling
by: Rouard, Simon, et al.
Published: (2025)

Audio Conditioning for Music Generation via Discrete Bottleneck Features
by: Rouard, Simon, et al.
Published: (2024)

The AudioMOS Challenge 2025
by: Huang, Wen-Chin, et al.
Published: (2025)

MusFlow: Multimodal Music Generation via Conditional Flow Matching
by: Song, Jiahao, et al.
Published: (2025)

Masked Audio Generation using a Single Non-Autoregressive Transformer
by: Ziv, Alon, et al.
Published: (2024)

An Independence-promoting Loss for Music Generation with Language Models
by: Lemercier, Jean-Marie, et al.
Published: (2024)

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
by: Majumder, Navonil, et al.
Published: (2024)

Single-step Controllable Music Bandwidth Extension With Flow Matching
by: Hernandez-Olivan, Carlos, et al.
Published: (2026)

NAST: Noise Aware Speech Tokenization for Speech Language Models
by: Messica, Shoval, et al.
Published: (2024)

Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization
by: Gao, Xiaoxue, et al.
Published: (2024)

F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
by: Sun, Xiaohui, et al.
Published: (2025)

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
by: Hung, Chia-Yu, et al.
Published: (2024)

Simple and Controllable Music Generation
by: Copet, Jade, et al.
Published: (2023)

Scaling Analysis of Interleaved Speech-Text Language Models
by: Maimon, Gallil, et al.
Published: (2025)

Knowing What to Stress: A Discourse-Conditioned Text-to-Speech Benchmark
by: Turetzky, Arnon, et al.
Published: (2026)

LAST: Language Model Aware Speech Tokenization
by: Turetzky, Arnon, et al.
Published: (2024)

Enhancing TTS Stability in Hebrew using Discrete Semantic Units
by: Zeldes, Ella, et al.
Published: (2024)

V2A-DPO: Omni-Preference Optimization for Video-to-Audio Generation
by: Chan, Nolan, et al.
Published: (2026)

LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
by: Guan, Wenhao, et al.
Published: (2024)

Audio Enhancement from Multiple Crowdsourced Recordings: A Simple and Effective Baseline
by: Aziz, Shiran, et al.
Published: (2024)

CodecFlow: Efficient Bandwidth Extension via Conditional Flow Matching in Neural Codec Latent Space
by: Zhang, Bowen, et al.
Published: (2026)

Aligning Text-to-Music Evaluation with Human Preferences
by: Huang, Yichen, et al.
Published: (2025)

FlowSynth: Instrument Generation Through Distributional Flow Matching and Test-Time Search
by: Yang, Qihui, et al.
Published: (2025)

UniFlow-Audio: Unified Flow Matching for Audio Generation from Omni-Modalities
by: Xu, Xuenan, et al.
Published: (2025)

Unsupervised Speech Segmentation: A General Approach Using Speech Language Models
by: Elmakies, Avishai, et al.
Published: (2025)

Salmon: A Suite for Acoustic Language Model Evaluation
by: Maimon, Gallil, et al.
Published: (2024)

A Language Modeling Approach to Diacritic-Free Hebrew TTS
by: Roth, Amit, et al.
Published: (2024)

WHISTRESS: Enriching Transcriptions with Sentence Stress Detection
by: Yosha, Iddo, et al.
Published: (2025)

StressTest: Can YOUR Speech LM Handle the Stress?
by: Yosha, Iddo, et al.
Published: (2025)

Latent Watermarking of Audio Generative Models
by: San Roman, Robin, et al.
Published: (2024)

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
by: Guo, Yiwei, et al.
Published: (2023)

LatentFlowSR: High-Fidelity Audio Super-Resolution via Noise-Robust Latent Flow Matching
by: Liu, Fei, et al.
Published: (2026)

RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing
by: Gao, Liting, et al.
Published: (2025)