| Main Authors: | Ziv, Alon, Chen, Sanyuan, Tjandra, Andros, Adi, Yossi, Hsu, Wei-Ning, Shi, Bowen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.10264 |
Similar Items
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
by: Prajwal, K R, et al.
Published: (2024)
Generative Pre-training for Speech with Flow Matching
by: Liu, Alexander H., et al.
Published: (2023)
Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation
by: Yang, Mu, et al.
Published: (2024)
Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation
by: Tal, Or, et al.
Published: (2024)
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation
by: Tal, Or, et al.
Published: (2025)
Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound
by: Tjandra, Andros, et al.
Published: (2025)
High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching
by: Lan, Gael Le, et al.
Published: (2024)
MusicGen-Stem: Multi-stem music generation and edition through autoregressive modeling
by: Rouard, Simon, et al.
Published: (2025)
Audio Conditioning for Music Generation via Discrete Bottleneck Features
by: Rouard, Simon, et al.
Published: (2024)
The AudioMOS Challenge 2025
by: Huang, Wen-Chin, et al.
Published: (2025)
MusFlow: Multimodal Music Generation via Conditional Flow Matching
by: Song, Jiahao, et al.
Published: (2025)
Masked Audio Generation using a Single Non-Autoregressive Transformer
by: Ziv, Alon, et al.
Published: (2024)
An Independence-promoting Loss for Music Generation with Language Models
by: Lemercier, Jean-Marie, et al.
Published: (2024)
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
by: Majumder, Navonil, et al.
Published: (2024)
Single-step Controllable Music Bandwidth Extension With Flow Matching
by: Hernandez-Olivan, Carlos, et al.
Published: (2026)
NAST: Noise Aware Speech Tokenization for Speech Language Models
by: Messica, Shoval, et al.
Published: (2024)
Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization
by: Gao, Xiaoxue, et al.
Published: (2024)
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
by: Sun, Xiaohui, et al.
Published: (2025)
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
by: Hung, Chia-Yu, et al.
Published: (2024)
Simple and Controllable Music Generation
by: Copet, Jade, et al.
Published: (2023)
Scaling Analysis of Interleaved Speech-Text Language Models
by: Maimon, Gallil, et al.
Published: (2025)
Knowing What to Stress: A Discourse-Conditioned Text-to-Speech Benchmark
by: Turetzky, Arnon, et al.
Published: (2026)
LAST: Language Model Aware Speech Tokenization
by: Turetzky, Arnon, et al.
Published: (2024)
Enhancing TTS Stability in Hebrew using Discrete Semantic Units
by: Zeldes, Ella, et al.
Published: (2024)
V2A-DPO: Omni-Preference Optimization for Video-to-Audio Generation
by: Chan, Nolan, et al.
Published: (2026)
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
by: Guan, Wenhao, et al.
Published: (2024)
Audio Enhancement from Multiple Crowdsourced Recordings: A Simple and Effective Baseline
by: Aziz, Shiran, et al.
Published: (2024)
CodecFlow: Efficient Bandwidth Extension via Conditional Flow Matching in Neural Codec Latent Space
by: Zhang, Bowen, et al.
Published: (2026)
Aligning Text-to-Music Evaluation with Human Preferences
by: Huang, Yichen, et al.
Published: (2025)
FlowSynth: Instrument Generation Through Distributional Flow Matching and Test-Time Search
by: Yang, Qihui, et al.
Published: (2025)
UniFlow-Audio: Unified Flow Matching for Audio Generation from Omni-Modalities
by: Xu, Xuenan, et al.
Published: (2025)
Unsupervised Speech Segmentation: A General Approach Using Speech Language Models
by: Elmakies, Avishai, et al.
Published: (2025)
Salmon: A Suite for Acoustic Language Model Evaluation
by: Maimon, Gallil, et al.
Published: (2024)
A Language Modeling Approach to Diacritic-Free Hebrew TTS
by: Roth, Amit, et al.
Published: (2024)
WHISTRESS: Enriching Transcriptions with Sentence Stress Detection
by: Yosha, Iddo, et al.
Published: (2025)
StressTest: Can YOUR Speech LM Handle the Stress?
by: Yosha, Iddo, et al.
Published: (2025)
Latent Watermarking of Audio Generative Models
by: Roman, Robin San, et al.
Published: (2024)
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
by: Guo, Yiwei, et al.
Published: (2023)
LatentFlowSR: High-Fidelity Audio Super-Resolution via Noise-Robust Latent Flow Matching
by: Liu, Fei, et al.
Published: (2026)
RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing
by: Gao, Liting, et al.
Published: (2025)