| Main Authors: | Chien, Chung-Ming, Tjandra, Andros, Vyas, Apoorv, Le, Matt, Shi, Bowen, Hsu, Wei-Ning |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2406.06251 |
Similar Items
Generative Pre-training for Speech with Flow Matching
by: Liu, Alexander H., et al.
Published: (2023)
Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation
by: Yang, Mu, et al.
Published: (2024)
SAM Audio Judge: A Unified Multimodal Framework for Perceptual Evaluation of Audio Separation
by: Wang, Helin, et al.
Published: (2026)
Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound
by: Tjandra, Andros, et al.
Published: (2025)
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
by: Prajwal, K R, et al.
Published: (2024)
SAM Audio: Segment Anything in Audio
by: Shi, Bowen, et al.
Published: (2025)
The AudioMOS Challenge 2025
by: Huang, Wen-Chin, et al.
Published: (2025)
Fine-Grained Quantitative Emotion Editing for Speech Generation
by: Inoue, Sho, et al.
Published: (2024)
Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition
by: Wagner, Dominik, et al.
Published: (2025)
YNote: A Novel Music Notation for Fine-Tuning LLMs in Music Generation
by: Lu, Shao-Chien, et al.
Published: (2025)
Fine-Grained and Interpretable Neural Speech Editing
by: Morrison, Max, et al.
Published: (2024)
Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches
by: Mujtaba, Dena, et al.
Published: (2025)
Adaptive Federated Fine-Tuning of Self-Supervised Speech Representations
by: Guo, Xin, et al.
Published: (2026)
Parameter-Efficient Fine-Tuning of Foundation Models for CLP Speech Classification
by: Bhattacharjee, Susmita, et al.
Published: (2025)
Toward Natural Emotional Text-To-Speech System with Fine-Grained Non-Verbal Expression Control
by: Zhou, Wangzixi, et al.
Published: (2026)
Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
by: Yang, Yifan, et al.
Published: (2026)
MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model
by: Pham, The Hieu, et al.
Published: (2025)
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
by: Xu, Chun, et al.
Published: (2024)
Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
by: Shen, Siyuan, et al.
Published: (2024)
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
by: Han, HyoJung, et al.
Published: (2024)
Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models
by: Jing, Ruihao, et al.
Published: (2025)
Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
by: Li, Guojian, et al.
Published: (2026)
Overcoming Data Scarcity in Multi-Dialectal Arabic ASR via Whisper Fine-Tuning
by: Özyilmaz, Ömer Tarik, et al.
Published: (2025)
StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech
by: Lou, Haowei, et al.
Published: (2024)
Does Fine-tuning by Reinforcement Learning Improve Generalization in Binary Speech Deepfake Detection?
by: Wang, Xin, et al.
Published: (2026)
Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
by: Fang, Yangui, et al.
Published: (2025)
Persian Speech Emotion Recognition by Fine-Tuning Transformers
by: Shayaninasab, Minoo, et al.
Published: (2024)
UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
by: Tu, Wenming, et al.
Published: (2025)
Fine-Tuning Automatic Speech Recognition for People with Parkinson's: An Effective Strategy for Enhancing Speech Technology Accessibility
by: Zheng, Xiuwen, et al.
Published: (2024)
Speech Recognition Model Improves Text-to-Speech Synthesis using Fine-Grained Reward
by: Wang, Guansu, et al.
Published: (2025)
Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning
by: Wan, Zixiang, et al.
Published: (2024)
Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-resource Speech Recognition
by: Menon, Aditya Srinivas, et al.
Published: (2026)
Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models
by: Raina, Vyas, et al.
Published: (2024)
EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering
by: Xie, Tianxin, et al.
Published: (2025)
FIGARO: Generating Symbolic Music with Fine-Grained Artistic Control
by: von Rütte, Dimitri, et al.
Published: (2022)
CLEP-DG: Contrastive Learning for Speech Emotion Domain Generalization via Soft Prompt Tuning
by: Shi, Jiacheng, et al.
Published: (2025)
Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech
by: Yao, Jixun, et al.
Published: (2025)
Fine-Tuning Text-to-Speech Diffusion Models Using Reinforcement Learning with Human Feedback
by: Chen, Jingyi, et al.
Published: (2025)
SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition
by: Wang, Pu, et al.
Published: (2026)
Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting
by: Chen, Haolin, et al.
Published: (2024)