:: Library Catalog

Bibliographic Details
Main Authors:	Chien, Chung-Ming, Tjandra, Andros, Vyas, Apoorv, Le, Matt, Shi, Bowen, Hsu, Wei-Ning
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Computation and Language
Online Access:	https://arxiv.org/abs/2406.06251

Similar Items

Generative Pre-training for Speech with Flow Matching
by: Liu, Alexander H., et al.
Published: (2023)

Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation
by: Yang, Mu, et al.
Published: (2024)

SAM Audio Judge: A Unified Multimodal Framework for Perceptual Evaluation of Audio Separation
by: Wang, Helin, et al.
Published: (2026)

Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound
by: Tjandra, Andros, et al.
Published: (2025)

MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
by: Prajwal, K R, et al.
Published: (2024)

SAM Audio: Segment Anything in Audio
by: Shi, Bowen, et al.
Published: (2025)

The AudioMOS Challenge 2025
by: Huang, Wen-Chin, et al.
Published: (2025)

Fine-Grained Quantitative Emotion Editing for Speech Generation
by: Inoue, Sho, et al.
Published: (2024)

Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition
by: Wagner, Dominik, et al.
Published: (2025)

YNote: A Novel Music Notation for Fine-Tuning LLMs in Music Generation
by: Lu, Shao-Chien, et al.
Published: (2025)

Fine-Grained and Interpretable Neural Speech Editing
by: Morrison, Max, et al.
Published: (2024)

Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches
by: Mujtaba, Dena, et al.
Published: (2025)

Adaptive Federated Fine-Tuning of Self-Supervised Speech Representations
by: Guo, Xin, et al.
Published: (2026)

Parameter-Efficient Fine-Tuning of Foundation Models for CLP Speech Classification
by: Bhattacharjee, Susmita, et al.
Published: (2025)

Toward Natural Emotional Text-To-Speech System with Fine-Grained Non-Verbal Expression Control
by: Zhou, Wangzixi, et al.
Published: (2026)

Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
by: Yang, Yifan, et al.
Published: (2026)

MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model
by: Pham, The Hieu, et al.
Published: (2025)

Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
by: Xu, Chun, et al.
Published: (2024)

Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
by: Shen, Siyuan, et al.
Published: (2024)

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
by: Han, HyoJung, et al.
Published: (2024)

Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models
by: Jing, Ruihao, et al.
Published: (2025)

Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
by: Li, Guojian, et al.
Published: (2026)

Overcoming Data Scarcity in Multi-Dialectal Arabic ASR via Whisper Fine-Tuning
by: Özyilmaz, Ömer Tarik, et al.
Published: (2025)

StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech
by: Lou, Haowei, et al.
Published: (2024)

Does Fine-tuning by Reinforcement Learning Improve Generalization in Binary Speech Deepfake Detection?
by: Wang, Xin, et al.
Published: (2026)

Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
by: Fang, Yangui, et al.
Published: (2025)

Persian Speech Emotion Recognition by Fine-Tuning Transformers
by: Shayaninasab, Minoo, et al.
Published: (2024)

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
by: Tu, Wenming, et al.
Published: (2025)

Fine-Tuning Automatic Speech Recognition for People with Parkinson's: An Effective Strategy for Enhancing Speech Technology Accessibility
by: Zheng, Xiuwen, et al.
Published: (2024)

Speech Recognition Model Improves Text-to-Speech Synthesis using Fine-Grained Reward
by: Wang, Guansu, et al.
Published: (2025)

Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning
by: Wan, Zixiang, et al.
Published: (2024)

Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-resource Speech Recognition
by: Menon, Aditya Srinivas, et al.
Published: (2026)

Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models
by: Raina, Vyas, et al.
Published: (2024)

EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering
by: Xie, Tianxin, et al.
Published: (2025)

FIGARO: Generating Symbolic Music with Fine-Grained Artistic Control
by: von Rütte, Dimitri, et al.
Published: (2022)

CLEP-DG: Contrastive Learning for Speech Emotion Domain Generalization via Soft Prompt Tuning
by: Shi, Jiacheng, et al.
Published: (2025)

Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech
by: Yao, Jixun, et al.
Published: (2025)

Fine-Tuning Text-to-Speech Diffusion Models Using Reinforcement Learning with Human Feedback
by: Chen, Jingyi, et al.
Published: (2025)

SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition
by: Wang, Pu, et al.
Published: (2026)

Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting
by: Chen, Haolin, et al.
Published: (2024)