:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Author:	Pham, Linh
Format:	Preprint
Published:	2025
Subjects:	Sound Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2506.00291
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Attention-Guided Adaptation for Code-Switching Speech Recognition
by: Aditya, Bobbi, et al.
Published: (2023)

Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
by: Wu, Shih-Lun, et al.
Published: (2023)

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation
by: Guo, Haohan, et al.
Published: (2024)

Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches
by: Mujtaba, Dena, et al.
Published: (2025)

Adapting Language Balance in Code-Switching Speech
by: Ugan, Enes Yavuz, et al.
Published: (2025)

Efficient Adapter Tuning for Joint Singing Voice Beat and Downbeat Tracking with Self-supervised Learning Features
by: Deng, Jiajun, et al.
Published: (2025)

DOTA-ME-CS: Daily Oriented Text Audio-Mandarin English-Code Switching Dataset
by: Li, Yupei, et al.
Published: (2025)

CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition
by: Wang, He, et al.
Published: (2024)

HingeNet: A Harmonic-Aware Fine-Tuning Approach for Beat Tracking
by: Ru, Ganghui, et al.
Published: (2025)

MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model
by: Pham, The Hieu, et al.
Published: (2025)

Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers
by: Selvaraj, Nithish Muthuchamy, et al.
Published: (2023)

Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification
by: Sang, Mufan, et al.
Published: (2024)

Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models
by: Jing, Ruihao, et al.
Published: (2025)

Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
by: Ruggiero, Giuseppe, et al.
Published: (2024)

Monaural speech enhancement on drone via Adapter based transfer learning
by: Chen, Xingyu, et al.
Published: (2024)

SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition
by: Wang, Tianhao, et al.
Published: (2024)

Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech
by: Yao, Jixun, et al.
Published: (2025)

Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment
by: Wang, Wei, et al.
Published: (2025)

AsyncSwitch: Asynchronous Text-Speech Adaptation for Code-Switched ASR
by: Nguyen, Tuan, et al.
Published: (2025)

Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition
by: Wagner, Dominik, et al.
Published: (2025)

Efficient Emotion and Speaker Adaptation in LLM-Based TTS via Characteristic-Specific Partial Fine-Tuning
by: Wang, Tianrui, et al.
Published: (2025)

Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
by: Rouditchenko, Andrew, et al.
Published: (2025)

Speech Emotion Recognition Using Fine-Tuned DWFormer:A Study on Track 1 of the IERPChallenge 2024
by: Wang, Honghong, et al.
Published: (2025)

Fine-Tuning Automatic Speech Recognition for People with Parkinson's: An Effective Strategy for Enhancing Speech Technology Accessibility
by: Zheng, Xiuwen, et al.
Published: (2024)

VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech
by: Kim, Heeseung, et al.
Published: (2024)

Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning
by: Wan, Zixiang, et al.
Published: (2024)

Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-resource Speech Recognition
by: Menon, Aditya Srinivas, et al.
Published: (2026)

Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition
by: Siriwardhana, Shamane, et al.
Published: (2020)

DQLoRA: A Lightweight Domain-Aware Denoising ASR via Adapter-guided Distillation
by: Yang, Yiru
Published: (2025)

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores
by: Zhou, Jiaming, et al.
Published: (2024)

HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
by: Mu, Bingshen, et al.
Published: (2024)

AdaCS: Adaptive Normalization for Enhanced Code-Switching ASR
by: Chu, The Chuong, et al.
Published: (2025)

T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
by: Wang, Zehan, et al.
Published: (2025)

Crab: Multi Layer Contrastive Supervision to Improve Speech Emotion Recognition Under Both Acted and Natural Speech Condition
by: Ueda, Lucas H., et al.
Published: (2026)

Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models
by: Zheng, Xinhu, et al.
Published: (2024)

SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR
by: Ye, Shuaishuai, et al.
Published: (2024)

Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model
by: Lall, Vishakha, et al.
Published: (2024)

Fine-Tuning Whisper for Inclusive Prosodic Stress Analysis
by: Sohn, Samuel S., et al.
Published: (2025)

Persian Speech Emotion Recognition by Fine-Tuning Transformers
by: Shayaninasab, Minoo, et al.
Published: (2024)

UniCoM: A Universal Code-Switching Speech Generator
by: Lee, Sangmin, et al.
Published: (2025)