:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Tseng, Liang-Hsuan, Hu, En-Pei, Chiang, Cheng-Han, Tseng, Yuan, Lee, Hung-yi, Lee, Lin-shan, Sun, Shao-Hua
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing Computation and Language Sound
Online Access:	https://arxiv.org/abs/2402.03988
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
by: Wu, Haibin, et al.
Published: (2024)

Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data
by: Tseng, Liang-Hsuan, et al.
Published: (2024)

TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
by: Tseng, Liang-Hsuan, et al.
Published: (2025)

Consistency Based Unsupervised Self-training For ASR Personalisation
by: Zhang, Jisi, et al.
Published: (2024)

A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data
by: Chou, Cheng-Kang, et al.
Published: (2025)

MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model
by: Huang, Hsiao-Ying, et al.
Published: (2025)

Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach
by: Lin, Yi-Cheng, et al.
Published: (2025)

SUTA-LM: Bridging Test-Time Adaptation and Language Model Rescoring for Robust ASR
by: Huang, Wei-Ping, et al.
Published: (2025)

CodecFake+: A Large-Scale Neural Audio Codec-Based Deepfake Speech Dataset
by: Chen, Xuanjun, et al.
Published: (2025)

Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)

Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
by: Kuan, Chun-Yi, et al.
Published: (2024)

Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples
by: Kuan, Chun-Yi, et al.
Published: (2025)

SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
by: Hsu, Ming-Hao, et al.
Published: (2024)

Unlocking Strong Supervision: A Data-Centric Study of General-Purpose Audio Pre-Training Methods
by: Zhou, Xuanru, et al.
Published: (2026)

Probing the Robustness Properties of Neural Speech Codecs
by: Tseng, Wei-Cheng, et al.
Published: (2025)

ASTAR-NTU solution to AudioMOS Challenge 2025 Track1
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
by: Shi, Jiatong, et al.
Published: (2023)

Parallel Synthesis for Autoregressive Speech Generation
by: Hsu, Po-chun, et al.
Published: (2022)

Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding
by: Huo, Mingyue, et al.
Published: (2025)

Spectral-Aware Low-Rank Adaptation for Speaker Verification
by: Li, Zhe, et al.
Published: (2025)

IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling
by: Huang, Kuan-Po, et al.
Published: (2025)

Toward Fair Speech Technologies: A Comprehensive Survey of Bias and Fairness in Speech AI
by: Lin, Yi-Cheng, et al.
Published: (2026)

Neural Codec-based Adversarial Sample Detection for Speaker Verification
by: Chen, Xuanjun, et al.
Published: (2024)

Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
by: Wang, Shih-heng, et al.
Published: (2024)

Distilling a speech and music encoder with task arithmetic
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)

Investigating the Effects of Large-Scale Pseudo-Stereo Data and Different Speech Foundation Model on Dialogue Generative Spoken Language Model
by: Fu, Yu-Kuan, et al.
Published: (2024)

ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models
by: Hsiao, Chi-Yuan, et al.
Published: (2026)

Chunk Based Speech Pre-training with High Resolution Finite Scalar Quantization
by: Tang, Yun, et al.
Published: (2025)

Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper
by: Yang, Chih-Kai, et al.
Published: (2024)

How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation
by: Lu, Ke-Han, et al.
Published: (2026)

Adaptive vector steering: A training-free, layer-wise intervention for hallucination mitigation in large audio and multimodal models
by: Lin, Tsung-En, et al.
Published: (2025)

Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
by: Wu, Haibin, et al.
Published: (2024)

Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition
by: Su, Hsuan, et al.
Published: (2024)

persoDA: Personalized Data Augmentation for Personalized ASR
by: Parada, Pablo Peso, et al.
Published: (2025)

Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
by: Yang, Chih-Kai, et al.
Published: (2024)

DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
by: Lu, Ke-Han, et al.
Published: (2024)

LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
by: Liu, Wei, et al.
Published: (2024)

Efficient Scaling for LLM-based ASR
by: Mu, Bingshen, et al.
Published: (2025)

EMO-Codec: An In-Depth Look at Emotion Preservation capacity of Legacy and Neural Codec Models With Subjective and Objective Evaluations
by: Ren, Wenze, et al.
Published: (2024)

Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages
by: Cheng, Yao-Fei, et al.
Published: (2024)