Saved in:
| Main Authors: | Tseng, Liang-Hsuan, Hu, En-Pei, Chiang, Cheng-Han, Tseng, Yuan, Lee, Hung-yi, Lee, Lin-shan, Sun, Shao-Hua |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.03988 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
by: Wu, Haibin, et al.
Published: (2024)
by: Wu, Haibin, et al.
Published: (2024)
Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data
by: Tseng, Liang-Hsuan, et al.
Published: (2024)
by: Tseng, Liang-Hsuan, et al.
Published: (2024)
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
by: Tseng, Liang-Hsuan, et al.
Published: (2025)
by: Tseng, Liang-Hsuan, et al.
Published: (2025)
Consistency Based Unsupervised Self-training For ASR Personalisation
by: Zhang, Jisi, et al.
Published: (2024)
by: Zhang, Jisi, et al.
Published: (2024)
A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data
by: Chou, Cheng-Kang, et al.
Published: (2025)
by: Chou, Cheng-Kang, et al.
Published: (2025)
MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large-Audio Language Model
by: Huang, Hsiao-Ying, et al.
Published: (2025)
by: Huang, Hsiao-Ying, et al.
Published: (2025)
Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach
by: Lin, Yi-Cheng, et al.
Published: (2025)
by: Lin, Yi-Cheng, et al.
Published: (2025)
SUTA-LM: Bridging Test-Time Adaptation and Language Model Rescoring for Robust ASR
by: Huang, Wei-Ping, et al.
Published: (2025)
by: Huang, Wei-Ping, et al.
Published: (2025)
CodecFake+: A Large-Scale Neural Audio Codec-Based Deepfake Speech Dataset
by: Chen, Xuanjun, et al.
Published: (2025)
by: Chen, Xuanjun, et al.
Published: (2025)
Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)
by: Lin, Guan-Ting, et al.
Published: (2024)
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
by: Kuan, Chun-Yi, et al.
Published: (2024)
by: Kuan, Chun-Yi, et al.
Published: (2024)
Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples
by: Kuan, Chun-Yi, et al.
Published: (2025)
by: Kuan, Chun-Yi, et al.
Published: (2025)
SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
by: Hsu, Ming-Hao, et al.
Published: (2024)
by: Hsu, Ming-Hao, et al.
Published: (2024)
Unlocking Strong Supervision: A Data-Centric Study of General-Purpose Audio Pre-Training Methods
by: Zhou, Xuanru, et al.
Published: (2026)
by: Zhou, Xuanru, et al.
Published: (2026)
Probing the Robustness Properties of Neural Speech Codecs
by: Tseng, Wei-Cheng, et al.
Published: (2025)
by: Tseng, Wei-Cheng, et al.
Published: (2025)
ASTAR-NTU solution to AudioMOS Challenge 2025 Track1
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
by: Shi, Jiatong, et al.
Published: (2023)
by: Shi, Jiatong, et al.
Published: (2023)
Parallel Synthesis for Autoregressive Speech Generation
by: Hsu, Po-chun, et al.
Published: (2022)
by: Hsu, Po-chun, et al.
Published: (2022)
Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding
by: Huo, Mingyue, et al.
Published: (2025)
by: Huo, Mingyue, et al.
Published: (2025)
Spectral-Aware Low-Rank Adaptation for Speaker Verification
by: Li, Zhe, et al.
Published: (2025)
by: Li, Zhe, et al.
Published: (2025)
IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling
by: Huang, Kuan-Po, et al.
Published: (2025)
by: Huang, Kuan-Po, et al.
Published: (2025)
Toward Fair Speech Technologies: A Comprehensive Survey of Bias and Fairness in Speech AI
by: Lin, Yi-Cheng, et al.
Published: (2026)
by: Lin, Yi-Cheng, et al.
Published: (2026)
Neural Codec-based Adversarial Sample Detection for Speaker Verification
by: Chen, Xuanjun, et al.
Published: (2024)
by: Chen, Xuanjun, et al.
Published: (2024)
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
by: Wang, Shih-heng, et al.
Published: (2024)
by: Wang, Shih-heng, et al.
Published: (2024)
Distilling a speech and music encoder with task arithmetic
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)
Investigating the Effects of Large-Scale Pseudo-Stereo Data and Different Speech Foundation Model on Dialogue Generative Spoken Language Model
by: Fu, Yu-Kuan, et al.
Published: (2024)
by: Fu, Yu-Kuan, et al.
Published: (2024)
ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models
by: Hsiao, Chi-Yuan, et al.
Published: (2026)
by: Hsiao, Chi-Yuan, et al.
Published: (2026)
Chunk Based Speech Pre-training with High Resolution Finite Scalar Quantization
by: Tang, Yun, et al.
Published: (2025)
by: Tang, Yun, et al.
Published: (2025)
Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper
by: Yang, Chih-Kai, et al.
Published: (2024)
by: Yang, Chih-Kai, et al.
Published: (2024)
How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation
by: Lu, Ke-Han, et al.
Published: (2026)
by: Lu, Ke-Han, et al.
Published: (2026)
Adaptive vector steering: A training-free, layer-wise intervention for hallucination mitigation in large audio and multimodal models
by: Lin, Tsung-En, et al.
Published: (2025)
by: Lin, Tsung-En, et al.
Published: (2025)
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
by: Wu, Haibin, et al.
Published: (2024)
by: Wu, Haibin, et al.
Published: (2024)
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition
by: Su, Hsuan, et al.
Published: (2024)
by: Su, Hsuan, et al.
Published: (2024)
persoDA: Personalized Data Augmentation for Personalized ASR
by: Parada, Pablo Peso, et al.
Published: (2025)
by: Parada, Pablo Peso, et al.
Published: (2025)
Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
by: Yang, Chih-Kai, et al.
Published: (2024)
by: Yang, Chih-Kai, et al.
Published: (2024)
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
by: Lu, Ke-Han, et al.
Published: (2024)
by: Lu, Ke-Han, et al.
Published: (2024)
LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
by: Liu, Wei, et al.
Published: (2024)
by: Liu, Wei, et al.
Published: (2024)
Efficient Scaling for LLM-based ASR
by: Mu, Bingshen, et al.
Published: (2025)
by: Mu, Bingshen, et al.
Published: (2025)
EMO-Codec: An In-Depth Look at Emotion Preservation capacity of Legacy and Neural Codec Models With Subjective and Objective Evaluations
by: Ren, Wenze, et al.
Published: (2024)
by: Ren, Wenze, et al.
Published: (2024)
Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages
by: Cheng, Yao-Fei, et al.
Published: (2024)
by: Cheng, Yao-Fei, et al.
Published: (2024)
Similar Items
-
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
by: Wu, Haibin, et al.
Published: (2024) -
Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data
by: Tseng, Liang-Hsuan, et al.
Published: (2024) -
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
by: Tseng, Liang-Hsuan, et al.
Published: (2025) -
Consistency Based Unsupervised Self-training For ASR Personalisation
by: Zhang, Jisi, et al.
Published: (2024) -
A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data
by: Chou, Cheng-Kang, et al.
Published: (2025)