| Main Authors: | Yang, Cheng-Yeh, Huang, Kuan-Tang, Wang, Chien-Chun, Lee, Hung-Shin, Wang, Hsin-Min, Chen, Berlin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.21407 |
Similar Items
QAMRO: Quality-aware Adaptive Margin Ranking Optimization for Human-aligned Assessment of Audio Generation Systems
by: Wang, Chien-Chun, et al.
Published: (2025)
Robust Generative Audio Quality Assessment: Disentangling Quality from Spurious Correlations
by: Huang, Kuan-Tang, et al.
Published: (2026)
TG-ASR: Translation-Guided Learning with Parallel Gated Cross Attention for Low-Resource Automatic Speech Recognition
by: Yang, Cheng-Yeh, et al.
Published: (2026)
Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)
Revealing the Role of Audio Channels in ASR Performance Degradation
by: Huang, Kuan-Tang, et al.
Published: (2025)
Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation
by: Wang, Chien-Chun, et al.
Published: (2024)
Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition
by: Wang, Chien-Chun, et al.
Published: (2024)
Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing
by: Peng, An-Ci, et al.
Published: (2026)
CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition
by: Sung, Hung-Yang, et al.
Published: (2025)
SincQDR-VAD: A Noise-Robust Voice Activity Detection Framework Leveraging Learnable Filters and Ranking-Aware Optimization
by: Wang, Chien-Chun, et al.
Published: (2025)
ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning
by: Ho, Kuan-Hsun, et al.
Published: (2024)
The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
by: Huang, Wen-Chin, et al.
Published: (2024)
ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization
by: Yan, Bi-Cheng, et al.
Published: (2024)
What do neural networks listen to? Exploring the crucial bands in Speech Enhancement using Sinc-convolution
by: Ho, Kuan-Hsun, et al.
Published: (2024)
A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions
by: Wang, Chung-Chun, et al.
Published: (2025)
Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages
by: Cheng, Yao-Fei, et al.
Published: (2024)
SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction
by: Tang, Yuxun, et al.
Published: (2024)
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
by: Wang, Shih-heng, et al.
Published: (2024)
APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech
by: Lian, Zhicheng, et al.
Published: (2025)
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
by: Kuan, Chun-Yi, et al.
Published: (2024)
Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples
by: Kuan, Chun-Yi, et al.
Published: (2025)
The AudioMOS Challenge 2025
by: Huang, Wen-Chin, et al.
Published: (2025)
DistilMOS: Layer-Wise Self-Distillation For Self-Supervised Learning Model-Based MOS Prediction
by: Yang, Jianing, et al.
Published: (2026)
Recursive Attentive Pooling for Extracting Speaker Embeddings from Multi-Speaker Recordings
by: Horiguchi, Shota, et al.
Published: (2024)
ASTAR-NTU solution to AudioMOS Challenge 2025 Track1
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)
Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection
by: Yang, Tzu-Ting, et al.
Published: (2024)
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction
by: Agrawal, Saurabh, et al.
Published: (2025)
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
by: Kuan, Chun-Yi, et al.
Published: (2024)
AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering
by: Kuan, Chun-Yi, et al.
Published: (2026)
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data
by: Kuan, Chun-Yi, et al.
Published: (2025)
Query-by-Example Keyword Spotting Using Spectral-Temporal Graph Attentive Pooling and Multi-Task Learning
by: Wang, Zhenyu, et al.
Published: (2024)
Acoustically Precise Hesitation Tagging Is Essential for End-to-End Verbatim Transcription Systems
by: Lin, Jhen-Ke, et al.
Published: (2025)
Advancing Automated Speaking Assessment Leveraging Multifaceted Relevance and Grammar Information
by: Lu, Hao-Chien, et al.
Published: (2025)
A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data
by: Chou, Cheng-Kang, et al.
Published: (2025)
Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models
by: Kuan, Chun-Yi, et al.
Published: (2026)
Speech Emotion Recognition Leveraging OpenAI's Whisper Representations and Attentive Pooling Methods
by: Shendabadi, Ali, et al.
Published: (2026)
CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification
by: Peng, Junyi, et al.
Published: (2024)
An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement
by: Yang, Tzu-Ting, et al.
Published: (2024)
Causal Tracing of Audio-Text Fusion in Large Audio Language Models
by: Chen, Wei-Chih, et al.
Published: (2026)
CodecMOS-Accent: A MOS Benchmark of Resynthesized and TTS Speech from Neural Codecs Across English Accents
by: Huang, Wen-Chin, et al.
Published: (2026)