Library Catalog

Bibliographic Details
Main Author:	Tsao, Yu
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2509.01889

Similar Items

MOS-Bias: From Hidden Gender Bias to Gender-Aware Speech Quality Assessment
by: Ren, Wenze, et al.
Published: (2026)

TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations
by: Gao, Xiaoxue, et al.
Published: (2024)

A Study on Incorporating Whisper for Robust Speech Assessment
by: Zezario, Ryandhimas E., et al.
Published: (2023)

A Study on Speech Assessment with Visual Cues
by: Ahmed, Shafique, et al.
Published: (2025)

Speech Intelligibility Assessment with Uncertainty-Aware Whisper Embeddings and sLSTM
by: Zezario, Ryandhimas E., et al.
Published: (2025)

A Study on Zero-shot Non-intrusive Speech Assessment using Large Language Models
by: Zezario, Ryandhimas E., et al.
Published: (2024)

Unsupervised Face-Masked Speech Enhancement Using Generative Adversarial Networks With Human-in-the-Loop Assessment Metrics
by: Wang, Syu-Siang, et al.
Published: (2024)

CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
by: Wang, Helin, et al.
Published: (2025)

HighRateMOS: Sampling-Rate Aware Modeling for Speech Quality Assessment
by: Ren, Wenze, et al.
Published: (2025)

Audio-Visual Speech Enhancement in Noisy Environments via Emotion-Based Contextual Cues
by: Hussain, Tassadaq, et al.
Published: (2024)

EffortNet: A Deep Learning Framework for Objective Assessment of Speech Enhancement Technologies Using EEG-Based Alpha Oscillations
by: Sung, Ching-Chih, et al.
Published: (2025)

SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models
by: Yin, Chun, et al.
Published: (2024)

Tracking Listener Attention: Gaze-Guided Audio-Visual Speech Enhancement Framework
by: Yang, Hsiang-Cheng, et al.
Published: (2026)

Universal Speech Enhancement with Regression and Generative Mamba
by: Chao, Rong, et al.
Published: (2025)

An Investigation on Combining Geometry and Consistency Constraints into Phase Estimation for Speech Enhancement
by: Ho, Chun-Wei, et al.
Published: (2025)

Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models
by: Zezario, Ryandhimas E., et al.
Published: (2026)

STSM-FiLM: A FiLM-Conditioned Neural Architecture for Time-Scale Modification of Speech
by: Wisnu, Dyah A. M. G., et al.
Published: (2025)

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
by: Aldeneh, Zakaria, et al.
Published: (2024)

Towards General Discrete Speech Codec for Complex Acoustic Environments: A Study of Reconstruction and Downstream Task Consistency
by: Wang, Haoran, et al.
Published: (2025)

Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model
by: Zezario, Ryandhimas E., et al.
Published: (2023)

Feature Importance across Domains for Improving Non-Intrusive Speech Intelligibility Prediction in Hearing Aids
by: Zezario, Ryandhimas E., et al.
Published: (2025)

FastEnhancer: Speed-Optimized Streaming Neural Speech Enhancement
by: Ahn, Sunghwan, et al.
Published: (2025)

A Study on Zero-Shot Non-Intrusive Speech Intelligibility for Hearing Aids Using Large Language Models
by: Zezario, Ryandhimas E., et al.
Published: (2025)

CodecFake+: A Large-Scale Neural Audio Codec-Based Deepfake Speech Dataset
by: Chen, Xuanjun, et al.
Published: (2025)

Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features
by: Zezario, Ryandhimas E., et al.
Published: (2021)

Robust Audio-Visual Speech Enhancement: Correcting Misassignments in Complex Environments with Advanced Post-Processing
by: Ren, Wenze, et al.
Published: (2024)

Visual-Informed Speech Enhancement Using Attention-Based Beamforming
by: Liu, Chihyun, et al.
Published: (2026)

Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition
by: Wang, Kuan-Chen, et al.
Published: (2024)

Multivariate Probabilistic Assessment of Speech Quality
by: Cumlin, Fredrik, et al.
Published: (2025)

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
by: Huang, Wen-Chin, et al.
Published: (2024)

Leveraging Self-Supervised Audio-Visual Pretrained Models to Improve Vocoded Speech Intelligibility in Cochlear Implant Simulation
by: Lai, Richard Lee, et al.
Published: (2023)

Exploiting Consistency-Preserving Loss and Perceptual Contrast Stretching to Boost SSL-based Speech Enhancement
by: Khan, Muhammad Salman, et al.
Published: (2024)

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
by: Wang, Siyin, et al.
Published: (2025)

EMO-Codec: An In-Depth Look at Emotion Preservation capacity of Legacy and Neural Codec Models With Subjective and Objective Evaluations
by: Ren, Wenze, et al.
Published: (2024)

Leveraging Mamba with Full-Face Vision for Audio-Visual Speech Enhancement
by: Chao, Rong, et al.
Published: (2025)

HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids
by: Wisnu, Dyah A. M. G., et al.
Published: (2024)

Towards Environmental Preference Based Speech Enhancement For Individualised Multi-Modal Hearing Aids
by: Kirton-Wingate, Jasper, et al.
Published: (2024)

Linguistic Knowledge Transfer Learning for Speech Enhancement
by: Hung, Kuo-Hsuan, et al.
Published: (2025)

Coupling Speech Encoders with Downstream Text Models
by: Chelba, Ciprian, et al.
Published: (2024)

Universal Preference-Score-based Pairwise Speech Quality Assessment
by: Shi, Yu-Fei, et al.
Published: (2025)