Library Catalog

Bibliographic Details
Main Authors:	Yin, Chun; Chi, Tai-Shih; Tsao, Yu; Wang, Hsin-Min
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Machine Learning; Sound
Online Access:	https://arxiv.org/abs/2406.08445

Similar Items

Towards Robust Assessment of Pathological Voices via Combined Low-Level Descriptors and Foundation Model Representations
by: Ariyanti, Whenty, et al.
Published: (2025)

Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model
by: Zezario, Ryandhimas E., et al.
Published: (2023)

Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features
by: Zezario, Ryandhimas E., et al.
Published: (2021)

HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids
by: Wisnu, Dyah A. M. G., et al.
Published: (2024)

Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata
by: Zezario, Ryandhimas E., et al.
Published: (2023)

Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis
by: Carbonneau, Marc-André, et al.
Published: (2025)

A Study on Zero-shot Non-intrusive Speech Assessment using Large Language Models
by: Zezario, Ryandhimas E., et al.
Published: (2024)

Improving Perceptual Audio Aesthetic Assessment via Triplet Loss and Self-Supervised Embeddings
by: Wisnu, Dyah A. M. G., et al.
Published: (2025)

A Layer-Anchoring Strategy for Enhancing Cross-Lingual Speech Emotion Recognition
by: Upadhyay, Shreya G., et al.
Published: (2024)

Speech Intelligibility Assessment with Uncertainty-Aware Whisper Embeddings and sLSTM
by: Zezario, Ryandhimas E., et al.
Published: (2025)

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
by: Zhao, Guanlong, et al.
Published: (2023)

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
by: Huang, Wen-Chin, et al.
Published: (2024)

Speech to Speech Synthesis for Voice Impersonation
by: Johnson, Bjorn, et al.
Published: (2026)

Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification
by: Sang, Mufan, et al.
Published: (2024)

A Study on Incorporating Whisper for Robust Speech Assessment
by: Zezario, Ryandhimas E., et al.
Published: (2023)

Voice Signal Processing for Machine Learning. The Case of Speaker Isolation
by: Ganchev, Radan
Published: (2024)

On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection
by: Guo, Chenyang, et al.
Published: (2024)

Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition
by: Ravenscroft, William, et al.
Published: (2024)

Abnormal Respiratory Sound Identification Using Audio-Spectrogram Vision Transformer
by: Ariyanti, Whenty, et al.
Published: (2024)

A Study on Speech Assessment with Visual Cues
by: Ahmed, Shafique, et al.
Published: (2025)

Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)

Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification
by: Liao, Yen-Lun, et al.
Published: (2022)

A Study on Zero-Shot Non-Intrusive Speech Intelligibility for Hearing Aids Using Large Language Models
by: Zezario, Ryandhimas E., et al.
Published: (2025)

Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations
by: Kakoulidis, Panos, et al.
Published: (2024)

More Similar than Dissimilar: Modeling Annotators for Cross-Corpus Speech Emotion Recognition
by: Tavernor, James, et al.
Published: (2025)

CogniVoice: Multimodal and Multilingual Fusion Networks for Mild Cognitive Impairment Assessment from Spontaneous Speech
by: Cheng, Jiali, et al.
Published: (2024)

Safeguarding Privacy in Edge Speech Understanding with Tiny Foundation Models
by: Benazir, Afsara, et al.
Published: (2025)

On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis
by: Sarkar, Eklavya, et al.
Published: (2024)

Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech
by: de Oliveira, Danilo, et al.
Published: (2024)

Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
by: Avdeeva, Anastasia, et al.
Published: (2024)

Multiple Choice Learning for Efficient Speech Separation with Many Speakers
by: Perera, David, et al.
Published: (2024)

SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization
by: Tang, Beilong, et al.
Published: (2025)

Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
by: Feng, Tiantian, et al.
Published: (2024)

VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis
by: Lin, Weiwei, et al.
Published: (2024)

Language Modelling for Speaker Diarization in Telephonic Interviews
by: India, Miquel, et al.
Published: (2025)

DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
by: Chang, Heng-Jui, et al.
Published: (2024)

REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion
by: Biyani, Ishan D., et al.
Published: (2025)

UniPET-SPK: A Unified Framework for Parameter-Efficient Tuning of Pre-trained Speech Models for Robust Speaker Verification
by: Sang, Mufan, et al.
Published: (2025)

Mouth Articulation-Based Anchoring for Improved Cross-Corpus Speech Emotion Recognition
by: Upadhyay, Shreya G., et al.
Published: (2024)

Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation
by: Nie, Jingping, et al.
Published: (2025)