| Main Authors: | Yin, Chun; Chi, Tai-Shih; Tsao, Yu; Wang, Hsin-Min |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.08445 |
Similar Items
Towards Robust Assessment of Pathological Voices via Combined Low-Level Descriptors and Foundation Model Representations
by: Ariyanti, Whenty, et al.
Published: (2025)
Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model
by: Zezario, Ryandhimas E., et al.
Published: (2023)
Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features
by: Zezario, Ryandhimas E., et al.
Published: (2021)
HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids
by: Wisnu, Dyah A. M. G., et al.
Published: (2024)
Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata
by: Zezario, Ryandhimas E., et al.
Published: (2023)
Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis
by: Carbonneau, Marc-André, et al.
Published: (2025)
A Study on Zero-shot Non-intrusive Speech Assessment using Large Language Models
by: Zezario, Ryandhimas E., et al.
Published: (2024)
Improving Perceptual Audio Aesthetic Assessment via Triplet Loss and Self-Supervised Embeddings
by: Wisnu, Dyah A. M. G., et al.
Published: (2025)
A Layer-Anchoring Strategy for Enhancing Cross-Lingual Speech Emotion Recognition
by: Upadhyay, Shreya G., et al.
Published: (2024)
Speech Intelligibility Assessment with Uncertainty-Aware Whisper Embeddings and sLSTM
by: Zezario, Ryandhimas E., et al.
Published: (2025)
USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
by: Zhao, Guanlong, et al.
Published: (2023)
The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
by: Huang, Wen-Chin, et al.
Published: (2024)
Speech to Speech Synthesis for Voice Impersonation
by: Johnson, Bjorn, et al.
Published: (2026)
Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification
by: Sang, Mufan, et al.
Published: (2024)
A Study on Incorporating Whisper for Robust Speech Assessment
by: Zezario, Ryandhimas E., et al.
Published: (2023)
Voice Signal Processing for Machine Learning. The Case of Speaker Isolation
by: Ganchev, Radan
Published: (2024)
On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection
by: Guo, Chenyang, et al.
Published: (2024)
Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition
by: Ravenscroft, William, et al.
Published: (2024)
Abnormal Respiratory Sound Identification Using Audio-Spectrogram Vision Transformer
by: Ariyanti, Whenty, et al.
Published: (2024)
A Study on Speech Assessment with Visual Cues
by: Ahmed, Shafique, et al.
Published: (2025)
Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)
Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification
by: Liao, Yen-Lun, et al.
Published: (2022)
A Study on Zero-Shot Non-Intrusive Speech Intelligibility for Hearing Aids Using Large Language Models
by: Zezario, Ryandhimas E., et al.
Published: (2025)
Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations
by: Kakoulidis, Panos, et al.
Published: (2024)
More Similar than Dissimilar: Modeling Annotators for Cross-Corpus Speech Emotion Recognition
by: Tavernor, James, et al.
Published: (2025)
CogniVoice: Multimodal and Multilingual Fusion Networks for Mild Cognitive Impairment Assessment from Spontaneous Speech
by: Cheng, Jiali, et al.
Published: (2024)
Safeguarding Privacy in Edge Speech Understanding with Tiny Foundation Models
by: Benazir, Afsara, et al.
Published: (2025)
On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis
by: Sarkar, Eklavya, et al.
Published: (2024)
Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech
by: de Oliveira, Danilo, et al.
Published: (2024)
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
by: Avdeeva, Anastasia, et al.
Published: (2024)
Multiple Choice Learning for Efficient Speech Separation with Many Speakers
by: Perera, David, et al.
Published: (2024)
SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization
by: Tang, Beilong, et al.
Published: (2025)
Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
by: Feng, Tiantian, et al.
Published: (2024)
VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis
by: Lin, Weiwei, et al.
Published: (2024)
Language Modelling for Speaker Diarization in Telephonic Interviews
by: India, Miquel, et al.
Published: (2025)
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
by: Chang, Heng-Jui, et al.
Published: (2024)
REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion
by: Biyani, Ishan D., et al.
Published: (2025)
UniPET-SPK: A Unified Framework for Parameter-Efficient Tuning of Pre-trained Speech Models for Robust Speaker Verification
by: Sang, Mufan, et al.
Published: (2025)
Mouth Articulation-Based Anchoring for Improved Cross-Corpus Speech Emotion Recognition
by: Upadhyay, Shreya G., et al.
Published: (2024)
Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation
by: Nie, Jingping, et al.
Published: (2025)