:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Jianing, Nakata, Wataru, Saito, Yuki, Saruwatari, Hiroshi
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2601.13700

Similar Items

Multi-Sampling-Frequency Naturalness MOS Prediction Using Self-Supervised Learning Model with Sampling-Frequency-Independent Layer
by: Nishikawa, Go, et al.
Published: (2025)

The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech
by: Baba, Kaito, et al.
Published: (2024)

Causal Speech Enhancement with Predicting Semantics based on Quantized Self-supervised Learning Features
by: Tsunoo, Emiru, et al.
Published: (2024)

Geneses: Unified Generative Speech Enhancement and Separation
by: Asai, Kohei, et al.
Published: (2026)

Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-scale Dataset Cleansing
by: Nakata, Wataru, et al.
Published: (2025)

DialogueSidon: Recovering Full-Duplex Dialogue Tracks from In-the-Wild Dialogue Audio
by: Nakata, Wataru, et al.
Published: (2026)

J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling
by: Nakata, Wataru, et al.
Published: (2024)

Building speech corpus with diverse voice characteristics for its prompt-based representation
by: Watanabe, Aya, et al.
Published: (2024)

Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement
by: Yang, Jianing, et al.
Published: (2025)

SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction
by: Agrawal, Saurabh, et al.
Published: (2025)

SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction
by: Tang, Yuxun, et al.
Published: (2024)

Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT
by: Yamauchi, Kazuki, et al.
Published: (2024)

APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech
by: Lian, Zhicheng, et al.
Published: (2025)

Spatial-CLAP: Learning Spatially-Aware audio–text Embeddings for Multi-Source Conditions
by: Seki, Kentaro, et al.
Published: (2025)

The AudioMOS Challenge 2025
by: Huang, Wen-Chin, et al.
Published: (2025)

Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech
by: Yang, Dong, et al.
Published: (2024)

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
by: Huang, Wen-Chin, et al.
Published: (2024)

SAMOS: A Neural MOS Prediction Model Leveraging Semantic Representations and Acoustic Features
by: Shi, Yu-Fei, et al.
Published: (2024)

UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing Using Discrete Speech Unit Challenge
by: Nakata, Wataru, et al.
Published: (2024)

CodecMOS-Accent: A MOS Benchmark of Resynthesized and TTS Speech from Neural Codecs Across English Accents
by: Huang, Wen-Chin, et al.
Published: (2026)

Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis
by: Yang, Dong, et al.
Published: (2025)

DRASP: A Dual-Resolution Attentive Statistics Pooling Framework for Automatic MOS Prediction
by: Yang, Cheng-Yeh, et al.
Published: (2025)

WhisQ: Cross-Modal Representation Learning for Text-to-Music MOS Prediction
by: Emon, Jakaria Islam, et al.
Published: (2025)

RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio
by: Kanamori, Yusuke, et al.
Published: (2025)

Human-CLAP: Human-perception-based contrastive language-audio pretraining
by: Takano, Taisei, et al.
Published: (2025)

UrgentMOS: Unified Multi-Metric and Preference Learning for Robust Speech Quality Assessment
by: Wang, Wei, et al.
Published: (2026)

Investigating the Reasonable Effectiveness of Speaker Pre-Trained Models and their Synergistic Power for SingMOS Prediction
by: Phukan, Orchid Chetia, et al.
Published: (2025)

S-SONDO: Self-Supervised Knowledge Distillation for General Audio Foundation Models
by: Adlouni, Mohammed Ali El, et al.
Published: (2026)

Selecting N-lowest scores for training MOS prediction models
by: Kondo, Yuto, et al.
Published: (2025)

MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models
by: Huang, Wen-Chin, et al.
Published: (2024)

Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals
by: Seki, Kentaro, et al.
Published: (2024)

JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions
by: Xin, Detai, et al.
Published: (2023)

Distillation and Pruning for Scalable Self-Supervised Representation-Based Speech Quality Assessment
by: Stahl, Benjamin, et al.
Published: (2025)

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
by: Chen, Yafeng, et al.
Published: (2024)

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
by: Chen, Yafeng, et al.
Published: (2023)

ASTAR-NTU solution to AudioMOS Challenge 2025 Track1
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)

SingMOS-Pro: An Comprehensive Benchmark for Singing Quality Assessment
by: Tang, Yuxun, et al.
Published: (2025)

STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models
by: Jang, Kangwook, et al.
Published: (2023)

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
by: Wang, Yujin, et al.
Published: (2022)

Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment
by: Igarashi, Takuto, et al.
Published: (2024)