:: Library Catalog

Bibliographic Details
Main Authors:	Seki, Kentaro; Okamoto, Yuki; Yamaoka, Kouei; Saito, Yuki; Takamichi, Shinnosuke; Saruwatari, Hiroshi
Format:	Preprint
Published:	2025
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2509.14785

Similar Items

RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio
by: Kanamori, Yusuke, et al.
Published: (2025)

Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals
by: Seki, Kentaro, et al.
Published: (2024)

Human-CLAP: Human-perception-based contrastive language-audio pretraining
by: Takano, Taisei, et al.
Published: (2025)

Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment
by: Igarashi, Takuto, et al.
Published: (2024)

TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data
by: Seki, Kentaro, et al.
Published: (2025)

J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling
by: Nakata, Wataru, et al.
Published: (2024)

SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark
by: Saito, Yuki, et al.
Published: (2024)

Active Learning for Text-to-Speech Synthesis with Informative Sample Collection
by: Seki, Kentaro, et al.
Published: (2025)

Building speech corpus with diverse voice characteristics for its prompt-based representation
by: Watanabe, Aya, et al.
Published: (2024)

JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions
by: Xin, Detai, et al.
Published: (2023)

BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec
by: Xin, Detai, et al.
Published: (2024)

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics
by: Saeki, Takaaki, et al.
Published: (2024)

Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT
by: Yamauchi, Kazuki, et al.
Published: (2024)

Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-scale Dataset Cleansing
by: Nakata, Wataru, et al.
Published: (2025)

DistilMOS: Layer-Wise Self-Distillation For Self-Supervised Learning Model-Based MOS Prediction
by: Yang, Jianing, et al.
Published: (2026)

AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio embedding Sequences
by: Kishi, Minoru, et al.
Published: (2025)

Drum-to-Vocal Percussion Sound Conversion and Its Evaluation Methodology
by: Nobukawa, Rinka, et al.
Published: (2025)

DNN-based ensemble singing voice synthesis with interactions between singers
by: Hyodo, Hiroaki, et al.
Published: (2024)

Causal Speech Enhancement with Predicting Semantics based on Quantized Self-supervised Learning Features
by: Tsunoo, Emiru, et al.
Published: (2024)

JaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus
by: Nakamura, Tomohiko, et al.
Published: (2022)

Geneses: Unified Generative Speech Enhancement and Separation
by: Asai, Kohei, et al.
Published: (2026)

Fast Multichannel NMF with Block-Diagonal Spatial Covariance Matrices for Efficient Blind Source Separation Using Distributed Microphone Arrays
by: Nishikori, Hirotaka, et al.
Published: (2026)

The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech
by: Baba, Kaito, et al.
Published: (2024)

Multi-Sampling-Frequency Naturalness MOS Prediction Using Self-Supervised Learning Model with Sampling-Frequency-Independent Layer
by: Nishikawa, Go, et al.
Published: (2025)

DialogueSidon: Recovering Full-Duplex Dialogue Tracks from In-the-Wild Dialogue Audio
by: Nakata, Wataru, et al.
Published: (2026)

Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis
by: Yang, Dong, et al.
Published: (2025)

Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement
by: Yang, Jianing, et al.
Published: (2025)

SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis
by: Take, Osamu, et al.
Published: (2024)

Hyperbolic Embeddings for Order-Aware Classification of Audio Effect Chains
by: Wada, Aogu, et al.
Published: (2025)

Real-time Speech Extraction Using Spatially Regularized Independent Low-rank Matrix Analysis and Rank-constrained Spatial Covariance Matrix Estimation
by: Ishikawa, Yuto, et al.
Published: (2024)

Construction and Analysis of Impression Caption Dataset for Environmental Sounds
by: Okamoto, Yuki, et al.
Published: (2024)

Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora
by: Suda, Hitoshi, et al.
Published: (2025)

Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data
by: Suda, Hitoshi, et al.
Published: (2024)

Dissecting Performance Degradation in Audio Source Separation under Sampling Frequency Mismatch
by: Imamura, Kanami, et al.
Published: (2026)

Sign-to-Speech Prosody Transfer via Sign Reconstruction-based GAN
by: Manabe, Toranosuke, et al.
Published: (2026)

Analysing the Language of Neural Audio Codecs
by: Park, Joonyong, et al.
Published: (2025)

Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models
by: Kando, Shunsuke, et al.
Published: (2025)

Learning Marmoset Vocal Patterns with a Masked Autoencoder for Robust Call Segmentation, Classification, and Caller Identification
by: Wu, Bin, et al.
Published: (2024)

ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks
by: Jing, Xin, et al.
Published: (2024)

Learning Spatially-Aware Language and Audio Embeddings
by: Devnani, Bhavika, et al.
Published: (2024)