:: Library Catalog

Bibliographic Details
Main authors:	Seki, Kentaro; Takamichi, Shinnosuke; Saeki, Takaaki; Saruwatari, Hiroshi
Format:	Preprint
Published:	2025
Subjects:	Sound
Online access:	https://arxiv.org/abs/2506.15614

Similar Items

Active Learning for Text-to-Speech Synthesis with Informative Sample Collection
by: Seki, Kentaro, et al.
Published: (2025)

J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling
by: Nakata, Wataru, et al.
Published: (2024)

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics
by: Saeki, Takaaki, et al.
Published: (2024)

SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark
by: Saito, Yuki, et al.
Published: (2024)

Spatial-CLAP: Learning Spatially-Aware audio-text Embeddings for Multi-Source Conditions
by: Seki, Kentaro, et al.
Published: (2025)

Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment
by: Igarashi, Takuto, et al.
Published: (2024)

Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals
by: Seki, Kentaro, et al.
Published: (2024)

JaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus
by: Nakamura, Tomohiko, et al.
Published: (2022)

JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions
by: Xin, Detai, et al.
Published: (2023)

BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec
by: Xin, Detai, et al.
Published: (2024)

Drum-to-Vocal Percussion Sound Conversion and Its Evaluation Methodology
by: Nobukawa, Rinka, et al.
Published: (2025)

DNN-based ensemble singing voice synthesis with interactions between singers
by: Hyodo, Hiroaki, et al.
Published: (2024)

SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis
by: Take, Osamu, et al.
Published: (2024)

YODAS: Youtube-Oriented Dataset for Audio and Speech
by: Li, Xinjian, et al.
Published: (2024)

RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio
by: Kanamori, Yusuke, et al.
Published: (2025)

Building speech corpus with diverse voice characteristics for its prompt-based representation
by: Watanabe, Aya, et al.
Published: (2024)

Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data
by: Suda, Hitoshi, et al.
Published: (2024)

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
by: Kawamura, Masaya, et al.
Published: (2024)

Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models
by: Kando, Shunsuke, et al.
Published: (2025)

Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora
by: Suda, Hitoshi, et al.
Published: (2025)

Sign-to-Speech Prosody Transfer via Sign Reconstruction-based GAN
by: Manabe, Toranosuke, et al.
Published: (2026)

Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
by: Jeon, Yejin, et al.
Published: (2024)

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
by: Xin, Detai, et al.
Published: (2024)

DistilMOS: Layer-Wise Self-Distillation For Self-Supervised Learning Model-Based MOS Prediction
by: Yang, Jianing, et al.
Published: (2026)

Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
by: Ko, Myeongjin, et al.
Published: (2023)

Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT
by: Yamauchi, Kazuki, et al.
Published: (2024)

Learning Marmoset Vocal Patterns with a Masked Autoencoder for Robust Call Segmentation, Classification, and Caller Identification
by: Wu, Bin, et al.
Published: (2024)

Binaural rendering from microphone array signals of arbitrary geometry
by: Iijima, Naoto, et al.
Published: (2021)

AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio embedding Sequences
by: Kishi, Minoru, et al.
Published: (2025)

Can We Really Repurpose Multi-Speaker ASR Corpus for Speaker Diarization?
by: Horiguchi, Shota, et al.
Published: (2025)

Dissecting Performance Degradation in Audio Source Separation under Sampling Frequency Mismatch
by: Imamura, Kanami, et al.
Published: (2026)

Hyperbolic Embeddings for Order-Aware Classification of Audio Effect Chains
by: Wada, Aogu, et al.
Published: (2025)

Localizing Acoustic Energy in Sound Field Synthesis by Directionally Weighted Exterior Radiation Suppression
by: Tomita, Yoshihide, et al.
Published: (2024)

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)

Facilitating Personalized TTS for Dysarthric Speakers Using Knowledge Anchoring and Curriculum Learning
by: Jeon, Yejin, et al.
Published: (2025)

Perturbation Self-Supervised Representations for Cross-Lingual Emotion TTS: Stage-Wise Modeling of Emotion and Speaker
by: Gong, Cheng, et al.
Published: (2025)

Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs
by: He, Xinlu, et al.
Published: (2025)

Multi-Sampling-Frequency Naturalness MOS Prediction Using Self-Supervised Learning Model with Sampling-Frequency-Independent Layer
by: Nishikawa, Go, et al.
Published: (2025)

An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS
by: Kunešová, Marie, et al.
Published: (2025)

Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-scale Dataset Cleansing
by: Nakata, Wataru, et al.
Published: (2025)