| Main authors: | Seki, Kentaro; Takamichi, Shinnosuke; Saeki, Takaaki; Saruwatari, Hiroshi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2506.15614 |
Similar items
Active Learning for Text-to-Speech Synthesis with Informative Sample Collection
by: Seki, Kentaro, et al.
Published: (2025)
J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling
by: Nakata, Wataru, et al.
Published: (2024)
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics
by: Saeki, Takaaki, et al.
Published: (2024)
SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark
by: Saito, Yuki, et al.
Published: (2024)
Spatial-CLAP: Learning Spatially-Aware audio-text Embeddings for Multi-Source Conditions
by: Seki, Kentaro, et al.
Published: (2025)
Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment
by: Igarashi, Takuto, et al.
Published: (2024)
Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals
by: Seki, Kentaro, et al.
Published: (2024)
JaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus
by: Nakamura, Tomohiko, et al.
Published: (2022)
JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions
by: Xin, Detai, et al.
Published: (2023)
BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec
by: Xin, Detai, et al.
Published: (2024)
Drum-to-Vocal Percussion Sound Conversion and Its Evaluation Methodology
by: Nobukawa, Rinka, et al.
Published: (2025)
DNN-based ensemble singing voice synthesis with interactions between singers
by: Hyodo, Hiroaki, et al.
Published: (2024)
SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis
by: Take, Osamu, et al.
Published: (2024)
YODAS: Youtube-Oriented Dataset for Audio and Speech
by: Li, Xinjian, et al.
Published: (2024)
RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio
by: Kanamori, Yusuke, et al.
Published: (2025)
Building speech corpus with diverse voice characteristics for its prompt-based representation
by: Watanabe, Aya, et al.
Published: (2024)
Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data
by: Suda, Hitoshi, et al.
Published: (2024)
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
by: Kawamura, Masaya, et al.
Published: (2024)
Exploring the Effect of Segmentation and Vocabulary Size on Speech Tokenization for Speech Language Models
by: Kando, Shunsuke, et al.
Published: (2025)
Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora
by: Suda, Hitoshi, et al.
Published: (2025)
Sign-to-Speech Prosody Transfer via Sign Reconstruction-based GAN
by: Manabe, Toranosuke, et al.
Published: (2026)
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
by: Jeon, Yejin, et al.
Published: (2024)
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
by: Xin, Detai, et al.
Published: (2024)
DistilMOS: Layer-Wise Self-Distillation For Self-Supervised Learning Model-Based MOS Prediction
by: Yang, Jianing, et al.
Published: (2026)
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
by: Ko, Myeongjin, et al.
Published: (2023)
Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT
by: Yamauchi, Kazuki, et al.
Published: (2024)
Learning Marmoset Vocal Patterns with a Masked Autoencoder for Robust Call Segmentation, Classification, and Caller Identification
by: Wu, Bin, et al.
Published: (2024)
Binaural rendering from microphone array signals of arbitrary geometry
by: Iijima, Naoto, et al.
Published: (2021)
AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio embedding Sequences
by: Kishi, Minoru, et al.
Published: (2025)
Can We Really Repurpose Multi-Speaker ASR Corpus for Speaker Diarization?
by: Horiguchi, Shota, et al.
Published: (2025)
Dissecting Performance Degradation in Audio Source Separation under Sampling Frequency Mismatch
by: Imamura, Kanami, et al.
Published: (2026)
Hyperbolic Embeddings for Order-Aware Classification of Audio Effect Chains
by: Wada, Aogu, et al.
Published: (2025)
Localizing Acoustic Energy in Sound Field Synthesis by Directionally Weighted Exterior Radiation Suppression
by: Tomita, Yoshihide, et al.
Published: (2024)
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)
Facilitating Personalized TTS for Dysarthric Speakers Using Knowledge Anchoring and Curriculum Learning
by: Jeon, Yejin, et al.
Published: (2025)
Perturbation Self-Supervised Representations for Cross-Lingual Emotion TTS: Stage-Wise Modeling of Emotion and Speaker
by: Gong, Cheng, et al.
Published: (2025)
Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs
by: He, Xinlu, et al.
Published: (2025)
Multi-Sampling-Frequency Naturalness MOS Prediction Using Self-Supervised Learning Model with Sampling-Frequency-Independent Layer
by: Nishikawa, Go, et al.
Published: (2025)
An Exploration of ECAPA-TDNN and x-vector Speaker Representations in Zero-shot Multi-speaker TTS
by: Kunešová, Marie, et al.
Published: (2025)
Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-scale Dataset Cleansing
by: Nakata, Wataru, et al.
Published: (2025)