| Main Authors: | Pandey, Isha; Gaikwad, Pranav; Parulekar, Amruta; Ramakrishnan, Ganesh |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.16875 |
Similar Items
A2TTS: TTS for Low Resource Indian Languages
by: Bhadoriya, Ayush Singh, et al.
Published: (2025)
LASER: An LLM-based ASR Scoring and Evaluation Rubric
by: Parulekar, Amruta, et al.
Published: (2025)
AMPS: ASR with Multimodal Paraphrase Supervision
by: Gupta, Abhishek, et al.
Published: (2024)
Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR
by: Gupta, Abhishek, et al.
Published: (2024)
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
by: Ko, Myeongjin, et al.
Published: (2023)
Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
by: Arora, Akshit, et al.
Published: (2024)
TTS-1 Technical Report
by: Atamanenko, Oleg, et al.
Published: (2025)
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
by: Gupta, Isha, et al.
Published: (2025)
Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
by: Fujita, Kenichi, et al.
Published: (2024)
Improving the Speaker Anonymization Evaluation's Robustness to Target Speakers with Adversarial Learning
by: Franzreb, Carlos, et al.
Published: (2025)
Compact Neural TTS Voices for Accessibility
by: Jain, Kunal, et al.
Published: (2025)
Language Modelling for Speaker Diarization in Telephonic Interviews
by: India, Miquel, et al.
Published: (2025)
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge
by: Manku, Ruskin Raj, et al.
Published: (2025)
Rethinking Speaker Embeddings for Speech Generation: Sub-Center Modeling for Capturing Intra-Speaker Diversity
by: Ulgen, Ismail Rasim, et al.
Published: (2024)
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
by: Kawamura, Masaya, et al.
Published: (2024)
Exploring speech style spaces with language models: Emotional TTS without emotion labels
by: Chandra, Shreeram Suresh, et al.
Published: (2024)
Score-Based Training for Energy-Based TTS Models
by: Sun, Wanli, et al.
Published: (2025)
Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification
by: Liao, Yen-Lun, et al.
Published: (2022)
Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting
by: Park, Hyun Jin, et al.
Published: (2024)
HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System
by: Zhang, Zhisheng, et al.
Published: (2024)
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
by: Wang, Quan, et al.
Published: (2024)
TSELM: Target Speaker Extraction using Discrete Tokens and Language Models
by: Tang, Beilong, et al.
Published: (2024)
UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching
by: Glazer, Neta, et al.
Published: (2025)
SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System
by: Kim, Hyeongju, et al.
Published: (2025)
Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model
by: Park, Hyun Jin, et al.
Published: (2024)
T5Gemma-TTS Technical Report
by: Arata, Chihiro, et al.
Published: (2026)
Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification
by: Lepage, Theo, et al.
Published: (2023)
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech
by: Liang, Ziqi, et al.
Published: (2024)
Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies
by: Anand, Srija, et al.
Published: (2024)
Source-Free Domain Adaptation for Speaker Verification in Data-Scarce Languages and Noisy Channels
by: Elia, Shlomo Salo, et al.
Published: (2024)
Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
by: Jeon, Yejin, et al.
Published: (2024)
Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization
by: Cai, Zexin, et al.
Published: (2024)
Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems
by: Liu, Jeongmin, et al.
Published: (2024)
Robustness of Speech Separation Models for Similar-pitch Speakers
by: Lay, Bunlong, et al.
Published: (2024)
Multi-Stage Speaker Diarization for Noisy Classrooms
by: Khan, Ali Sartaz, et al.
Published: (2025)
Adversarial Data Augmentation for Robust Speaker Verification
by: Zhou, Zhenyu, et al.
Published: (2024)
Investigating Confidence Estimation Measures for Speaker Diarization
by: Chowdhury, Anurag, et al.
Published: (2024)
Cosine Scoring with Uncertainty for Neural Speaker Embedding
by: Wang, Qiongqiong, et al.
Published: (2024)
Text-to-Speech for Unseen Speakers via Low-Complexity Discrete Unit-Based Frame Selection
by: Ulgen, Ismail Rasim, et al.
Published: (2024)
Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization
by: Hussain, Shehzeen, et al.
Published: (2025)