Bibliographic Details
Main Authors:	Pandey, Isha; Gaikwad, Pranav; Parulekar, Amruta; Ramakrishnan, Ganesh
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Machine Learning
Online Access:	https://arxiv.org/abs/2507.16875

Similar Items

A2TTS: TTS for Low Resource Indian Languages
by: Bhadoriya, Ayush Singh, et al.
Published: (2025)

LASER: An LLM-based ASR Scoring and Evaluation Rubric
by: Parulekar, Amruta, et al.
Published: (2025)

AMPS: ASR with Multimodal Paraphrase Supervision
by: Gupta, Abhishek, et al.
Published: (2024)

Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR
by: Gupta, Abhishek, et al.
Published: (2024)

Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
by: Ko, Myeongjin, et al.
Published: (2023)

Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages
by: Arora, Akshit, et al.
Published: (2024)

TTS-1 Technical Report
by: Atamanenko, Oleg, et al.
Published: (2025)

"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
by: Gupta, Isha, et al.
Published: (2025)

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
by: Fujita, Kenichi, et al.
Published: (2024)

Improving the Speaker Anonymization Evaluation's Robustness to Target Speakers with Adversarial Learning
by: Franzreb, Carlos, et al.
Published: (2025)

Compact Neural TTS Voices for Accessibility
by: Jain, Kunal, et al.
Published: (2025)

Language Modelling for Speaker Diarization in Telephonic Interviews
by: India, Miquel, et al.
Published: (2025)

EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge
by: Manku, Ruskin Raj, et al.
Published: (2025)

Rethinking Speaker Embeddings for Speech Generation: Sub-Center Modeling for Capturing Intra-Speaker Diversity
by: Ulgen, Ismail Rasim, et al.
Published: (2024)

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
by: Kawamura, Masaya, et al.
Published: (2024)

Exploring speech style spaces with language models: Emotional TTS without emotion labels
by: Chandra, Shreeram Suresh, et al.
Published: (2024)

Score-Based Training for Energy-Based TTS Models
by: Sun, Wanli, et al.
Published: (2025)

Adversarial Speaker Distillation for Countermeasure Model on Automatic Speaker Verification
by: Liao, Yen-Lun, et al.
Published: (2022)

Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting
by: Park, Hyun Jin, et al.
Published: (2024)

HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System
by: Zhang, Zhisheng, et al.
Published: (2024)

DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
by: Wang, Quan, et al.
Published: (2024)

TSELM: Target Speaker Extraction using Discrete Tokens and Language Models
by: Tang, Beilong, et al.
Published: (2024)

UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching
by: Glazer, Neta, et al.
Published: (2025)

SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System
by: Kim, Hyeongju, et al.
Published: (2025)

Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model
by: Park, Hyun Jin, et al.
Published: (2024)

T5Gemma-TTS Technical Report
by: Arata, Chihiro, et al.
Published: (2026)

Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification
by: Lepage, Theo, et al.
Published: (2023)

EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech
by: Liang, Ziqi, et al.
Published: (2024)

Enhancing Out-of-Vocabulary Performance of Indian TTS Systems for Practical Applications through Low-Effort Data Strategies
by: Anand, Srija, et al.
Published: (2024)

Source-Free Domain Adaptation for Speaker Verification in Data-Scarce Languages and Noisy Channels
by: Elia, Shlomo Salo, et al.
Published: (2024)

Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
by: Jeon, Yejin, et al.
Published: (2024)

Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization
by: Cai, Zexin, et al.
Published: (2024)

Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems
by: Liu, Jeongmin, et al.
Published: (2024)

Robustness of Speech Separation Models for Similar-pitch Speakers
by: Lay, Bunlong, et al.
Published: (2024)

Multi-Stage Speaker Diarization for Noisy Classrooms
by: Khan, Ali Sartaz, et al.
Published: (2025)

Adversarial Data Augmentation for Robust Speaker Verification
by: Zhou, Zhenyu, et al.
Published: (2024)

Investigating Confidence Estimation Measures for Speaker Diarization
by: Chowdhury, Anurag, et al.
Published: (2024)

Cosine Scoring with Uncertainty for Neural Speaker Embedding
by: Wang, Qiongqiong, et al.
Published: (2024)

Text-to-Speech for Unseen Speakers via Low-Complexity Discrete Unit-Based Frame Selection
by: Ulgen, Ismail Rasim, et al.
Published: (2024)

Align2Speak: Improving TTS for Low Resource Languages via ASR-Guided Online Preference Optimization
by: Hussain, Shehzeen, et al.
Published: (2025)