:: Library Catalog

Bibliographic Details
Main Authors:	Bandekar, Jesuraj; Udupa, Sathvik; Ghosh, Prasanta Kumar
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2505.00007

Similar Items

Streaming Endpointer for Spoken Dialogue using Neural Audio Codecs and Label-Delayed Training
by: Udupa, Sathvik, et al.
Published: (2025)

A study on weakly-supervised training approaches for phoneme-level pronunciation scoring
by: Vidal, Jazmín, et al.
Published: (2026)

VAANI: Capturing the language landscape for an inclusive digital India
by: Pulikodan, Sujith, et al.
Published: (2026)

LLM-based phoneme-to-grapheme for phoneme-based speech recognition
by: Ma, Te, et al.
Published: (2025)

An approach to measuring the performance of Automatic Speech Recognition (ASR) models in the context of Large Language Model (LLM) powered applications
by: Pulikodan, Sujith, et al.
Published: (2025)

How phonemes contribute to deep speaker models?
by: Li, Pengqi, et al.
Published: (2024)

Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech
by: Garg, Abhinav, et al.
Published: (2024)

Audio-conditioned phonemic and prosodic annotation for building text-to-speech models from unlabeled speech data
by: Shirahata, Yuma, et al.
Published: (2024)

BabAR: from phoneme recognition to developmental measures of young children's speech production
by: Lavechin, Marvin, et al.
Published: (2026)

A microscopic investigation of the effect of random envelope fluctuations on phoneme-in-noise perception
by: Osses, Alejandro, et al.
Published: (2024)

Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations
by: Zhou, Wangjin, et al.
Published: (2024)

Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition
by: Dong, Lukuang, et al.
Published: (2026)

Contrastive prediction strategies for unsupervised segmentation and categorization of phonemes and words
by: Cuervo, Santiago, et al.
Published: (2021)

Role of the Pretraining and the Adaptation data sizes for low-resource real-time MRI video segmentation
by: Tholan, Masoud Thajudeen, et al.
Published: (2025)

Bottleneck Transformer-Based Approach for Improved Automatic STOI Score Prediction
by: Amartyaveer, et al.
Published: (2026)

Can Quantized Audio Language Models Perform Zero-Shot Spoofing Detection?
by: Dutta, Bikash, et al.
Published: (2025)

PRODIS -- a speech database and a phoneme-based language model for the study of predictability effects in Polish
by: Malisz, Zofia, et al.
Published: (2024)

Learning to Discover: A Generalized Framework for Raga Identification without Forgetting
by: Singh, Parampreet, et al.
Published: (2026)

Exploring the anatomy of articulation rate in spontaneous English speech: relationships between utterance length effects and social factors
by: Tanner, James, et al.
Published: (2024)

MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
by: Kumar, Sonal, et al.
Published: (2025)

Unmasking real-world audio deepfakes: A data-centric approach
by: Combei, David, et al.
Published: (2025)

High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR
by: Banerjee, Sourav, et al.
Published: (2024)

Improving acoustic drone detection generalization through pretraining and data augmentation
by: Reuter, Paul M., et al.
Published: (2026)

PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification
by: Seth, Ashish, et al.
Published: (2024)

ProSE: Diffusion Priors for Speech Enhancement
by: Kumar, Sonal, et al.
Published: (2025)

Using RLHF to align speech enhancement approaches to mean-opinion quality scores
by: Kumar, Anurag, et al.
Published: (2024)

Deep, data-driven modeling of room acoustics: literature review and research perspectives
by: van Waterschoot, Toon
Published: (2025)

Leveraging LLMs for Scalable Non-intrusive Speech Quality Assessment
by: Cumlin, Fredrik, et al.
Published: (2025)

Complete reconstruction of the tongue contour through acoustic to articulatory inversion using real-time MRI data
by: Azzouz, Sofiane, et al.
Published: (2024)

Improving Stereo 3D Sound Event Localization and Detection: Perceptual Features, Stereo-specific Data Augmentation, and Distance Normalization
by: Yeow, Jun-Wei, et al.
Published: (2025)

Prompt-driven Target Speech Diarization
by: Jiang, Yidi, et al.
Published: (2023)

Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units
by: Yusuf, Bolaji, et al.
Published: (2024)

Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding
by: Li, Mohan, et al.
Published: (2024)

Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models
by: Zhao, Xiutian, et al.
Published: (2026)

A two-stage transliteration approach to improve performance of a multilingual ASR
by: Kumar, Rohit
Published: (2024)

QiandaoEar22: A high quality noise dataset for identifying specific ship from multiple underwater acoustic targets using ship-radiated noise
by: Du, Xiaoyang, et al.
Published: (2024)

PiCoGen2: Piano cover generation with transfer learning approach and weakly aligned data
by: Tan, Chih-Pin, et al.
Published: (2024)

Synergistic Effects of Knowledge Distillation and Structured Pruning for Self-Supervised Speech Models
by: C, Shiva Kumar, et al.
Published: (2025)

Ultra-Low-Bitrate Mel-Spectrogram-based Neural Speech Coding with Flow-Matching-based Refinement and Vocoding-driven Reconstruction
by: Du, Hui-Peng, et al.
Published: (2026)

Data-driven Joint Detection and Localization of Acoustic Reflectors
by: Bicer, H. Nazim, et al.
Published: (2024)