| Main Authors: | Bandekar, Jesuraj, Udupa, Sathvik, Ghosh, Prasanta Kumar |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.00007 |
Similar Items
Streaming Endpointer for Spoken Dialogue using Neural Audio Codecs and Label-Delayed Training
by: Udupa, Sathvik, et al.
Published: (2025)
A study on weakly-supervised training approaches for phoneme-level pronunciation scoring
by: Vidal, Jazmín, et al.
Published: (2026)
VAANI: Capturing the language landscape for an inclusive digital India
by: Pulikodan, Sujith, et al.
Published: (2026)
LLM-based phoneme-to-grapheme for phoneme-based speech recognition
by: Ma, Te, et al.
Published: (2025)
An approach to measuring the performance of Automatic Speech Recognition (ASR) models in the context of Large Language Model (LLM) powered applications
by: Pulikodan, Sujith, et al.
Published: (2025)
How phonemes contribute to deep speaker models?
by: Li, Pengqi, et al.
Published: (2024)
Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech
by: Garg, Abhinav, et al.
Published: (2024)
Audio-conditioned phonemic and prosodic annotation for building text-to-speech models from unlabeled speech data
by: Shirahata, Yuma, et al.
Published: (2024)
BabAR: from phoneme recognition to developmental measures of young children's speech production
by: Lavechin, Marvin, et al.
Published: (2026)
A microscopic investigation of the effect of random envelope fluctuations on phoneme-in-noise perception
by: Osses, Alejandro, et al.
Published: (2024)
Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations
by: Zhou, Wangjin, et al.
Published: (2024)
Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition
by: Dong, Lukuang, et al.
Published: (2026)
Contrastive prediction strategies for unsupervised segmentation and categorization of phonemes and words
by: Cuervo, Santiago, et al.
Published: (2021)
Role of the Pretraining and the Adaptation data sizes for low-resource real-time MRI video segmentation
by: Tholan, Masoud Thajudeen, et al.
Published: (2025)
Bottleneck Transformer-Based Approach for Improved Automatic STOI Score Prediction
by: Amartyaveer, et al.
Published: (2026)
Can Quantized Audio Language Models Perform Zero-Shot Spoofing Detection?
by: Dutta, Bikash, et al.
Published: (2025)
PRODIS -- a speech database and a phoneme-based language model for the study of predictability effects in Polish
by: Malisz, Zofia, et al.
Published: (2024)
Learning to Discover: A Generalized Framework for Raga Identification without Forgetting
by: Singh, Parampreet, et al.
Published: (2026)
Exploring the anatomy of articulation rate in spontaneous English speech: relationships between utterance length effects and social factors
by: Tanner, James, et al.
Published: (2024)
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
by: Kumar, Sonal, et al.
Published: (2025)
Unmasking real-world audio deepfakes: A data-centric approach
by: Combei, David, et al.
Published: (2025)
High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR
by: Banerjee, Sourav, et al.
Published: (2024)
Improving acoustic drone detection generalization through pretraining and data augmentation
by: Reuter, Paul M., et al.
Published: (2026)
PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification
by: Seth, Ashish, et al.
Published: (2024)
ProSE: Diffusion Priors for Speech Enhancement
by: Kumar, Sonal, et al.
Published: (2025)
Using RLHF to align speech enhancement approaches to mean-opinion quality scores
by: Kumar, Anurag, et al.
Published: (2024)
Deep, data-driven modeling of room acoustics: literature review and research perspectives
by: van Waterschoot, Toon
Published: (2025)
Leveraging LLMs for Scalable Non-intrusive Speech Quality Assessment
by: Cumlin, Fredrik, et al.
Published: (2025)
Complete reconstruction of the tongue contour through acoustic to articulatory inversion using real-time MRI data
by: Azzouz, Sofiane, et al.
Published: (2024)
Improving Stereo 3D Sound Event Localization and Detection: Perceptual Features, Stereo-specific Data Augmentation, and Distance Normalization
by: Yeow, Jun-Wei, et al.
Published: (2025)
Prompt-driven Target Speech Diarization
by: Jiang, Yidi, et al.
Published: (2023)
Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units
by: Yusuf, Bolaji, et al.
Published: (2024)
Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding
by: Li, Mohan, et al.
Published: (2024)
Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models
by: Zhao, Xiutian, et al.
Published: (2026)
A two-stage transliteration approach to improve performance of a multilingual ASR
by: Kumar, Rohit
Published: (2024)
QiandaoEar22: A high quality noise dataset for identifying specific ship from multiple underwater acoustic targets using ship-radiated noise
by: Du, Xiaoyang, et al.
Published: (2024)
PiCoGen2: Piano cover generation with transfer learning approach and weakly aligned data
by: Tan, Chih-Pin, et al.
Published: (2024)
Synergistic Effects of Knowledge Distillation and Structured Pruning for Self-Supervised Speech Models
by: C, Shiva Kumar, et al.
Published: (2025)
Ultra-Low-Bitrate Mel-Spectrogram-based Neural Speech Coding with Flow-Matching-based Refinement and Vocoding-driven Reconstruction
by: Du, Hui-Peng, et al.
Published: (2026)
Data-driven Joint Detection and Localization of Acoustic Reflectors
by: Bicer, H. Nazim, et al.
Published: (2024)