:: Library Catalog

Bibliographic Details
Main Authors:	Gogoi, Parismita; Kalita, Sishir; Lalhminghlui, Wendy; Terhiija, Viyazonuo; Tzudir, Moakala; Sarmah, Priyankoo; Prasanna, S. R. M.
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Artificial Intelligence; Computation and Language; Signal Processing
Online Access:	https://arxiv.org/abs/2506.03606

Similar Items

Exploring rhythm formant analysis for Indic language classification
by: Gogoi, Parismita, et al.
Published: (2024)

Analyzing long-term rhythm variations in Mising and Assamese using frequency domain correlates
by: Gogoi, Parismita, et al.
Published: (2024)

Towards Prosodically Informed Mizo TTS without Explicit Tone Markings
by: Mohanta, Abhijit, et al.
Published: (2026)

Leveraging AM and FM Rhythm Spectrograms for Dementia Classification and Assessment
by: Gogoi, Parismita, et al.
Published: (2025)

Cross-Linguistic Rhythmic and Spectral Feature-Based Analysis of Nyishi and Adi: Two Under-Resourced Languages of Arunachal Pradesh
by: Gogoi, Deepshikha, et al.
Published: (2026)

How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-resource Transfer
by: Kim, Minu, et al.
Published: (2025)

Transcribe, Align and Segment: Creating speech datasets for low-resource languages
by: Sereda, Taras
Published: (2024)

Automated evaluation of children's speech fluency for low-resource languages
by: Zhang, Bowen, et al.
Published: (2025)

Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens
by: San, Nay, et al.
Published: (2024)

Phoneme-based speech recognition driven by large language models and sampling marginalization
by: Ma, Te, et al.
Published: (2025)

The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data
by: Paraskevopoulos, Georgios, et al.
Published: (2024)

Prominence-aware automatic speech recognition for conversational speech
by: Linke, Julian, et al.
Published: (2025)

End-to-end transfer learning for speaker-independent cross-language and cross-corpus speech emotion recognition
by: Tang, Duowei, et al.
Published: (2023)

Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet
by: Dhakal, Manish, et al.
Published: (2024)

Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
by: Wright, George August, et al.
Published: (2023)

Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)

Low-resource speech recognition and dialect identification of Irish in a multi-task framework
by: Lonergan, Liam, et al.
Published: (2024)

Fusion of Modulation Spectrogram and SSL with Multi-head Attention for Fake Speech Detection
by: N, Rishith Sadashiv T, et al.
Published: (2025)

IIITH-BUT system for IWSLT 2025 low-resource Bhojpuri to Hindi speech translation
by: Akkiraju, Bhavana, et al.
Published: (2025)

Robust fine-tuning of speech recognition models via model merging: application to disordered speech
by: Ducorroy, Alexandre, et al.
Published: (2025)

Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
by: Vecino, Biel Tura, et al.
Published: (2025)

SLM-S2ST: A multimodal language model for direct speech-to-speech translation
by: Hu, Yuxuan, et al.
Published: (2025)

Heterogeneous bimodal attention fusion for speech emotion recognition
by: Luo, Jiachen, et al.
Published: (2025)

Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update
by: Ahmad, Rehan, et al.
Published: (2026)

Joint decoding method for controllable contextual speech recognition based on Speech LLM
by: Fang, Yangui, et al.
Published: (2025)

Graph-based multi-Feature fusion method for speech emotion recognition
by: Liu, Xueyu, et al.
Published: (2024)

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
by: Gong, Rong, et al.
Published: (2024)

Strategies for improving low resource speech to text translation relying on pre-trained ASR models
by: Kesiraju, Santosh, et al.
Published: (2023)

Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition
by: Dong, Lukuang, et al.
Published: (2026)

Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)

Self-consistent context aware conformer transducer for speech recognition
by: Kolokolov, Konstantin, et al.
Published: (2024)

LLM-based phoneme-to-grapheme for phoneme-based speech recognition
by: Ma, Te, et al.
Published: (2025)

Enhancing CTC-based speech recognition with diverse modeling units
by: Han, Shiyi, et al.
Published: (2024)

An efficient text augmentation approach for contextualized Mandarin speech recognition
by: Zheng, Naijun, et al.
Published: (2024)

CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)

Robustifying automatic speech recognition by extracting slowly varying features
by: Pizarro, Matías, et al.
Published: (2021)

Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
by: Maiti, Soumi, et al.
Published: (2023)

BabAR: from phoneme recognition to developmental measures of young children's speech production
by: Lavechin, Marvin, et al.
Published: (2026)