Saved in:
| Main Authors: | Gogoi, Parismita, Kalita, Sishir, Lalhminghlui, Wendy, Terhiija, Viyazonuo, Tzudir, Moakala, Sarmah, Priyankoo, Prasanna, S. R. M. |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.03606 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Exploring rhythm formant analysis for Indic language classification
by: Gogoi, Parismita, et al.
Published: (2024)
by: Gogoi, Parismita, et al.
Published: (2024)
Analyzing long-term rhythm variations in Mising and Assamese using frequency domain correlates
by: Gogoi, Parismita, et al.
Published: (2024)
by: Gogoi, Parismita, et al.
Published: (2024)
Towards Prosodically Informed Mizo TTS without Explicit Tone Markings
by: Mohanta, Abhijit, et al.
Published: (2026)
by: Mohanta, Abhijit, et al.
Published: (2026)
Leveraging AM and FM Rhythm Spectrograms for Dementia Classification and Assessment
by: Gogoi, Parismita, et al.
Published: (2025)
by: Gogoi, Parismita, et al.
Published: (2025)
Cross-Linguistic Rhythmic and Spectral Feature-Based Analysis of Nyishi and Adi: Two Under-Resourced Languages of Arunachal Pradesh
by: Gogoi, Deepshikha, et al.
Published: (2026)
by: Gogoi, Deepshikha, et al.
Published: (2026)
How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-resource Transfer
by: Kim, Minu, et al.
Published: (2025)
by: Kim, Minu, et al.
Published: (2025)
Transcribe, Align and Segment: Creating speech datasets for low-resource languages
by: Sereda, Taras
Published: (2024)
by: Sereda, Taras
Published: (2024)
Automated evaluation of children's speech fluency for low-resource languages
by: Zhang, Bowen, et al.
Published: (2025)
by: Zhang, Bowen, et al.
Published: (2025)
Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens
by: San, Nay, et al.
Published: (2024)
by: San, Nay, et al.
Published: (2024)
Phoneme-based speech recognition driven by large language models and sampling marginalization
by: Ma, Te, et al.
Published: (2025)
by: Ma, Te, et al.
Published: (2025)
The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data
by: Paraskevopoulos, Georgios, et al.
Published: (2024)
by: Paraskevopoulos, Georgios, et al.
Published: (2024)
Prominence-aware automatic speech recognition for conversational speech
by: Linke, Julian, et al.
Published: (2025)
by: Linke, Julian, et al.
Published: (2025)
End-to-end transfer learning for speaker-independent cross-language and cross-corpus speech emotion recognition
by: Tang, Duowei, et al.
Published: (2023)
by: Tang, Duowei, et al.
Published: (2023)
Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet
by: Dhakal, Manish, et al.
Published: (2024)
by: Dhakal, Manish, et al.
Published: (2024)
Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)
by: Dauphin, Gabriel
Published: (2024)
Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
by: Wright, George August, et al.
Published: (2023)
by: Wright, George August, et al.
Published: (2023)
Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)
by: Zhang, Yuanyuan, et al.
Published: (2024)
Low-resource speech recognition and dialect identification of Irish in a multi-task framework
by: Lonergan, Liam, et al.
Published: (2024)
by: Lonergan, Liam, et al.
Published: (2024)
Fusion of Modulation Spectrogram and SSL with Multi-head Attention for Fake Speech Detection
by: N, Rishith Sadashiv T, et al.
Published: (2025)
by: N, Rishith Sadashiv T, et al.
Published: (2025)
IIITH-BUT system for IWSLT 2025 low-resource Bhojpuri to Hindi speech translation
by: Akkiraju, Bhavana, et al.
Published: (2025)
by: Akkiraju, Bhavana, et al.
Published: (2025)
Robust fine-tuning of speech recognition models via model merging: application to disordered speech
by: Ducorroy, Alexandre, et al.
Published: (2025)
by: Ducorroy, Alexandre, et al.
Published: (2025)
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
by: Vecino, Biel Tura, et al.
Published: (2025)
by: Vecino, Biel Tura, et al.
Published: (2025)
SLM-S2ST: A multimodal language model for direct speech-to-speech translation
by: Hu, Yuxuan, et al.
Published: (2025)
by: Hu, Yuxuan, et al.
Published: (2025)
Heterogeneous bimodal attention fusion for speech emotion recognition
by: Luo, Jiachen, et al.
Published: (2025)
by: Luo, Jiachen, et al.
Published: (2025)
Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update
by: Ahmad, Rehan, et al.
Published: (2026)
by: Ahmad, Rehan, et al.
Published: (2026)
Joint decoding method for controllable contextual speech recognition based on Speech LLM
by: Fang, Yangui, et al.
Published: (2025)
by: Fang, Yangui, et al.
Published: (2025)
Graph-based multi-Feature fusion method for speech emotion recognition
by: Liu, Xueyu, et al.
Published: (2024)
by: Liu, Xueyu, et al.
Published: (2024)
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
by: Gong, Rong, et al.
Published: (2024)
by: Gong, Rong, et al.
Published: (2024)
Strategies for improving low resource speech to text translation relying on pre-trained ASR models
by: Kesiraju, Santosh, et al.
Published: (2023)
by: Kesiraju, Santosh, et al.
Published: (2023)
Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition
by: Dong, Lukuang, et al.
Published: (2026)
by: Dong, Lukuang, et al.
Published: (2026)
Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)
by: Yao, Zengwei, et al.
Published: (2023)
Self-consistent context aware conformer transducer for speech recognition
by: Kolokolov, Konstantin, et al.
Published: (2024)
by: Kolokolov, Konstantin, et al.
Published: (2024)
LLM-based phoneme-to-grapheme for phoneme-based speech recognition
by: Ma, Te, et al.
Published: (2025)
by: Ma, Te, et al.
Published: (2025)
Enhancing CTC-based speech recognition with diverse modeling units
by: Han, Shiyi, et al.
Published: (2024)
by: Han, Shiyi, et al.
Published: (2024)
An efficient text augmentation approach for contextualized Mandarin speech recognition
by: Zheng, Naijun, et al.
Published: (2024)
by: Zheng, Naijun, et al.
Published: (2024)
CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)
by: Yao, Zengwei, et al.
Published: (2024)
Robustifying automatic speech recognition by extracting slowly varying features
by: Pizarro, Matías, et al.
Published: (2021)
by: Pizarro, Matías, et al.
Published: (2021)
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
by: Maiti, Soumi, et al.
Published: (2023)
by: Maiti, Soumi, et al.
Published: (2023)
BabAR: from phoneme recognition to developmental measures of young children's speech production
by: Lavechin, Marvin, et al.
Published: (2026)
by: Lavechin, Marvin, et al.
Published: (2026)
Similar Items
-
Exploring rhythm formant analysis for Indic language classification
by: Gogoi, Parismita, et al.
Published: (2024) -
Analyzing long-term rhythm variations in Mising and Assamese using frequency domain correlates
by: Gogoi, Parismita, et al.
Published: (2024) -
Towards Prosodically Informed Mizo TTS without Explicit Tone Markings
by: Mohanta, Abhijit, et al.
Published: (2026) -
Leveraging AM and FM Rhythm Spectrograms for Dementia Classification and Assessment
by: Gogoi, Parismita, et al.
Published: (2025) -
Cross-Linguistic Rhythmic and Spectral Feature-Based Analysis of Nyishi and Adi: Two Under-Resourced Languages of Arunachal Pradesh
by: Gogoi, Deepshikha, et al.
Published: (2026)