Saved in:
| Main Author: | Rahman, Hanif |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.26978 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Strategies for improving low resource speech to text translation relying on pre-trained ASR models
by: Kesiraju, Santosh, et al.
Published: (2023)
by: Kesiraju, Santosh, et al.
Published: (2023)
Fine-tuning Whisper for Pashto ASR: strategies and scale
by: Rahman, Hanif
Published: (2026)
by: Rahman, Hanif
Published: (2026)
From Scarcity to Scale: A Release-Level Analysis of the Pashto Common Voice Dataset
by: Jahani, Jandad, et al.
Published: (2026)
by: Jahani, Jandad, et al.
Published: (2026)
Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation
by: Rahman, Hanif
Published: (2026)
by: Rahman, Hanif
Published: (2026)
Can we reconstruct a dysarthric voice with the large speech model Parler TTS?
by: Sanchez, Ariadna, et al.
Published: (2025)
by: Sanchez, Ariadna, et al.
Published: (2025)
The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data
by: Paraskevopoulos, Georgios, et al.
Published: (2024)
by: Paraskevopoulos, Georgios, et al.
Published: (2024)
An efficient text augmentation approach for contextualized Mandarin speech recognition
by: Zheng, Naijun, et al.
Published: (2024)
by: Zheng, Naijun, et al.
Published: (2024)
Transferable speech-to-text large language model alignment module
by: Wu, Boyong, et al.
Published: (2024)
by: Wu, Boyong, et al.
Published: (2024)
An experiment on an automated literature survey of data-driven speech enhancement methods
by: Santos, Arthur dos, et al.
Published: (2023)
by: Santos, Arthur dos, et al.
Published: (2023)
Natural language guidance of high-fidelity text-to-speech with synthetic annotations
by: Lyth, Dan, et al.
Published: (2024)
by: Lyth, Dan, et al.
Published: (2024)
PashtoCorp: A 1.25-Billion-Word Corpus, Evaluation Suite, and Reproducible Pipeline for Low-Resource Language Development
by: Rahman, Hanif
Published: (2026)
by: Rahman, Hanif
Published: (2026)
Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech
by: Garg, Abhinav, et al.
Published: (2024)
by: Garg, Abhinav, et al.
Published: (2024)
EE-TTS: Emphatic Expressive TTS with Linguistic Information
by: Zhong, Yi, et al.
Published: (2023)
by: Zhong, Yi, et al.
Published: (2023)
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
by: Fujita, Kenichi, et al.
Published: (2024)
by: Fujita, Kenichi, et al.
Published: (2024)
Pashto Common Voice: Building the First Open Speech Corpus for a 60-Million-Speaker Low-Resource Language
by: Rahman, Hanif, et al.
Published: (2026)
by: Rahman, Hanif, et al.
Published: (2026)
A unified front-end framework for English text-to-speech synthesis
by: Ying, Zelin, et al.
Published: (2023)
by: Ying, Zelin, et al.
Published: (2023)
MOSS-TTS Technical Report
by: Gong, Yitian, et al.
Published: (2026)
by: Gong, Yitian, et al.
Published: (2026)
TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch
by: Song, Xingchen, et al.
Published: (2024)
by: Song, Xingchen, et al.
Published: (2024)
Transfer the linguistic representations from TTS to accent conversion with non-parallel data
by: Chen, Xi, et al.
Published: (2024)
by: Chen, Xi, et al.
Published: (2024)
Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
by: Wright, George August, et al.
Published: (2023)
by: Wright, George August, et al.
Published: (2023)
Tibetan-TTS:Low-Resource Tibetan Speech Synthesis with Large Model Adaptation
by: He, Jiaxu, et al.
Published: (2026)
by: He, Jiaxu, et al.
Published: (2026)
RephraseTTS: Dynamic Length Text based Speech Insertion with Speaker Style Transfer
by: Matiyali, Neeraj, et al.
Published: (2025)
by: Matiyali, Neeraj, et al.
Published: (2025)
Moshi: a speech-text foundation model for real-time dialogue
by: Défossez, Alexandre, et al.
Published: (2024)
by: Défossez, Alexandre, et al.
Published: (2024)
A2TTS: TTS for Low Resource Indian Languages
by: Bhadoriya, Ayush Singh, et al.
Published: (2025)
by: Bhadoriya, Ayush Singh, et al.
Published: (2025)
Qwen3-TTS Technical Report
by: Hu, Hangrui, et al.
Published: (2026)
by: Hu, Hangrui, et al.
Published: (2026)
emg2speech: Synthesizing speech from electromyography using self-supervised speech models
by: Gowda, Harshavardhana T., et al.
Published: (2025)
by: Gowda, Harshavardhana T., et al.
Published: (2025)
Covertly improving intelligibility with data-driven adaptations of speech timing
by: Tuttösí, Paige, et al.
Published: (2026)
by: Tuttösí, Paige, et al.
Published: (2026)
Low-resource speech recognition and dialect identification of Irish in a multi-task framework
by: Lonergan, Liam, et al.
Published: (2024)
by: Lonergan, Liam, et al.
Published: (2024)
Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)
by: Zhang, Yuanyuan, et al.
Published: (2024)
Multi-interaction TTS toward professional recording reproduction
by: Kanagawa, Hiroki, et al.
Published: (2025)
by: Kanagawa, Hiroki, et al.
Published: (2025)
MunTTS: A Text-to-Speech System for Mundari
by: Gumma, Varun, et al.
Published: (2024)
by: Gumma, Varun, et al.
Published: (2024)
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
by: Feng, Xincan, et al.
Published: (2024)
by: Feng, Xincan, et al.
Published: (2024)
RWKVTTS: Yet another TTS based on RWKV-7
by: yueyu, Lin, et al.
Published: (2025)
by: yueyu, Lin, et al.
Published: (2025)
DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
by: Chen, Ziqi, et al.
Published: (2025)
by: Chen, Ziqi, et al.
Published: (2025)
The TTS-STT Flywheel: Synthetic Entity-Dense Audio Closes the Indic ASR Gap Where Commercial and Open-Source Systems Fail
by: Menta, Venkata Pushpak Teja
Published: (2026)
by: Menta, Venkata Pushpak Teja
Published: (2026)
Word-wise intonation model for cross-language TTS systems
by: A., Tomilov A., et al.
Published: (2024)
by: A., Tomilov A., et al.
Published: (2024)
An investigation of phrase break prediction in an End-to-End TTS system
by: Vadapalli, Anandaswarup
Published: (2023)
by: Vadapalli, Anandaswarup
Published: (2023)
JoyTTS: LLM-based Spoken Chatbot With Voice Cloning
by: Zhou, Fangru, et al.
Published: (2025)
by: Zhou, Fangru, et al.
Published: (2025)
A Language Modeling Approach to Diacritic-Free Hebrew TTS
by: Roth, Amit, et al.
Published: (2024)
by: Roth, Amit, et al.
Published: (2024)
SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level
by: Tee, Hitomi Jin Ling, et al.
Published: (2025)
by: Tee, Hitomi Jin Ling, et al.
Published: (2025)
Similar Items
-
Strategies for improving low resource speech to text translation relying on pre-trained ASR models
by: Kesiraju, Santosh, et al.
Published: (2023) -
Fine-tuning Whisper for Pashto ASR: strategies and scale
by: Rahman, Hanif
Published: (2026) -
From Scarcity to Scale: A Release-Level Analysis of the Pashto Common Voice Dataset
by: Jahani, Jandad, et al.
Published: (2026) -
Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation
by: Rahman, Hanif
Published: (2026) -
Can we reconstruct a dysarthric voice with the large speech model Parler TTS?
by: Sanchez, Ariadna, et al.
Published: (2025)