:: Library Catalog

Bibliographic Details
Main Author:	Rahman, Hanif
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Sound
Online Access:	https://arxiv.org/abs/2605.26978

Similar Items

Strategies for improving low resource speech to text translation relying on pre-trained ASR models
by: Kesiraju, Santosh, et al.
Published: (2023)

Fine-tuning Whisper for Pashto ASR: strategies and scale
by: Rahman, Hanif
Published: (2026)

From Scarcity to Scale: A Release-Level Analysis of the Pashto Common Voice Dataset
by: Jahani, Jandad, et al.
Published: (2026)

Benchmarking Multilingual Speech Models on Pashto: Zero-Shot ASR, Script Failure, and Cross-Domain Evaluation
by: Rahman, Hanif
Published: (2026)

Can we reconstruct a dysarthric voice with the large speech model Parler TTS?
by: Sanchez, Ariadna, et al.
Published: (2025)

The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data
by: Paraskevopoulos, Georgios, et al.
Published: (2024)

An efficient text augmentation approach for contextualized Mandarin speech recognition
by: Zheng, Naijun, et al.
Published: (2024)

Transferable speech-to-text large language model alignment module
by: Wu, Boyong, et al.
Published: (2024)

An experiment on an automated literature survey of data-driven speech enhancement methods
by: Santos, Arthur dos, et al.
Published: (2023)

Natural language guidance of high-fidelity text-to-speech with synthetic annotations
by: Lyth, Dan, et al.
Published: (2024)

PashtoCorp: A 1.25-Billion-Word Corpus, Evaluation Suite, and Reproducible Pipeline for Low-Resource Language Development
by: Rahman, Hanif
Published: (2026)

Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech
by: Garg, Abhinav, et al.
Published: (2024)

EE-TTS: Emphatic Expressive TTS with Linguistic Information
by: Zhong, Yi, et al.
Published: (2023)

Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
by: Fujita, Kenichi, et al.
Published: (2024)

Pashto Common Voice: Building the First Open Speech Corpus for a 60-Million-Speaker Low-Resource Language
by: Rahman, Hanif, et al.
Published: (2026)

A unified front-end framework for English text-to-speech synthesis
by: Ying, Zelin, et al.
Published: (2023)

MOSS-TTS Technical Report
by: Gong, Yitian, et al.
Published: (2026)

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch
by: Song, Xingchen, et al.
Published: (2024)

Transfer the linguistic representations from TTS to accent conversion with non-parallel data
by: Chen, Xi, et al.
Published: (2024)

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
by: Wright, George August, et al.
Published: (2023)

Tibetan-TTS: Low-Resource Tibetan Speech Synthesis with Large Model Adaptation
by: He, Jiaxu, et al.
Published: (2026)

RephraseTTS: Dynamic Length Text based Speech Insertion with Speaker Style Transfer
by: Matiyali, Neeraj, et al.
Published: (2025)

Moshi: a speech-text foundation model for real-time dialogue
by: Défossez, Alexandre, et al.
Published: (2024)

A2TTS: TTS for Low Resource Indian Languages
by: Bhadoriya, Ayush Singh, et al.
Published: (2025)

Qwen3-TTS Technical Report
by: Hu, Hangrui, et al.
Published: (2026)

emg2speech: Synthesizing speech from electromyography using self-supervised speech models
by: Gowda, Harshavardhana T., et al.
Published: (2025)

Covertly improving intelligibility with data-driven adaptations of speech timing
by: Tuttösí, Paige, et al.
Published: (2026)

Low-resource speech recognition and dialect identification of Irish in a multi-task framework
by: Lonergan, Liam, et al.
Published: (2024)

Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)

Multi-interaction TTS toward professional recording reproduction
by: Kanagawa, Hiroki, et al.
Published: (2025)

MunTTS: A Text-to-Speech System for Mundari
by: Gumma, Varun, et al.
Published: (2024)

Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
by: Feng, Xincan, et al.
Published: (2024)

RWKVTTS: Yet another TTS based on RWKV-7
by: yueyu, Lin, et al.
Published: (2025)

DiaMoE-TTS: A Unified IPA-Based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation
by: Chen, Ziqi, et al.
Published: (2025)

The TTS-STT Flywheel: Synthetic Entity-Dense Audio Closes the Indic ASR Gap Where Commercial and Open-Source Systems Fail
by: Menta, Venkata Pushpak Teja
Published: (2026)

Word-wise intonation model for cross-language TTS systems
by: A., Tomilov A., et al.
Published: (2024)

An investigation of phrase break prediction in an End-to-End TTS system
by: Vadapalli, Anandaswarup
Published: (2023)

JoyTTS: LLM-based Spoken Chatbot With Voice Cloning
by: Zhou, Fangru, et al.
Published: (2025)

A Language Modeling Approach to Diacritic-Free Hebrew TTS
by: Roth, Amit, et al.
Published: (2024)

SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level
by: Tee, Hitomi Jin Ling, et al.
Published: (2025)