| Main Authors: | Wang, Shenran, Yang, Changbing, Parkhill, Mike, Quinn, Chad, Hammerly, Christopher, Zhu, Jian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.02703 |
Similar Items
The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language
by: Zhu, Jian, et al.
Published: (2023)
Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition
by: Dong, Lukuang, et al.
Published: (2026)
ZIPA: A family of efficient models for multilingual phone recognition
by: Zhu, Jian, et al.
Published: (2025)
A dual task learning approach to fine-tune a multilingual semantic speech encoder for Spoken Language Understanding
by: Laperrière, Gaëlle, et al.
Published: (2024)
emg2speech: Synthesizing speech from electromyography using self-supervised speech models
by: Gowda, Harshavardhana T., et al.
Published: (2025)
Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
by: Fujita, Kenichi, et al.
Published: (2024)
A unified front-end framework for English text-to-speech synthesis
by: Ying, Zelin, et al.
Published: (2023)
Translating speech with just images
by: Oneata, Dan, et al.
Published: (2024)
A two-stage transliteration approach to improve performance of a multilingual ASR
by: Kumar, Rohit
Published: (2024)
The FruitShell French synthesis system at the Blizzard 2023 Challenge
by: Qi, Xin, et al.
Published: (2023)
Can large audio language models understand child stuttering speech? speech summarization, and source separation
by: Okocha, Chibuzor, et al.
Published: (2025)
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data
by: Wang, Hsuan-Fu, et al.
Published: (2024)
Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder
by: Li, Xuyuan, et al.
Published: (2023)
Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?
by: Araiza-Illan, Gloria, et al.
Published: (2023)
MiMo-Audio: Audio Language Models are Few-Shot Learners
by: Core Team, et al.
Published: (2025)
Semantic enrichment towards efficient speech representations
by: Laperrière, Gaëlle, et al.
Published: (2023)
Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR
by: Mei, Yuxiang, et al.
Published: (2026)
Can Whisper perform speech-based in-context learning?
by: Wang, Siyin, et al.
Published: (2023)
Revisiting speech segmentation and lexicon learning with better features
by: Kamper, Herman, et al.
Published: (2024)
Self-consistent context aware conformer transducer for speech recognition
by: Kolokolov, Konstantin, et al.
Published: (2024)
LLM-based phoneme-to-grapheme for phoneme-based speech recognition
by: Ma, Te, et al.
Published: (2025)
An efficient text augmentation approach for contextualized Mandarin speech recognition
by: Zheng, Naijun, et al.
Published: (2024)
Direct Punjabi to English speech translation using discrete units
by: Kaur, Prabhjot, et al.
Published: (2024)
Transferable speech-to-text large language model alignment module
by: Wu, Boyong, et al.
Published: (2024)
Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection
by: Cui, Ziyun, et al.
Published: (2023)
Natural language guidance of high-fidelity text-to-speech with synthetic annotations
by: Lyth, Dan, et al.
Published: (2024)
Beyond the binary: Limitations and possibilities of gender-related speech technology research
by: Sanchez, Ariadna, et al.
Published: (2024)
An experiment on an automated literature survey of data-driven speech enhancement methods
by: Santos, Arthur dos, et al.
Published: (2023)
Convoifilter: A case study of doing cocktail party speech recognition
by: Nguyen, Thai-Binh, et al.
Published: (2023)
Unsupervised lexicon learning from speech is limited by representations rather than clustering
by: Slabbert, Danel, et al.
Published: (2025)
Exploring the limits of decoder-only models trained on public speech recognition corpora
by: Gupta, Ankit, et al.
Published: (2024)
asr_eval: Algorithms and tools for multi-reference and streaming speech recognition evaluation
by: Sedukhin, Oleg, et al.
Published: (2026)
Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech
by: Garg, Abhinav, et al.
Published: (2024)
Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice
by: Cheng, Shanbo, et al.
Published: (2025)
Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet
by: Dhakal, Manish, et al.
Published: (2024)
Korean aegyo speech shows systematic F1 increase to signal childlike qualities
by: Kim, Ji-eun, et al.
Published: (2026)
AfriHuBERT: A self-supervised speech representation model for African languages
by: Alabi, Jesujoba O., et al.
Published: (2024)
Neighbors and relatives: How do speech embeddings reflect linguistic connections across the world?
by: Törö, Tuukka, et al.
Published: (2025)
Can we reconstruct a dysarthric voice with the large speech model Parler TTS?
by: Sanchez, Ariadna, et al.
Published: (2025)