| Main Author: | Ochieng, Peter |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2305.10652 |
Similar Items
Speech Synthesis By Unrolling Diffusion Process using Neural Network Layers
by: Ochieng, Peter
Published: (2023)
A Modular-based Strategy for Mitigating Gradient Conflicts in Simultaneous Speech Translation
by: Liu, Xiaoqian, et al.
Published: (2024)
DyPCL: Dynamic Phoneme-level Contrastive Learning for Dysarthric Speech Recognition
by: Lee, Wonjun, et al.
Published: (2025)
A Cross-Corpus Speech Emotion Recognition Method Based on Supervised Contrastive Learning
by: minjie, Xiang
Published: (2024)
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
by: Kuan, Chun-Yi, et al.
Published: (2024)
Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
by: Chen, Sizhou, et al.
Published: (2023)
A Deep Learning Automatic Speech Recognition Model for Shona Language
by: Sirora, Leslie Wellington, et al.
Published: (2025)
SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation
by: Mu, Zhaoxi, et al.
Published: (2025)
Noise-Aware Speech Separation with Contrastive Learning
by: Zhang, Zizheng, et al.
Published: (2023)
Speech DF Arena: A Leaderboard for Speech DeepFake Detection Models
by: Dowerah, Sandipana, et al.
Published: (2025)
Continual Speech Learning with Fused Speech Features
by: Wang, Guitao, et al.
Published: (2025)
Identifying Primary Stress Across Related Languages and Dialects with Transformer-based Speech Encoder Models
by: Ljubešić, Nikola, et al.
Published: (2025)
Adaptive Inner Speech-Text Alignment for LLM-based Speech Translation
by: Liu, Henglyu, et al.
Published: (2025)
What Do Speech Foundation Models Not Learn About Speech?
by: Waheed, Abdul, et al.
Published: (2024)
Multilingual Dysarthric Speech Assessment Using Universal Phone Recognition and Language-Specific Phonemic Contrast Modeling
by: Yeo, Eunjung, et al.
Published: (2026)
Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
by: Hu, Jiliang, et al.
Published: (2025)
Unimodal Aggregation for CTC-based Speech Recognition
by: Fang, Ying, et al.
Published: (2023)
Convexity-based Pruning of Speech Representation Models
by: Dorszewski, Teresa, et al.
Published: (2024)
Speaker-Aware Simulation Improves Conversational Speech Recognition
by: Gedeon, Máté, et al.
Published: (2026)
Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis
by: Do, Cong-Thanh, et al.
Published: (2024)
SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
by: Hsu, Ming-Hao, et al.
Published: (2024)
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
by: Jin, Zengrui, et al.
Published: (2024)
STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models
by: Jang, Kangwook, et al.
Published: (2023)
TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models
by: Peng, Junyi, et al.
Published: (2025)
SEAL: Speech Embedding Alignment Learning for Speech Large Language Model with Retrieval-Augmented Generation
by: Sun, Chunyu, et al.
Published: (2025)
Causally Disentangled Contrastive Learning for Multilingual Speaker Embeddings
by: Olijslager, Mariëtte, et al.
Published: (2026)
Rethinking Discrete Speech Representation Tokens for Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2026)
Crossmodal ASR Error Correction with Discrete Speech Units
by: Li, Yuanchao, et al.
Published: (2024)
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
by: Han, HyoJung, et al.
Published: (2024)
SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models
by: Yang, Dongchao, et al.
Published: (2024)
Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation
by: Yu, Fangxu, et al.
Published: (2024)
Weight Factorization and Centralization for Continual Learning in Speech Recognition
by: Ugan, Enes Yavuz, et al.
Published: (2025)
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
by: Zhang, Xin, et al.
Published: (2023)
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
by: Futami, Hayato, et al.
Published: (2025)
Continuous Speech Tokenizer in Text To Speech
by: Li, Yixing, et al.
Published: (2024)
TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems
by: Minixhofer, Christoph, et al.
Published: (2025)
Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
by: Cui, Mingyu, et al.
Published: (2024)
Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs
by: Zhang, Enshi, et al.
Published: (2024)
DYNAC: Dynamic Vocabulary based Non-Autoregressive Contextualization for Speech Recognition
by: Sudo, Yui, et al.
Published: (2025)
SpeechTaxi: On Multilingual Semantic Speech Classification
by: Keller, Lennart, et al.
Published: (2024)