:: Library Catalog

Bibliographic Details
Main Author:	Ochieng, Peter
Format:	Preprint
Published:	2023
Subjects:	Sound; Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2305.10652

Similar Items

Speech Synthesis By Unrolling Diffusion Process using Neural Network Layers
by: Ochieng, Peter
Published: (2023)

A Modular-based Strategy for Mitigating Gradient Conflicts in Simultaneous Speech Translation
by: Liu, Xiaoqian, et al.
Published: (2024)

DyPCL: Dynamic Phoneme-level Contrastive Learning for Dysarthric Speech Recognition
by: Lee, Wonjun, et al.
Published: (2025)

A Cross-Corpus Speech Emotion Recognition Method Based on Supervised Contrastive Learning
by: Minjie, Xiang
Published: (2024)

Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
by: Kuan, Chun-Yi, et al.
Published: (2024)

Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
by: Chen, Sizhou, et al.
Published: (2023)

A Deep Learning Automatic Speech Recognition Model for Shona Language
by: Sirora, Leslie Wellington, et al.
Published: (2025)

SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation
by: Mu, Zhaoxi, et al.
Published: (2025)

Noise-Aware Speech Separation with Contrastive Learning
by: Zhang, Zizheng, et al.
Published: (2023)

Speech DF Arena: A Leaderboard for Speech DeepFake Detection Models
by: Dowerah, Sandipana, et al.
Published: (2025)

Continual Speech Learning with Fused Speech Features
by: Wang, Guitao, et al.
Published: (2025)

Identifying Primary Stress Across Related Languages and Dialects with Transformer-based Speech Encoder Models
by: Ljubešić, Nikola, et al.
Published: (2025)

Adaptive Inner Speech-Text Alignment for LLM-based Speech Translation
by: Liu, Henglyu, et al.
Published: (2025)

What Do Speech Foundation Models Not Learn About Speech?
by: Waheed, Abdul, et al.
Published: (2024)

Multilingual Dysarthric Speech Assessment Using Universal Phone Recognition and Language-Specific Phonemic Contrast Modeling
by: Yeo, Eunjung, et al.
Published: (2026)

Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
by: Hu, Jiliang, et al.
Published: (2025)

Unimodal Aggregation for CTC-based Speech Recognition
by: Fang, Ying, et al.
Published: (2023)

Convexity-based Pruning of Speech Representation Models
by: Dorszewski, Teresa, et al.
Published: (2024)

Speaker-Aware Simulation Improves Conversational Speech Recognition
by: Gedeon, Máté, et al.
Published: (2026)

Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis
by: Do, Cong-Thanh, et al.
Published: (2024)

SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
by: Hsu, Ming-Hao, et al.
Published: (2024)

LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
by: Jin, Zengrui, et al.
Published: (2024)

STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models
by: Jang, Kangwook, et al.
Published: (2023)

TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models
by: Peng, Junyi, et al.
Published: (2025)

SEAL: Speech Embedding Alignment Learning for Speech Large Language Model with Retrieval-Augmented Generation
by: Sun, Chunyu, et al.
Published: (2025)

Causally Disentangled Contrastive Learning for Multilingual Speaker Embeddings
by: Olijslager, Mariëtte, et al.
Published: (2026)

Rethinking Discrete Speech Representation Tokens for Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2026)

Crossmodal ASR Error Correction with Discrete Speech Units
by: Li, Yuanchao, et al.
Published: (2024)

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
by: Han, HyoJung, et al.
Published: (2024)

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models
by: Yang, Dongchao, et al.
Published: (2024)

Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation
by: Yu, Fangxu, et al.
Published: (2024)

Weight Factorization and Centralization for Continual Learning in Speech Recognition
by: Ugan, Enes Yavuz, et al.
Published: (2025)

SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
by: Zhang, Xin, et al.
Published: (2023)

Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
by: Futami, Hayato, et al.
Published: (2025)

Continuous Speech Tokenizer in Text To Speech
by: Li, Yixing, et al.
Published: (2024)

TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems
by: Minixhofer, Christoph, et al.
Published: (2025)

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
by: Cui, Mingyu, et al.
Published: (2024)

Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs
by: Zhang, Enshi, et al.
Published: (2024)

DYNAC: Dynamic Vocabulary based Non-Autoregressive Contextualization for Speech Recognition
by: Sudo, Yui, et al.
Published: (2025)

SpeechTaxi: On Multilingual Semantic Speech Classification
by: Keller, Lennart, et al.
Published: (2024)