:: Library Catalog

Bibliographic Details
Main Authors:	Yeh, Sung-Lin, Bell, Peter, Tang, Hao
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Computation and Language
Online Access:	https://arxiv.org/abs/2601.00100

Similar Items

Estimating the Completeness of Discrete Speech Units
by: Yeh, Sung-Lin, et al.
Published: (2024)

Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective
by: Liu, Alexander H., et al.
Published: (2024)

Whisper Has an Internal Word Aligner
by: Yeh, Sung-Lin, et al.
Published: (2025)

MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables
by: Yeh, Sung-Lin, et al.
Published: (2026)

Rethinking Discrete Speech Representation Tokens for Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2026)

Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution
by: Yu, Chin-Yun, et al.
Published: (2022)

TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems
by: Minixhofer, Christoph, et al.
Published: (2025)

Crossmodal ASR Error Correction with Discrete Speech Units
by: Li, Yuanchao, et al.
Published: (2024)

Data Augmentation for End-to-end Code-switching Speech Recognition
by: Du, Chenpeng, et al.
Published: (2020)

Codec2Vec: Self-Supervised Speech Representation Learning Using Neural Speech Codecs
by: Tseng, Wei-Cheng, et al.
Published: (2025)

Entropy-based Coarse and Compressed Semantic Speech Representation Learning
by: Zuo, Jialong, et al.
Published: (2025)

Emphasis Sensitivity in Speech Representations
by: Cassini, Shaun, et al.
Published: (2025)

Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR
by: Minixhofer, Christoph, et al.
Published: (2024)

Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning
by: Shen, Liang-Yeh, et al.
Published: (2025)

Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques
by: Li, Yuanchao, et al.
Published: (2024)

TTSDS -- Text-to-Speech Distribution Score
by: Minixhofer, Christoph, et al.
Published: (2024)

Effective Context in Neural Speech Models
by: Meng, Yen, et al.
Published: (2025)

Linguistic Knowledge Transfer Learning for Speech Enhancement
by: Hung, Kuo-Hsuan, et al.
Published: (2025)

Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)

A Practitioner's Guide to Building ASR Models for Low-Resource Languages: A Case Study on Scottish Gaelic
by: Klejch, Ondřej, et al.
Published: (2025)

Continual Speech Learning with Fused Speech Features
by: Wang, Guitao, et al.
Published: (2025)

Probing for Phonology in Self-Supervised Speech Representations: A Case Study on Accent Perception
by: Venkateswaran, Nitin, et al.
Published: (2025)

Language Bias in Self-Supervised Learning For Automatic Speech Recognition
by: Storey, Edward, et al.
Published: (2025)

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
by: Han, HyoJung, et al.
Published: (2024)

SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
by: Hsu, Ming-Hao, et al.
Published: (2024)

Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model
by: Huang, Hukai, et al.
Published: (2024)

The Voice Behind the Words: Quantifying Intersectional Bias in SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2026)

Speech Separation based on Contrastive Learning and Deep Modularization
by: Ochieng, Peter
Published: (2023)

Transducer Consistency Regularization for Speech to Text Applications
by: Tseng, Cindy, et al.
Published: (2024)

UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
by: Liu, Alexander H., et al.
Published: (2025)

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
by: Wei, Kun, et al.
Published: (2023)

Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation
by: Lin, Zhennan, et al.
Published: (2025)

Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus
by: Harati, Amir, et al.
Published: (2024)

VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech
by: Lin, Yi-Cheng, et al.
Published: (2026)

Measuring Entrainment in Spontaneous Code-switched Speech
by: Bhattacharya, Debasmita, et al.
Published: (2023)

Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition
by: Iakovenko, Olga, et al.
Published: (2024)

Convexity-based Pruning of Speech Representation Models
by: Dorszewski, Teresa, et al.
Published: (2024)

Configurable Multilingual ASR with Speech Summary Representations
by: Zhu, Harrison, et al.
Published: (2024)

Representation Purification for End-to-End Speech Translation
by: Zhang, Chengwei, et al.
Published: (2024)

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders
by: Shi, Hao, et al.
Published: (2023)