| Main Authors: | Yeh, Sung-Lin; Bell, Peter; Tang, Hao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.00100 |
Similar Items
Estimating the Completeness of Discrete Speech Units
by: Yeh, Sung-Lin, et al.
Published: (2024)
Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective
by: Liu, Alexander H., et al.
Published: (2024)
Whisper Has an Internal Word Aligner
by: Yeh, Sung-Lin, et al.
Published: (2025)
MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables
by: Yeh, Sung-Lin, et al.
Published: (2026)
Rethinking Discrete Speech Representation Tokens for Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2026)
Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution
by: Yu, Chin-Yun, et al.
Published: (2022)
TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text to Speech Systems
by: Minixhofer, Christoph, et al.
Published: (2025)
Crossmodal ASR Error Correction with Discrete Speech Units
by: Li, Yuanchao, et al.
Published: (2024)
Data Augmentation for End-to-end Code-switching Speech Recognition
by: Du, Chenpeng, et al.
Published: (2020)
Codec2Vec: Self-Supervised Speech Representation Learning Using Neural Speech Codecs
by: Tseng, Wei-Cheng, et al.
Published: (2025)
Entropy-based Coarse and Compressed Semantic Speech Representation Learning
by: Zuo, Jialong, et al.
Published: (2025)
Emphasis Sensitivity in Speech Representations
by: Cassini, Shaun, et al.
Published: (2025)
Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR
by: Minixhofer, Christoph, et al.
Published: (2024)
Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning
by: Shen, Liang-Yeh, et al.
Published: (2025)
Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques
by: Li, Yuanchao, et al.
Published: (2024)
TTSDS -- Text-to-Speech Distribution Score
by: Minixhofer, Christoph, et al.
Published: (2024)
Effective Context in Neural Speech Models
by: Meng, Yen, et al.
Published: (2025)
Linguistic Knowledge Transfer Learning for Speech Enhancement
by: Hung, Kuo-Hsuan, et al.
Published: (2025)
Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)
A Practitioner's Guide to Building ASR Models for Low-Resource Languages: A Case Study on Scottish Gaelic
by: Klejch, Ondřej, et al.
Published: (2025)
Continual Speech Learning with Fused Speech Features
by: Wang, Guitao, et al.
Published: (2025)
Probing for Phonology in Self-Supervised Speech Representations: A Case Study on Accent Perception
by: Venkateswaran, Nitin, et al.
Published: (2025)
Language Bias in Self-Supervised Learning For Automatic Speech Recognition
by: Storey, Edward, et al.
Published: (2025)
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
by: Han, HyoJung, et al.
Published: (2024)
SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
by: Hsu, Ming-Hao, et al.
Published: (2024)
Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model
by: Huang, Hukai, et al.
Published: (2024)
The Voice Behind the Words: Quantifying Intersectional Bias in SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2026)
Speech Separation based on Contrastive Learning and Deep Modularization
by: Ochieng, Peter
Published: (2023)
Transducer Consistency Regularization for Speech to Text Applications
by: Tseng, Cindy, et al.
Published: (2024)
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
by: Liu, Alexander H., et al.
Published: (2025)
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
by: Wei, Kun, et al.
Published: (2023)
Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation
by: Lin, Zhennan, et al.
Published: (2025)
Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus
by: Harati, Amir, et al.
Published: (2024)
VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech
by: Lin, Yi-Cheng, et al.
Published: (2026)
Measuring Entrainment in Spontaneous Code-switched Speech
by: Bhattacharya, Debasmita, et al.
Published: (2023)
Convolutional Variational Autoencoders for Spectrogram Compression in Automatic Speech Recognition
by: Iakovenko, Olga, et al.
Published: (2024)
Convexity-based Pruning of Speech Representation Models
by: Dorszewski, Teresa, et al.
Published: (2024)
Configurable Multilingual ASR with Speech Summary Representations
by: Zhu, Harrison, et al.
Published: (2024)
Representation Purification for End-to-End Speech Translation
by: Zhang, Chengwei, et al.
Published: (2024)
Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders
by: Shi, Hao, et al.
Published: (2023)