| Main Authors: | Yeh, Sung-Lin, Tang, Hao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.06109 |
Similar Items
Learning Speech Representations with Variational Predictive Coding
by: Yeh, Sung-Lin, et al.
Published: (2025)
MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables
by: Yeh, Sung-Lin, et al.
Published: (2026)
Whisper Has an Internal Word Aligner
by: Yeh, Sung-Lin, et al.
Published: (2025)
Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective
by: Liu, Alexander H., et al.
Published: (2024)
Compact Speech Translation Models via Discrete Speech Units Pretraining
by: Lam, Tsz Kin, et al.
Published: (2024)
Crossmodal ASR Error Correction with Discrete Speech Units
by: Li, Yuanchao, et al.
Published: (2024)
DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units
by: Poli, Maxime, et al.
Published: (2026)
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
by: Shon, Suwon, et al.
Published: (2024)
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
by: Kim, Minsu, et al.
Published: (2023)
An Empirical Analysis of Discrete Unit Representations in Speech Language Modeling Pre-training
by: Labrak, Yanis, et al.
Published: (2025)
Exploring the Benefits of Tokenization of Discrete Acoustic Units
by: Dekel, Avihu, et al.
Published: (2024)
Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning
by: Shen, Liang-Yeh, et al.
Published: (2025)
UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization
by: Wang, Yuejiao, et al.
Published: (2024)
Effective Context in Neural Speech Models
by: Meng, Yen, et al.
Published: (2025)
Benchmarking Prosody Encoding in Discrete Speech Tokens
by: Onda, Kentaro, et al.
Published: (2025)
Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)
Children's Speech Recognition through Discrete Token Enhancement
by: Sukhadia, Vrunda N., et al.
Published: (2024)
Rethinking Discrete Speech Representation Tokens for Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2026)
Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution
by: Yu, Chin-Yun, et al.
Published: (2022)
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
by: Duret, Jarod, et al.
Published: (2024)
Transducer Consistency Regularization for Speech to Text Applications
by: Tseng, Cindy, et al.
Published: (2024)
Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
by: Cui, Mingyu, et al.
Published: (2024)
Do Discrete Self-Supervised Representations of Speech Capture Tone Distinctions?
by: Osakuade, Opeyemi, et al.
Published: (2024)
VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech
by: Lin, Yi-Cheng, et al.
Published: (2026)
Property Neurons in Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2024)
Linguistic Knowledge Transfer Learning for Speech Enhancement
by: Hung, Kuo-Hsuan, et al.
Published: (2025)
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
by: Dhawan, Kunal, et al.
Published: (2024)
Full-text Error Correction for Chinese Speech Recognition with Large Language Model
by: Tang, Zhiyuan, et al.
Published: (2024)
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection
by: Lin, Hsi-Che, et al.
Published: (2024)
Data Augmentation for End-to-end Code-switching Speech Recognition
by: Du, Chenpeng, et al.
Published: (2020)
Continual Speech Learning with Fused Speech Features
by: Wang, Guitao, et al.
Published: (2025)
Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
by: Zhang, Yuhao, et al.
Published: (2025)
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
by: Baade, Alan, et al.
Published: (2024)
A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models
by: Wang, Dingdong, et al.
Published: (2024)
SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
by: Hsu, Ming-Hao, et al.
Published: (2024)
Chain of Correction for Full-text Speech Recognition with Large Language Models
by: Tang, Zhiyuan, et al.
Published: (2025)
MAD Speech: Measures of Acoustic Diversity of Speech
by: Futeral, Matthieu, et al.
Published: (2024)
Streaming Speech-to-Confusion Network Speech Recognition
by: Filimonov, Denis, et al.
Published: (2023)
Anatomy of the Modality Gap: Dissecting the Internal States of End-to-End Speech LLMs
by: Hsu, Ming-Hao, et al.
Published: (2026)
Textless Speech-to-Speech Translation With Limited Parallel Data
by: Diwan, Anuj, et al.
Published: (2023)