:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Yeh, Sung-Lin, Tang, Hao
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing Computation and Language
Online Access:	https://arxiv.org/abs/2409.06109
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Learning Speech Representations with Variational Predictive Coding
by: Yeh, Sung-Lin, et al.
Published: (2025)

MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables
by: Yeh, Sung-Lin, et al.
Published: (2026)

Whisper Has an Internal Word Aligner
by: Yeh, Sung-Lin, et al.
Published: (2025)

Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective
by: Liu, Alexander H., et al.
Published: (2024)

Compact Speech Translation Models via Discrete Speech Units Pretraining
by: Lam, Tsz Kin, et al.
Published: (2024)

Crossmodal ASR Error Correction with Discrete Speech Units
by: Li, Yuanchao, et al.
Published: (2024)

DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units
by: Poli, Maxime, et al.
Published: (2026)

DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
by: Shon, Suwon, et al.
Published: (2024)

Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
by: Kim, Minsu, et al.
Published: (2023)

An Empirical Analysis of Discrete Unit Representations in Speech Language Modeling Pre-training
by: Labrak, Yanis, et al.
Published: (2025)

Exploring the Benefits of Tokenization of Discrete Acoustic Units
by: Dekel, Avihu, et al.
Published: (2024)

Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning
by: Shen, Liang-Yeh, et al.
Published: (2025)

UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization
by: Wang, Yuejiao, et al.
Published: (2024)

Effective Context in Neural Speech Models
by: Meng, Yen, et al.
Published: (2025)

Benchmarking Prosody Encoding in Discrete Speech Tokens
by: Onda, Kentaro, et al.
Published: (2025)

Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)

Children's Speech Recognition through Discrete Token Enhancement
by: Sukhadia, Vrunda N., et al.
Published: (2024)

Rethinking Discrete Speech Representation Tokens for Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2026)

Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution
by: Yu, Chin-Yun, et al.
Published: (2022)

Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
by: Duret, Jarod, et al.
Published: (2024)

Transducer Consistency Regularization for Speech to Text Applications
by: Tseng, Cindy, et al.
Published: (2024)

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
by: Cui, Mingyu, et al.
Published: (2024)

Do Discrete Self-Supervised Representations of Speech Capture Tone Distinctions?
by: Osakuade, Opeyemi, et al.
Published: (2024)

VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech
by: Lin, Yi-Cheng, et al.
Published: (2026)

Property Neurons in Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2024)

Linguistic Knowledge Transfer Learning for Speech Enhancement
by: Hung, Kuo-Hsuan, et al.
Published: (2025)

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
by: Dhawan, Kunal, et al.
Published: (2024)

Full-text Error Correction for Chinese Speech Recognition with Large Language Model
by: Tang, Zhiyuan, et al.
Published: (2024)

Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection
by: Lin, Hsi-Che, et al.
Published: (2024)

Data Augmentation for End-to-end Code-switching Speech Recognition
by: Du, Chenpeng, et al.
Published: (2020)

Continual Speech Learning with Fused Speech Features
by: Wang, Guitao, et al.
Published: (2025)

Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
by: Zhang, Yuhao, et al.
Published: (2025)

SyllableLM: Learning Coarse Semantic Units for Speech Language Models
by: Baade, Alan, et al.
Published: (2024)

A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models
by: Wang, Dingdong, et al.
Published: (2024)

SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
by: Hsu, Ming-Hao, et al.
Published: (2024)

Chain of Correction for Full-text Speech Recognition with Large Language Models
by: Tang, Zhiyuan, et al.
Published: (2025)

MAD Speech: Measures of Acoustic Diversity of Speech
by: Futeral, Matthieu, et al.
Published: (2024)

Streaming Speech-to-Confusion Network Speech Recognition
by: Filimonov, Denis, et al.
Published: (2023)

Anatomy of the Modality Gap: Dissecting the Internal States of End-to-End Speech LLMs
by: Hsu, Ming-Hao, et al.
Published: (2026)

Textless Speech-to-Speech Translation With Limited Parallel Data
by: Diwan, Anuj, et al.
Published: (2023)