:: Library Catalog

Bibliographic Details
Main Authors:	Zampierin, Luca; Hacene, Ghouthi Boukli; Nguyen, Bac; Ravanelli, Mirco
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Computation and Language; Machine Learning; Sound
Online Access:	https://arxiv.org/abs/2402.16830

Similar Items

What Are They Doing? Joint Audio-Speech Co-Reasoning
by: Wang, Yingzhi, et al.
Published: (2024)

ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs
by: Mousavi, Pooneh, et al.
Published: (2025)

LL-SDR: Low-Latency Speech enhancement through Discrete Representations
by: Li, Jingyi, et al.
Published: (2026)

Autoregressive Speech Enhancement via Acoustic Tokens
by: Della Libera, Luca, et al.
Published: (2025)

ProGRes: Prompted Generative Rescoring on ASR n-Best
by: Tur, Ada Defne, et al.
Published: (2024)

How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
by: Mousavi, Pooneh, et al.
Published: (2024)

Listen First, Then Answer: Timestamp-Grounded Speech Reasoning
by: Jeong, Jihoon, et al.
Published: (2026)

Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
by: Zaiem, Salah, et al.
Published: (2023)

FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation
by: Della Libera, Luca, et al.
Published: (2025)

Towards Robust FastSpeech 2 by Modelling Residual Multimodality
by: Kögel, Fabian, et al.
Published: (2023)

Focal Modulation Networks for Interpretable Sound Classification
by: Della Libera, Luca, et al.
Published: (2024)

SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
by: Ashihara, Takanori, et al.
Published: (2023)

FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
by: Della Libera, Luca, et al.
Published: (2025)

TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models
by: Peng, Junyi, et al.
Published: (2025)

STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models
by: Jang, Kangwook, et al.
Published: (2023)

Interface Design for Self-Supervised Speech Models
by: Shih, Yi-Jen, et al.
Published: (2024)

Is Self-Supervised Learning Enough to Fill in the Gap? A Study on Speech Inpainting
by: Asaad, Ihab, et al.
Published: (2024)

Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice
by: Gupta, Shubham, et al.
Published: (2024)

Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
by: Wang, Yujin, et al.
Published: (2022)

DASB - Discrete Audio and Speech Benchmark
by: Mousavi, Pooneh, et al.
Published: (2024)

Listenable Maps for Zero-Shot Audio Classifiers
by: Paissan, Francesco, et al.
Published: (2024)

Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge
by: Liu, Rui, et al.
Published: (2024)

Pushing the Performance of Synthetic Speech Detection with Kolmogorov-Arnold Networks and Self-Supervised Learning Models
by: Phuong, Tuan Dat, et al.
Published: (2025)

Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
by: Hwang, Min-Jae, et al.
Published: (2024)

Listenable Maps for Audio Classifiers
by: Paissan, Francesco, et al.
Published: (2024)

Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
by: Fan, Ruchao, et al.
Published: (2024)

Do Discrete Self-Supervised Representations of Speech Capture Tone Distinctions?
by: Osakuade, Opeyemi, et al.
Published: (2024)

BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition
by: Jiang, Liuyuan, et al.
Published: (2025)

Pairwise Evaluation of Accent Similarity in Speech Synthesis
by: Zhong, Jinzuomu, et al.
Published: (2025)

Exploring Acoustic Similarity in Emotional Speech and Music via Self-Supervised Representations
by: Sun, Yujia, et al.
Published: (2024)

Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)

Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model
by: Park, Joonyong, et al.
Published: (2024)

Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
by: Park, Chanho, et al.
Published: (2023)

Probing for Phonology in Self-Supervised Speech Representations: A Case Study on Accent Perception
by: Venkateswaran, Nitin, et al.
Published: (2025)

Adapting Self-Supervised Speech Representations for Cross-lingual Dysarthria Detection in Parkinson's Disease
by: Hernandez, Abner, et al.
Published: (2026)

Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
by: Mancini, Eleonora, et al.
Published: (2024)

Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models
by: Wang, Haoyu, et al.
Published: (2022)

What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis
by: Ashihara, Takanori, et al.
Published: (2024)

Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis
by: Xu, Tianyi, et al.
Published: (2025)

Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition
by: Saliba, Alexandra, et al.
Published: (2024)