Saved in:
| Main Authors: | Zampierin, Luca, Hacene, Ghouthi Boukli, Nguyen, Bac, Ravanelli, Mirco |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.16830 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
What Are They Doing? Joint Audio-Speech Co-Reasoning
by: Wang, Yingzhi, et al.
Published: (2024)
by: Wang, Yingzhi, et al.
Published: (2024)
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs
by: Mousavi, Pooneh, et al.
Published: (2025)
by: Mousavi, Pooneh, et al.
Published: (2025)
LL-SDR: Low-Latency Speech enhancement through Discrete Representations
by: Li, Jingyi, et al.
Published: (2026)
by: Li, Jingyi, et al.
Published: (2026)
Autoregressive Speech Enhancement via Acoustic Tokens
by: Della Libera, Luca, et al.
Published: (2025)
by: Della Libera, Luca, et al.
Published: (2025)
ProGRes: Prompted Generative Rescoring on ASR n-Best
by: Tur, Ada Defne, et al.
Published: (2024)
by: Tur, Ada Defne, et al.
Published: (2024)
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
by: Mousavi, Pooneh, et al.
Published: (2024)
by: Mousavi, Pooneh, et al.
Published: (2024)
Listen First, Then Answer: Timestamp-Grounded Speech Reasoning
by: Jeong, Jihoon, et al.
Published: (2026)
by: Jeong, Jihoon, et al.
Published: (2026)
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
by: Zaiem, Salah, et al.
Published: (2023)
by: Zaiem, Salah, et al.
Published: (2023)
FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation
by: Della Libera, Luca, et al.
Published: (2025)
by: Della Libera, Luca, et al.
Published: (2025)
Towards Robust FastSpeech 2 by Modelling Residual Multimodality
by: Kögel, Fabian, et al.
Published: (2023)
by: Kögel, Fabian, et al.
Published: (2023)
Focal Modulation Networks for Interpretable Sound Classification
by: Della Libera, Luca, et al.
Published: (2024)
by: Della Libera, Luca, et al.
Published: (2024)
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
by: Ashihara, Takanori, et al.
Published: (2023)
by: Ashihara, Takanori, et al.
Published: (2023)
FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
by: Della Libera, Luca, et al.
Published: (2025)
by: Della Libera, Luca, et al.
Published: (2025)
TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models
by: Peng, Junyi, et al.
Published: (2025)
by: Peng, Junyi, et al.
Published: (2025)
STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models
by: Jang, Kangwook, et al.
Published: (2023)
by: Jang, Kangwook, et al.
Published: (2023)
Interface Design for Self-Supervised Speech Models
by: Shih, Yi-Jen, et al.
Published: (2024)
by: Shih, Yi-Jen, et al.
Published: (2024)
Is Self-Supervised Learning Enough to Fill in the Gap? A Study on Speech Inpainting
by: Asaad, Ihab, et al.
Published: (2024)
by: Asaad, Ihab, et al.
Published: (2024)
Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice
by: Gupta, Shubham, et al.
Published: (2024)
by: Gupta, Shubham, et al.
Published: (2024)
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
by: Wang, Yujin, et al.
Published: (2022)
by: Wang, Yujin, et al.
Published: (2022)
DASB - Discrete Audio and Speech Benchmark
by: Mousavi, Pooneh, et al.
Published: (2024)
by: Mousavi, Pooneh, et al.
Published: (2024)
Listenable Maps for Zero-Shot Audio Classifiers
by: Paissan, Francesco, et al.
Published: (2024)
by: Paissan, Francesco, et al.
Published: (2024)
Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge
by: Liu, Rui, et al.
Published: (2024)
by: Liu, Rui, et al.
Published: (2024)
Pushing the Performance of Synthetic Speech Detection with Kolmogorov-Arnold Networks and Self-Supervised Learning Models
by: Phuong, Tuan Dat, et al.
Published: (2025)
by: Phuong, Tuan Dat, et al.
Published: (2025)
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
by: Hwang, Min-Jae, et al.
Published: (2024)
by: Hwang, Min-Jae, et al.
Published: (2024)
Listenable Maps for Audio Classifiers
by: Paissan, Francesco, et al.
Published: (2024)
by: Paissan, Francesco, et al.
Published: (2024)
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
by: Fan, Ruchao, et al.
Published: (2024)
by: Fan, Ruchao, et al.
Published: (2024)
Do Discrete Self-Supervised Representations of Speech Capture Tone Distinctions?
by: Osakuade, Opeyemi, et al.
Published: (2024)
by: Osakuade, Opeyemi, et al.
Published: (2024)
BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition
by: Jiang, Liuyuan, et al.
Published: (2025)
by: Jiang, Liuyuan, et al.
Published: (2025)
Pairwise Evaluation of Accent Similarity in Speech Synthesis
by: Zhong, Jinzuomu, et al.
Published: (2025)
by: Zhong, Jinzuomu, et al.
Published: (2025)
Exploring Acoustic Similarity in Emotional Speech and Music via Self-Supervised Representations
by: Sun, Yujia, et al.
Published: (2024)
by: Sun, Yujia, et al.
Published: (2024)
Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2025)
by: Lin, Tzu-Quan, et al.
Published: (2025)
Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model
by: Park, Joonyong, et al.
Published: (2024)
by: Park, Joonyong, et al.
Published: (2024)
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
by: Park, Chanho, et al.
Published: (2023)
by: Park, Chanho, et al.
Published: (2023)
Probing for Phonology in Self-Supervised Speech Representations: A Case Study on Accent Perception
by: Venkateswaran, Nitin, et al.
Published: (2025)
by: Venkateswaran, Nitin, et al.
Published: (2025)
Adapting Self-Supervised Speech Representations for Cross-lingual Dysarthria Detection in Parkinson's Disease
by: Hernandez, Abner, et al.
Published: (2026)
by: Hernandez, Abner, et al.
Published: (2026)
Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
by: Mancini, Eleonora, et al.
Published: (2024)
by: Mancini, Eleonora, et al.
Published: (2024)
Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models
by: Wang, Haoyu, et al.
Published: (2022)
by: Wang, Haoyu, et al.
Published: (2022)
What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis
by: Ashihara, Takanori, et al.
Published: (2024)
by: Ashihara, Takanori, et al.
Published: (2024)
Leveraging LLM and Self-Supervised Training Models for Speech Recognition in Chinese Dialects: A Comparative Analysis
by: Xu, Tianyi, et al.
Published: (2025)
by: Xu, Tianyi, et al.
Published: (2025)
Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition
by: Saliba, Alexandra, et al.
Published: (2024)
by: Saliba, Alexandra, et al.
Published: (2024)
Similar Items
-
What Are They Doing? Joint Audio-Speech Co-Reasoning
by: Wang, Yingzhi, et al.
Published: (2024) -
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs
by: Mousavi, Pooneh, et al.
Published: (2025) -
LL-SDR: Low-Latency Speech enhancement through Discrete Representations
by: Li, Jingyi, et al.
Published: (2026) -
Autoregressive Speech Enhancement via Acoustic Tokens
by: Della Libera, Luca, et al.
Published: (2025) -
ProGRes: Prompted Generative Rescoring on ASR n-Best
by: Tur, Ada Defne, et al.
Published: (2024)