Bibliographic Details
Main Authors:	Wright, George August; Cappellazzo, Umberto; Zaiem, Salah; Raj, Desh; Yang, Lucas Ondel; Falavigna, Daniele; Ali, Mohamed Nabih; Brutti, Alessio
Format:	Preprint
Published:	2023
Subjects:	Audio and Speech Processing; Computation and Language; Sound
Online Access:	https://arxiv.org/abs/2309.09546

Similar Items

Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices
by: Lasbordes, Maxence, et al.
Published: (2025)

Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters
by: Cappellazzo, Umberto, et al.
Published: (2024)

Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers
by: Cappellazzo, Umberto, et al.
Published: (2023)

Continual Contrastive Spoken Language Understanding
by: Cappellazzo, Umberto, et al.
Published: (2023)

Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach
by: Cappellazzo, Umberto, et al.
Published: (2025)

Large Language Models are Strong Audio-Visual Speech Recognition Learners
by: Cappellazzo, Umberto, et al.
Published: (2024)

Input Conditioned Layer Dropping in Speech Foundation Models
by: Hannan, Abdul, et al.
Published: (2025)

MLMA: Towards Multilingual ASR With Mamba-based Architectures
by: Ali, Mohamed Nabih, et al.
Published: (2025)

Federating Dynamic Models using Early-Exit Architectures for Automatic Speech Recognition on Heterogeneous Clients
by: Ali, Mohamed Nabih, et al.
Published: (2024)

Listening to Multi-talker Conversations: Modular and End-to-end Perspectives
by: Raj, Desh
Published: (2024)

Prominence-aware automatic speech recognition for conversational speech
by: Linke, Julian, et al.
Published: (2025)

Less Forgetting for Better Generalization: Exploring Continual-learning Fine-tuning Methods for Speech Self-supervised Representations
by: Zaiem, Salah, et al.
Published: (2024)

Evaluating and Improving Continual Learning in Spoken Language Understanding
by: Yang, Muqiao, et al.
Published: (2024)

Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)

Robustifying automatic speech recognition by extracting slowly varying features
by: Pizarro, Matías, et al.
Published: (2021)

Dr. SHAP-AV: Decoding Relative Modality Contributions via Shapley Attribution in Audio-Visual Speech Recognition
by: Cappellazzo, Umberto, et al.
Published: (2026)

Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs
by: Cappellazzo, Umberto, et al.
Published: (2025)

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
by: Gong, Rong, et al.
Published: (2024)

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
by: Gaido, Marco, et al.
Published: (2024)

Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs
by: Anand, et al.
Published: (2025)

Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
by: Zhang, Yiru, et al.
Published: (2025)

The evaluation of a code-switched Sepedi-English automatic speech recognition system
by: Phaladi, Amanda, et al.
Published: (2024)

SeMaScore: a new evaluation metric for automatic speech recognition tasks
by: Sasindran, Zitha, et al.
Published: (2024)

Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?
by: Araiza-Illan, Gloria, et al.
Published: (2023)

Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
by: Zaiem, Salah, et al.
Published: (2023)

Faster Speech-LLaMA Inference with Multi-token Prediction
by: Raj, Desh, et al.
Published: (2024)

Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
by: Cappellazzo, Umberto, et al.
Published: (2025)

Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens
by: San, Nay, et al.
Published: (2024)

Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages
by: Fong, Seraphina, et al.
Published: (2025)

Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)

Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
by: Vecino, Biel Tura, et al.
Published: (2025)

Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)

CognoSpeak: an automatic, remote assessment of early cognitive decline in real-world conversational speech
by: Pahar, Madhurananda, et al.
Published: (2025)

Low-resource speech recognition and dialect identification of Irish in a multi-task framework
by: Lonergan, Liam, et al.
Published: (2024)

Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus
by: Chen, Szu-Jui, et al.
Published: (2026)

Robust fine-tuning of speech recognition models via model merging: application to disordered speech
by: Ducorroy, Alexandre, et al.
Published: (2025)

OnDA: On-device Channel Pruning for Efficient Personalized Keyword Spotting
by: Risso, Matteo, et al.
Published: (2026)

Perceptual implications of automatic anonymization in pathological speech
by: Arasteh, Soroosh Tayebi, et al.
Published: (2025)

How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
by: Mousavi, Pooneh, et al.
Published: (2024)

An automatic mixing speech enhancement system for multi-track audio
by: Liu, Xiaojing, et al.
Published: (2024)