| Main Authors: | Wright, George August, Cappellazzo, Umberto, Zaiem, Salah, Raj, Desh, Yang, Lucas Ondel, Falavigna, Daniele, Ali, Mohamed Nabih, Brutti, Alessio |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2309.09546 |
Similar Items
Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices
by: Lasbordes, Maxence, et al.
Published: (2025)
Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters
by: Cappellazzo, Umberto, et al.
Published: (2024)
Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers
by: Cappellazzo, Umberto, et al.
Published: (2023)
Continual Contrastive Spoken Language Understanding
by: Cappellazzo, Umberto, et al.
Published: (2023)
Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach
by: Cappellazzo, Umberto, et al.
Published: (2025)
Large Language Models are Strong Audio-Visual Speech Recognition Learners
by: Cappellazzo, Umberto, et al.
Published: (2024)
Input Conditioned Layer Dropping in Speech Foundation Models
by: Hannan, Abdul, et al.
Published: (2025)
MLMA: Towards Multilingual ASR With Mamba-based Architectures
by: Ali, Mohamed Nabih, et al.
Published: (2025)
Federating Dynamic Models using Early-Exit Architectures for Automatic Speech Recognition on Heterogeneous Clients
by: Ali, Mohamed Nabih, et al.
Published: (2024)
Listening to Multi-talker Conversations: Modular and End-to-end Perspectives
by: Raj, Desh
Published: (2024)
Prominence-aware automatic speech recognition for conversational speech
by: Linke, Julian, et al.
Published: (2025)
Less Forgetting for Better Generalization: Exploring Continual-learning Fine-tuning Methods for Speech Self-supervised Representations
by: Zaiem, Salah, et al.
Published: (2024)
Evaluating and Improving Continual Learning in Spoken Language Understanding
by: Yang, Muqiao, et al.
Published: (2024)
Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)
Robustifying automatic speech recognition by extracting slowly varying features
by: Pizarro, Matías, et al.
Published: (2021)
Dr. SHAP-AV: Decoding Relative Modality Contributions via Shapley Attribution in Audio-Visual Speech Recognition
by: Cappellazzo, Umberto, et al.
Published: (2026)
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs
by: Cappellazzo, Umberto, et al.
Published: (2025)
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
by: Gong, Rong, et al.
Published: (2024)
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
by: Gaido, Marco, et al.
Published: (2024)
Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs
by: Anand, et al.
Published: (2025)
Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
by: Zhang, Yiru, et al.
Published: (2025)
The evaluation of a code-switched Sepedi-English automatic speech recognition system
by: Phaladi, Amanda, et al.
Published: (2024)
SeMaScore : a new evaluation metric for automatic speech recognition tasks
by: Sasindran, Zitha, et al.
Published: (2024)
Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?
by: Araiza-Illan, Gloria, et al.
Published: (2023)
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
by: Zaiem, Salah, et al.
Published: (2023)
Faster Speech-LLaMA Inference with Multi-token Prediction
by: Raj, Desh, et al.
Published: (2024)
Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
by: Cappellazzo, Umberto, et al.
Published: (2025)
Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens
by: San, Nay, et al.
Published: (2024)
Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages
by: Fong, Seraphina, et al.
Published: (2025)
Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
by: Vecino, Biel Tura, et al.
Published: (2025)
Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)
CognoSpeak: an automatic, remote assessment of early cognitive decline in real-world conversational speech
by: Pahar, Madhurananda, et al.
Published: (2025)
Low-resource speech recognition and dialect identification of Irish in a multi-task framework
by: Lonergan, Liam, et al.
Published: (2024)
Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus
by: Chen, Szu-Jui, et al.
Published: (2026)
Robust fine-tuning of speech recognition models via model merging: application to disordered speech
by: Ducorroy, Alexandre, et al.
Published: (2025)
OnDA: On-device Channel Pruning for Efficient Personalized Keyword Spotting
by: Risso, Matteo, et al.
Published: (2026)
Perceptual implications of automatic anonymization in pathological speech
by: Arasteh, Soroosh Tayebi, et al.
Published: (2025)
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
by: Mousavi, Pooneh, et al.
Published: (2024)
An automatic mixing speech enhancement system for multi-track audio
by: Liu, Xiaojing, et al.
Published: (2024)