Bibliographic Details
Main Authors:	Male, Prabash Reddy; Ray, Swayambhu Nath; Arsikere, Harish; Jaiswal, Akshat; Swarup, Prakhar; Sen, Prantik; Chakrabarty, Debmalya; Girish, K V Vijay; Bhave, Nikhil; Weber, Frederick; Bhattacharya, Sambuddha; Garimella, Sri
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2505.19774

Abstract:

Speech encoders have drawn growing attention because of their integration with Large Language Models for a range of speech tasks. Most research has focused on either causal or full-context speech encoders, and few studies have explored how a single encoder can handle both streaming and non-streaming applications while achieving state-of-the-art performance. We introduce DuRep, a Dual-mode Speech Representation learning setup that enables a single speech encoder to function efficiently in both offline and online modes across downstream tasks, without additional parameters or mode-specific adjustments. DuRep-200M, our 200M-parameter dual-mode encoder, achieves improvements of 12% in streaming mode and 11.6% in non-streaming mode over baseline encoders on multilingual ASR. Scaling this approach to 2B parameters, DuRep-2B sets new performance benchmarks across ASR and non-ASR tasks. Our analysis reveals trade-offs between acoustic and semantic information across encoder layers.
