Bibliographic Details
Main authors: Tan, Weiting; Chen, Yunmo; Chen, Tongfei; Qin, Guanghui; Xu, Haoran; Zhang, Heidi C.; Van Durme, Benjamin; Koehn, Philipp
Format: Preprint
Published: 2024
Subjects: Computation and Language; Sound; Audio and Speech Processing
Online access: https://arxiv.org/abs/2402.01172
_version_ 1866913849102303232
author Tan, Weiting
Chen, Yunmo
Chen, Tongfei
Qin, Guanghui
Xu, Haoran
Zhang, Heidi C.
Van Durme, Benjamin
Koehn, Philipp
contents We introduce STAR (Stream Transduction with Anchor Representations), a novel Transformer-based model designed for efficient sequence-to-sequence transduction over streams. STAR dynamically segments input streams to create compressed anchor representations, achieving nearly lossless compression (12x) in Automatic Speech Recognition (ASR) and outperforming existing methods. Moreover, STAR demonstrates superior segmentation and latency-quality trade-offs in simultaneous speech-to-text tasks, optimizing latency, memory footprint, and quality.
format Preprint
id arxiv_https___arxiv_org_abs_2402_01172
institution arXiv
publishDate 2024
record_format arxiv
title Streaming Sequence Transduction through Dynamic Compression
topic Computation and Language
Sound
Audio and Speech Processing
url https://arxiv.org/abs/2402.01172