Saved in:
| Main Authors: | Lin, Yen-Ting, Chen, Zhehuai, Zelasko, Piotr, Wan, Zhen, Yang, Xuesong, Chen, Zih-Ching, Puvvada, Krishna C, Fu, Szu-Wei, Hu, Ke, Chiu, Jun Wei, Balam, Jagadeesh, Ginsburg, Boris, Wang, Yu-Chiang Frank, Yang, Chao-Han Huck |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.05945 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
by: Chen, Zhehuai, et al.
Published: (2024)
by: Chen, Zhehuai, et al.
Published: (2024)
Chain-of-Thought Prompting for Speech Translation
by: Hu, Ke, et al.
Published: (2024)
by: Hu, Ke, et al.
Published: (2024)
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
by: Peng, Yifan, et al.
Published: (2024)
by: Peng, Yifan, et al.
Published: (2024)
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data
by: Puvvada, Krishna C., et al.
Published: (2024)
by: Puvvada, Krishna C., et al.
Published: (2024)
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
by: Lu, Ke-Han, et al.
Published: (2024)
by: Lu, Ke-Han, et al.
Published: (2024)
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
by: Burchi, Maxime, et al.
Published: (2024)
by: Burchi, Maxime, et al.
Published: (2024)
SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
by: Hu, Ke, et al.
Published: (2025)
by: Hu, Ke, et al.
Published: (2025)
EMMeTT: Efficient Multimodal Machine Translation Training
by: Żelasko, Piotr, et al.
Published: (2024)
by: Żelasko, Piotr, et al.
Published: (2024)
Training and Inference Efficiency of Encoder-Decoder Speech Models
by: Żelasko, Piotr, et al.
Published: (2025)
by: Żelasko, Piotr, et al.
Published: (2025)
Word Level Timestamp Generation for Automatic Speech Recognition and Translation
by: Hu, Ke, et al.
Published: (2025)
by: Hu, Ke, et al.
Published: (2025)
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models
by: Noroozi, Vahid, et al.
Published: (2024)
by: Noroozi, Vahid, et al.
Published: (2024)
Anticipating Future with Large Language Model for Simultaneous Machine Translation
by: Ouyang, Siqi, et al.
Published: (2024)
by: Ouyang, Siqi, et al.
Published: (2024)
Canary-1B-v2 & Parakeet-TDT-0.6B-v3: Efficient and High-Performance Models for Multilingual ASR and AST
by: Sekoyan, Monica, et al.
Published: (2025)
by: Sekoyan, Monica, et al.
Published: (2025)
Flexible Multichannel Speech Enhancement for Noise-Robust Frontend
by: Jukić, Ante, et al.
Published: (2024)
by: Jukić, Ante, et al.
Published: (2024)
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
by: Yang, Chao-Han Huck, et al.
Published: (2024)
by: Yang, Chao-Han Huck, et al.
Published: (2024)
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
by: Wang, Weiqing, et al.
Published: (2024)
by: Wang, Weiqing, et al.
Published: (2024)
Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems
by: Park, Taejin, et al.
Published: (2024)
by: Park, Taejin, et al.
Published: (2024)
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
by: Huang, He, et al.
Published: (2024)
by: Huang, He, et al.
Published: (2024)
Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits
by: Huang, Sung-Feng, et al.
Published: (2025)
by: Huang, Sung-Feng, et al.
Published: (2025)
Schrödinger Bridge for Generative Speech Enhancement
by: Jukić, Ante, et al.
Published: (2024)
by: Jukić, Ante, et al.
Published: (2024)
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
by: Lu, Ke-Han, et al.
Published: (2024)
by: Lu, Ke-Han, et al.
Published: (2024)
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
by: Noroozi, Vahid, et al.
Published: (2023)
by: Noroozi, Vahid, et al.
Published: (2023)
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
by: Xu, Hainan, et al.
Published: (2024)
by: Xu, Hainan, et al.
Published: (2024)
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2025)
by: Yang, Chih-Kai, et al.
Published: (2025)
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
by: Dhawan, Kunal, et al.
Published: (2024)
by: Dhawan, Kunal, et al.
Published: (2024)
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
by: Koluguri, Nithin Rao, et al.
Published: (2024)
by: Koluguri, Nithin Rao, et al.
Published: (2024)
Expanding the Utilization of Pharmacological Treatments for Alcohol Use Disorder: Reflections on a Swedish Nationwide Study
by: Szu‐Chieh Chiu, et al.
Published: (2025)
by: Szu‐Chieh Chiu, et al.
Published: (2025)
GetBatch: Distributed Multi-Object Retrieval for ML Data Loading
by: Aizman, Alex, et al.
Published: (2026)
by: Aizman, Alex, et al.
Published: (2026)
Streaming Sortformer: Speaker Cache-Based Online Speaker Diarization with Arrival-Time Ordering
by: Medennikov, Ivan, et al.
Published: (2025)
by: Medennikov, Ivan, et al.
Published: (2025)
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
by: Feng, Bo-Han, et al.
Published: (2025)
by: Feng, Bo-Han, et al.
Published: (2025)
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators
by: Hu, Yuchen, et al.
Published: (2024)
by: Hu, Yuchen, et al.
Published: (2024)
Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception
by: Wan, Zhen, et al.
Published: (2026)
by: Wan, Zhen, et al.
Published: (2026)
SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription
by: Grossman, Raymond, et al.
Published: (2025)
by: Grossman, Raymond, et al.
Published: (2025)
Dynamic Latent Separation for Deep Learning
by: Tuan, Yi-Lin, et al.
Published: (2022)
by: Tuan, Yi-Lin, et al.
Published: (2022)
BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference
by: Wang, Yun, et al.
Published: (2025)
by: Wang, Yun, et al.
Published: (2025)
Chunk-wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text
by: Xu, Hainan, et al.
Published: (2026)
by: Xu, Hainan, et al.
Published: (2026)
The First Drop of Ink: Nonlinear Impact of Misleading Information in Long-Context Reasoning
by: Gao, Muhan, et al.
Published: (2026)
by: Gao, Muhan, et al.
Published: (2026)
Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models
by: Majumdar, Somshubra, et al.
Published: (2024)
by: Majumdar, Somshubra, et al.
Published: (2024)
Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
by: Wang, Weiqing, et al.
Published: (2025)
by: Wang, Weiqing, et al.
Published: (2025)
MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts
by: Zhao, Yushu, et al.
Published: (2025)
by: Zhao, Yushu, et al.
Published: (2025)
Similar Items
-
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
by: Chen, Zhehuai, et al.
Published: (2024) -
Chain-of-Thought Prompting for Speech Translation
by: Hu, Ke, et al.
Published: (2024) -
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
by: Peng, Yifan, et al.
Published: (2024) -
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data
by: Puvvada, Krishna C., et al.
Published: (2024) -
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
by: Lu, Ke-Han, et al.
Published: (2024)