| Main Authors: | Fathullah, Yassir, Wu, Chunyang, Lakomkin, Egor, Li, Ke, Jia, Junteng, Shangguan, Yuan, Mahadeokar, Jay, Kalinli, Ozlem, Fuegen, Christian, Seltzer, Mike |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2311.06753 |
Similar Items
Efficient Streaming LLM for Speech Recognition
by: Jia, Junteng, et al.
Published: (2024)
Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech
by: Kang, Wonjune, et al.
Published: (2024)
Faster Speech-LLaMA Inference with Multi-token Prediction
by: Raj, Desh, et al.
Published: (2024)
CJST: CTC Compressor based Joint Speech and Text Training for Decoder-Only ASR
by: Zhou, Wei, et al.
Published: (2024)
Effective internal language model training and fusion for factorized transducer model
by: Guo, Jinxi, et al.
Published: (2024)
Can Speech LLMs Think while Listening?
by: Shih, Yi-Jen, et al.
Published: (2025)
M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
by: Yang, Yufeng, et al.
Published: (2024)
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
by: Xie, Jiamin, et al.
Published: (2023)
Transducer-Llama: Integrating LLMs into Streamable Transducer-based Speech Recognition
by: Deng, Keqi, et al.
Published: (2024)
Towards measuring fairness in speech recognition: Fair-Speech dataset
by: Veliche, Irina-Elena, et al.
Published: (2024)
Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens
by: Zhao, Jinzheng, et al.
Published: (2024)
MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables
by: Yeh, Sung-Lin, et al.
Published: (2026)
Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge
by: Fathullah, Yassir, et al.
Published: (2025)
Efficient Sample-Specific Encoder Perturbations
by: Fathullah, Yassir, et al.
Published: (2024)
Cross-Lingual Transfer Learning for Speech Translation
by: Ma, Rao, et al.
Published: (2024)
Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time
by: Seide, Frank, et al.
Published: (2024)
Teacher-Student Training for Debiasing: General Permutation Debiasing for Large Language Models
by: Liusie, Adian, et al.
Published: (2024)
Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons
by: Liusie, Adian, et al.
Published: (2024)
Effective Text Adaptation for LLM-based ASR through Soft Prompt Fine-Tuning
by: Ma, Yingyi, et al.
Published: (2024)
Token-Weighted RNN-T for Learning from Flawed Data
by: Keren, Gil, et al.
Published: (2024)
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts
by: Wang, Ruida, et al.
Published: (2024)
VowelPrompt: Hearing Speech Emotions from Text via Vowel-level Prosodic Augmentation
by: Wang, Yancheng, et al.
Published: (2026)
Towards scalable efficient on-device ASR with transfer learning
by: Pandey, Laxmi, et al.
Published: (2024)
Conversational Speech Naturalness Predictor
by: Xu, Anfeng, et al.
Published: (2026)
AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition
by: Lin, Ju, et al.
Published: (2024)
Transcribing and Translating, Fast and Slow: Joint Speech Translation and Recognition
by: Moritz, Niko, et al.
Published: (2024)
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
by: Yang, Xiaoyu, et al.
Published: (2024)
Hello-Chat: Towards Realistic Social Audio Interactions
by: Hou, Yueran, et al.
Published: (2026)
Prompting Large Language Models with Audio for General-Purpose Speech Summarization
by: Kang, Wonjune, et al.
Published: (2024)
BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B
by: Gade, Pranav, et al.
Published: (2023)
Towards Audio Codec-based Speech Separation
by: Yip, Jia Qi, et al.
Published: (2024)
Wasserstein Gradient Flows for Scalable and Regularized Barycenter Computation
by: Montesuma, Eduardo Fernandes, et al.
Published: (2025)
Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models
by: Petruzzellis, Flavio, et al.
Published: (2024)
Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
by: Zhang, Wenyu, et al.
Published: (2025)
Developing Virtual Classroom Tours With Preservice Teachers: Integrating a Translanguaging Stance and Design
by: Kate Seltzer
Published: (2025)
Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs
by: Niu, Jingcheng, et al.
Published: (2025)
Investigating Bias Representations in Llama 2 Chat via Activation Steering
by: Lu, Dawn, et al.
Published: (2024)
Towards Explicit Acoustic Evidence Perception in Audio LLMs for Speech Deepfake Detection
by: Guo, Xiaoxuan, et al.
Published: (2026)
On the Universal Truthfulness Hyperplane Inside LLMs
by: Liu, Junteng, et al.
Published: (2024)
Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation
by: Shen, Maohao, et al.
Published: (2024)