:: Library Catalog

Bibliographic Details
Main Authors:	Fathullah, Yassir; Wu, Chunyang; Lakomkin, Egor; Li, Ke; Jia, Junteng; Shangguan, Yuan; Mahadeokar, Jay; Kalinli, Ozlem; Fuegen, Christian; Seltzer, Mike
Format:	Preprint
Published:	2023
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2311.06753

Similar Items

Efficient Streaming LLM for Speech Recognition
by: Jia, Junteng, et al.
Published: (2024)

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech
by: Kang, Wonjune, et al.
Published: (2024)

Faster Speech-LLaMA Inference with Multi-token Prediction
by: Raj, Desh, et al.
Published: (2024)

CJST: CTC Compressor based Joint Speech and Text Training for Decoder-Only ASR
by: Zhou, Wei, et al.
Published: (2024)

Effective internal language model training and fusion for factorized transducer model
by: Guo, Jinxi, et al.
Published: (2024)

Can Speech LLMs Think while Listening?
by: Shih, Yi-Jen, et al.
Published: (2025)

M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
by: Yang, Yufeng, et al.
Published: (2024)

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
by: Xie, Jiamin, et al.
Published: (2023)

Transducer-Llama: Integrating LLMs into Streamable Transducer-based Speech Recognition
by: Deng, Keqi, et al.
Published: (2024)

Towards measuring fairness in speech recognition: Fair-Speech dataset
by: Veliche, Irina-Elena, et al.
Published: (2024)

Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens
by: Zhao, Jinzheng, et al.
Published: (2024)

MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables
by: Yeh, Sung-Lin, et al.
Published: (2026)

Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge
by: Fathullah, Yassir, et al.
Published: (2025)

Efficient Sample-Specific Encoder Perturbations
by: Fathullah, Yassir, et al.
Published: (2024)

Cross-Lingual Transfer Learning for Speech Translation
by: Ma, Rao, et al.
Published: (2024)

Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time
by: Seide, Frank, et al.
Published: (2024)

Teacher-Student Training for Debiasing: General Permutation Debiasing for Large Language Models
by: Liusie, Adian, et al.
Published: (2024)

Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons
by: Liusie, Adian, et al.
Published: (2024)

Effective Text Adaptation for LLM-based ASR through Soft Prompt Fine-Tuning
by: Ma, Yingyi, et al.
Published: (2024)

Token-Weighted RNN-T for Learning from Flawed Data
by: Keren, Gil, et al.
Published: (2024)

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts
by: Wang, Ruida, et al.
Published: (2024)

VowelPrompt: Hearing Speech Emotions from Text via Vowel-level Prosodic Augmentation
by: Wang, Yancheng, et al.
Published: (2026)

Towards scalable efficient on-device ASR with transfer learning
by: Pandey, Laxmi, et al.
Published: (2024)

Conversational Speech Naturalness Predictor
by: Xu, Anfeng, et al.
Published: (2026)

AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition
by: Lin, Ju, et al.
Published: (2024)

Transcribing and Translating, Fast and Slow: Joint Speech Translation and Recognition
by: Moritz, Niko, et al.
Published: (2024)

MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
by: Yang, Xiaoyu, et al.
Published: (2024)

Hello-Chat: Towards Realistic Social Audio Interactions
by: Hou, Yueran, et al.
Published: (2026)

Prompting Large Language Models with Audio for General-Purpose Speech Summarization
by: Kang, Wonjune, et al.
Published: (2024)

BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B
by: Gade, Pranav, et al.
Published: (2023)

Towards Audio Codec-based Speech Separation
by: Yip, Jia Qi, et al.
Published: (2024)

Wasserstein Gradient Flows for Scalable and Regularized Barycenter Computation
by: Montesuma, Eduardo Fernandes, et al.
Published: (2025)

Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models
by: Petruzzellis, Flavio, et al.
Published: (2024)

Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
by: Zhang, Wenyu, et al.
Published: (2025)

Developing Virtual Classroom Tours With Preservice Teachers: Integrating a Translanguaging Stance and Design
by: Seltzer, Kate
Published: (2025)

Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs
by: Niu, Jingcheng, et al.
Published: (2025)

Investigating Bias Representations in Llama 2 Chat via Activation Steering
by: Lu, Dawn, et al.
Published: (2024)

Towards Explicit Acoustic Evidence Perception in Audio LLMs for Speech Deepfake Detection
by: Guo, Xiaoxuan, et al.
Published: (2026)

On the Universal Truthfulness Hyperplane Inside LLMs
by: Liu, Junteng, et al.
Published: (2024)

Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation
by: Shen, Maohao, et al.
Published: (2024)