:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Joseph; Hirschkind, Nameer; Yu, Xiao; Nandwana, Mahesh Kumar
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2604.09916

Similar Items

REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation
by: Hirschkind, Nameer, et al.
Published: (2025)

Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
by: Hirschkind, Nameer, et al.
Published: (2024)

Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment
by: Liu, Joseph, et al.
Published: (2024)

SimulTron: On-Device Simultaneous Speech to Speech Translation
by: Agranovich, Alex, et al.
Published: (2024)

Keyword-Guided Adaptation of Automatic Speech Recognition
by: Shamsian, Aviv, et al.
Published: (2024)

AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation
by: Papi, Sara, et al.
Published: (2023)

Robust Unsupervised Adaptation of a Speech Recogniser Using Entropy Minimisation and Speaker Codes
by: van Dalen, Rogier C., et al.
Published: (2025)

RosettaSpeech: Zero-Shot Speech-to-Speech Translation without Parallel Speech
by: Zheng, Zhisheng, et al.
Published: (2025)

EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition
by: Thimonier, Hugo, et al.
Published: (2025)

AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling
by: Kalkhorani, Vahid Ahmadi, et al.
Published: (2024)

Test-Time Adaptation for Speech Emotion Recognition
by: Dong, Jiaheng, et al.
Published: (2026)

Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System
by: Khorrami, Khazar, et al.
Published: (2023)

Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics
by: Kumar, Kabir
Published: (2025)

SpeechOp: Inference-Time Task Composition for Generative Speech Processing
by: Lovelace, Justin, et al.
Published: (2025)

Rethinking Entropy Minimization in Test-Time Adaptation for Autoregressive Models
by: Huang, Wei-Ping, et al.
Published: (2026)

Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation
by: Lashkarashvili, Nineli, et al.
Published: (2024)

Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes
by: Glazer, Neta, et al.
Published: (2025)

Bayesian Learning for Deep Neural Network Adaptation
by: Xie, Xurong, et al.
Published: (2020)

Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
by: Duret, Jarod, et al.
Published: (2024)

Prompt Amplification and Zero-Shot Late Fusion in Audio-Language Models for Speech Emotion Recognition
by: Kataria, Saurabh, et al.
Published: (2026)

S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)

ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
by: Ji, Shengpeng, et al.
Published: (2024)

Translatotron 3: Speech to Speech Translation with Monolingual Data
by: Nachmani, Eliya, et al.
Published: (2023)

E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models
by: Dong, Jiaheng, et al.
Published: (2025)

Improving Speaker-independent Speech Emotion Recognition Using Dynamic Joint Distribution Adaptation
by: Lu, Cheng, et al.
Published: (2024)

In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word Level Timestamp Predictions
by: Fan, Xulin, et al.
Published: (2026)

Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement
by: Cheng, Longbiao, et al.
Published: (2024)

Speech Diarization and ASR with GMM
by: Sharma, Aayush Kumar, et al.
Published: (2023)

Navigating the Minefield of MT Beam Search in Cascaded Streaming Speech Translation
by: Rabatin, Rastislav, et al.
Published: (2024)

Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)

Drax: Speech Recognition with Discrete Flow Matching
by: Navon, Aviv, et al.
Published: (2025)

TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition
by: Chen, Chengxin, et al.
Published: (2024)

Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments
by: Cheng, Longbiao, et al.
Published: (2026)

AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition
by: Eom, SooHwan, et al.
Published: (2024)

Regularizing Learnable Feature Extraction for Automatic Speech Recognition
by: Vieting, Peter, et al.
Published: (2025)

Quantifying Quanvolutional Neural Networks Robustness for Speech in Healthcare Applications
by: Tran, Ha, et al.
Published: (2026)

Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw
by: Chorowski, Jan, et al.
Published: (2021)

CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
by: Shankar, Bhavani, et al.
Published: (2024)

DDTSE: Discriminative Diffusion Model for Target Speech Extraction
by: Zhang, Leying, et al.
Published: (2023)

Reverse-Speech-Finder: A Neural Network Backtracking Architecture for Generating Alzheimer's Disease Speech Samples and Improving Diagnosis Performance
by: Li, Victor OK, et al.
Published: (2025)