| Main Authors: | Liu, Joseph, Hirschkind, Nameer, Yu, Xiao, Nandwana, Mahesh Kumar |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.09916 |
Similar Items
REINA: Regularized Entropy Information-Based Loss for Efficient Simultaneous Speech Translation
by: Hirschkind, Nameer, et al.
Published: (2025)
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
by: Hirschkind, Nameer, et al.
Published: (2024)
Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment
by: Liu, Joseph, et al.
Published: (2024)
SimulTron: On-Device Simultaneous Speech to Speech Translation
by: Agranovich, Alex, et al.
Published: (2024)
Keyword-Guided Adaptation of Automatic Speech Recognition
by: Shamsian, Aviv, et al.
Published: (2024)
AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation
by: Papi, Sara, et al.
Published: (2023)
Robust Unsupervised Adaptation of a Speech Recogniser Using Entropy Minimisation and Speaker Codes
by: van Dalen, Rogier C., et al.
Published: (2025)
RosettaSpeech: Zero-Shot Speech-to-Speech Translation without Parallel Speech
by: Zheng, Zhisheng, et al.
Published: (2025)
EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition
by: Thimonier, Hugo, et al.
Published: (2025)
AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling
by: Kalkhorani, Vahid Ahmadi, et al.
Published: (2024)
Test-Time Adaptation for Speech Emotion Recognition
by: Dong, Jiaheng, et al.
Published: (2026)
Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System
by: Khorrami, Khazar, et al.
Published: (2023)
Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics
by: Kumar, Kabir
Published: (2025)
SpeechOp: Inference-Time Task Composition for Generative Speech Processing
by: Lovelace, Justin, et al.
Published: (2025)
Rethinking Entropy Minimization in Test-Time Adaptation for Autoregressive Models
by: Huang, Wei-Ping, et al.
Published: (2026)
Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation
by: Lashkarashvili, Nineli, et al.
Published: (2024)
Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes
by: Glazer, Neta, et al.
Published: (2025)
Bayesian Learning for Deep Neural Network Adaptation
by: Xie, Xurong, et al.
Published: (2020)
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
by: Duret, Jarod, et al.
Published: (2024)
Prompt Amplification and Zero-Shot Late Fusion in Audio-Language Models for Speech Emotion Recognition
by: Kataria, Saurabh, et al.
Published: (2026)
S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control
by: Ji, Shengpeng, et al.
Published: (2024)
Translatotron 3: Speech to Speech Translation with Monolingual Data
by: Nachmani, Eliya, et al.
Published: (2023)
E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models
by: Dong, Jiaheng, et al.
Published: (2025)
Improving Speaker-independent Speech Emotion Recognition Using Dynamic Joint Distribution Adaptation
by: Lu, Cheng, et al.
Published: (2024)
In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word Level Timestamp Predictions
by: Fan, Xulin, et al.
Published: (2026)
Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement
by: Cheng, Longbiao, et al.
Published: (2024)
Speech Diarization and ASR with GMM
by: Sharma, Aayush Kumar, et al.
Published: (2023)
Navigating the Minefield of MT Beam Search in Cascaded Streaming Speech Translation
by: Rabatin, Rastislav, et al.
Published: (2024)
Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)
Drax: Speech Recognition with Discrete Flow Matching
by: Navon, Aviv, et al.
Published: (2025)
TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition
by: Chen, Chengxin, et al.
Published: (2024)
Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments
by: Cheng, Longbiao, et al.
Published: (2026)
AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition
by: Eom, SooHwan, et al.
Published: (2024)
Regularizing Learnable Feature Extraction for Automatic Speech Recognition
by: Vieting, Peter, et al.
Published: (2025)
Quantifying Quanvolutional Neural Networks Robustness for Speech in Healthcare Applications
by: Tran, Ha, et al.
Published: (2026)
Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw
by: Chorowski, Jan, et al.
Published: (2021)
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
by: Shankar, Bhavani, et al.
Published: (2024)
DDTSE: Discriminative Diffusion Model for Target Speech Extraction
by: Zhang, Leying, et al.
Published: (2023)
Reverse-Speech-Finder: A Neural Network Backtracking Architecture for Generating Alzheimer's Disease Speech Samples and Improving Diagnosis Performance
by: Li, Victor OK, et al.
Published: (2025)