| Main Authors: | Huang, Ruizhe, Zhang, Xiaohui, Ni, Zhaoheng, Sun, Li, Hira, Moto, Hwang, Jeff, Manohar, Vimal, Pratap, Vineel, Wiesner, Matthew, Watanabe, Shinji, Povey, Daniel, Khudanpur, Sanjeev |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.02560 |
Similar Items
Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation
by: Huang, Ruizhe, et al.
Published: (2024)
Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System
by: Manohar, Vimal, et al.
Published: (2024)
Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking
by: Yan, Brian, et al.
Published: (2024)
HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation
by: Hussein, Amir, et al.
Published: (2025)
On Speaker Attribution with SURT
by: Raj, Desh, et al.
Published: (2024)
WST: Weakly Supervised Transducer for Automatic Speech Recognition
by: Gao, Dongji, et al.
Published: (2025)
Can LLMs Help Localize Fake Words in Partially Fake Speech?
by: Zhang, Lin, et al.
Published: (2026)
LV-CTC: Non-autoregressive ASR with CTC and latent variable models
by: Fujita, Yuya, et al.
Published: (2024)
Building Corpora for Single-Channel Speech Separation Across Multiple Domains
by: Maciejewski, Matthew, et al.
Published: (2018)
Scaling A Simple Approach to Zero-Shot Speech Recognition
by: Zhao, Jinming, et al.
Published: (2024)
Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection
by: Yang, Tzu-Ting, et al.
Published: (2024)
Less is More: Data Curation Matters in Scaling Speech Enhancement
by: Li, Chenda, et al.
Published: (2025)
Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
by: Shao, Yiwen, et al.
Published: (2024)
Unsupervised Speech Enhancement using Data-defined Priors
by: Klement, Dominik, et al.
Published: (2025)
CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)
Target Speaker ASR with Whisper
by: Polok, Alexander, et al.
Published: (2024)
Modeling Overlapped Speech with Shuffles
by: Wiesner, Matthew, et al.
Published: (2026)
SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper
by: Polok, Alexander, et al.
Published: (2026)
Effects of Speaker Count, Duration, and Accent Diversity on Zero-Shot Accent Robustness in Low-Resource ASR
by: Yong, Zheng-Xin, et al.
Published: (2025)
Clean Label Attacks against SLU Systems
by: Xinyuan, Henry Li, et al.
Published: (2024)
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
by: Tsunoo, Emiru, et al.
Published: (2023)
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
by: Peng, Yifan, et al.
Published: (2024)
Associativity-Peakiness Metric for Contingency Tables
by: Zirkind, Naomi E., et al.
Published: (2026)
Joint Beam Search Integrating CTC, Attention, and Transducer Decoders
by: Sudo, Yui, et al.
Published: (2024)
Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC
by: Wang, Qingzheng, et al.
Published: (2025)
FeruzaSpeech: A 60 Hour Uzbek Read Speech Corpus with Punctuation, Casing, and Context
by: Povey, Anna, et al.
Published: (2024)
SpatialEmb: Extract and Encode Spatial Information for 1-Stage Multi-channel Multi-speaker ASR on Arbitrary Microphone Arrays
by: Shao, Yiwen, et al.
Published: (2026)
The Paradox Of Just-in-Time Liquidity in Decentralized Exchanges: More Providers Can Sometimes Mean Less Liquidity
by: Capponi, Agostino, et al.
Published: (2023)
kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
by: Zhou, Jiaming, et al.
Published: (2023)
The Gauss-Markov Adjunction Provides Categorical Semantics of Residuals in Supervised Learning
by: Kamiura, Moto
Published: (2025)
Autonomous Agents and Policy Compliance: A Framework for Reasoning About Penalties
by: Tummala, Vineel, et al.
Published: (2025)
Escape the Language Prior: Mitigating Late-Stage Modality Collapse in Audio Reasoning via Modality-Aware Policy Optimization
by: Xiao, Cihan, et al.
Published: (2026)
Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement
by: Zhang, Xueyao, et al.
Published: (2025)
GenVC: Self-Supervised Zero-Shot Voice Conversion
by: Cai, Zexin, et al.
Published: (2025)
Rapidly Adapting to New Voice Spoofing: Few-Shot Detection of Synthesized Speech Under Distribution Shifts
by: Garg, Ashi, et al.
Published: (2025)
Scalable Controllable Accented TTS
by: Xinyuan, Henry Li, et al.
Published: (2025)
Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization
by: Cai, Zexin, et al.
Published: (2024)
HLTCOE JHU Submission to the Voice Privacy Challenge 2024
by: Xinyuan, Henry Li, et al.
Published: (2024)
Universal Speech Content Factorization
by: Xinyuan, Henry Li, et al.
Published: (2026)
Dimensionality-Aware Anomaly Detection in Learned Representations of Self-Supervised Speech Models
by: Arcos-Holzinger, Sandra, et al.
Published: (2026)