:: Library Catalog

Bibliographic Details
Main Authors:	Huang, Ruizhe; Zhang, Xiaohui; Ni, Zhaoheng; Sun, Li; Hira, Moto; Hwang, Jeff; Manohar, Vimal; Pratap, Vineel; Wiesner, Matthew; Watanabe, Shinji; Povey, Daniel; Khudanpur, Sanjeev
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2406.02560

Similar Items

Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation
by: Huang, Ruizhe, et al.
Published: (2024)

Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System
by: Manohar, Vimal, et al.
Published: (2024)

Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking
by: Yan, Brian, et al.
Published: (2024)

HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation
by: Hussein, Amir, et al.
Published: (2025)

On Speaker Attribution with SURT
by: Raj, Desh, et al.
Published: (2024)

WST: Weakly Supervised Transducer for Automatic Speech Recognition
by: Gao, Dongji, et al.
Published: (2025)

Can LLMs Help Localize Fake Words in Partially Fake Speech?
by: Zhang, Lin, et al.
Published: (2026)

LV-CTC: Non-autoregressive ASR with CTC and latent variable models
by: Fujita, Yuya, et al.
Published: (2024)

Building Corpora for Single-Channel Speech Separation Across Multiple Domains
by: Maciejewski, Matthew, et al.
Published: (2018)

Scaling A Simple Approach to Zero-Shot Speech Recognition
by: Zhao, Jinming, et al.
Published: (2024)

Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection
by: Yang, Tzu-Ting, et al.
Published: (2024)

Less is More: Data Curation Matters in Scaling Speech Enhancement
by: Li, Chenda, et al.
Published: (2025)

Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
by: Shao, Yiwen, et al.
Published: (2024)

Unsupervised Speech Enhancement using Data-defined Priors
by: Klement, Dominik, et al.
Published: (2025)

CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)

Target Speaker ASR with Whisper
by: Polok, Alexander, et al.
Published: (2024)

Modeling Overlapped Speech with Shuffles
by: Wiesner, Matthew, et al.
Published: (2026)

SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper
by: Polok, Alexander, et al.
Published: (2026)

Effects of Speaker Count, Duration, and Accent Diversity on Zero-Shot Accent Robustness in Low-Resource ASR
by: Yong, Zheng-Xin, et al.
Published: (2025)

Clean Label Attacks against SLU Systems
by: Xinyuan, Henry Li, et al.
Published: (2024)

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
by: Tsunoo, Emiru, et al.
Published: (2023)

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
by: Peng, Yifan, et al.
Published: (2024)

Associativity-Peakiness Metric for Contingency Tables
by: Zirkind, Naomi E., et al.
Published: (2026)

Joint Beam Search Integrating CTC, Attention, and Transducer Decoders
by: Sudo, Yui, et al.
Published: (2024)

Improving Multilingual Speech Models on ML-SUPERB 2.0: Fine-tuning with Data Augmentation and LID-Aware CTC
by: Wang, Qingzheng, et al.
Published: (2025)

FeruzaSpeech: A 60 Hour Uzbek Read Speech Corpus with Punctuation, Casing, and Context
by: Povey, Anna, et al.
Published: (2024)

SpatialEmb: Extract and Encode Spatial Information for 1-Stage Multi-channel Multi-speaker ASR on Arbitrary Microphone Arrays
by: Shao, Yiwen, et al.
Published: (2026)

The Paradox Of Just-in-Time Liquidity in Decentralized Exchanges: More Providers Can Sometimes Mean Less Liquidity
by: Capponi, Agostino, et al.
Published: (2023)

kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
by: Zhou, Jiaming, et al.
Published: (2023)

The Gauss-Markov Adjunction Provides Categorical Semantics of Residuals in Supervised Learning
by: Kamiura, Moto
Published: (2025)

Autonomous Agents and Policy Compliance: A Framework for Reasoning About Penalties
by: Tummala, Vineel, et al.
Published: (2025)

Escape the Language Prior: Mitigating Late-Stage Modality Collapse in Audio Reasoning via Modality-Aware Policy Optimization
by: Xiao, Cihan, et al.
Published: (2026)

Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement
by: Zhang, Xueyao, et al.
Published: (2025)

GenVC: Self-Supervised Zero-Shot Voice Conversion
by: Cai, Zexin, et al.
Published: (2025)

Rapidly Adapting to New Voice Spoofing: Few-Shot Detection of Synthesized Speech Under Distribution Shifts
by: Garg, Ashi, et al.
Published: (2025)

Scalable Controllable Accented TTS
by: Xinyuan, Henry Li, et al.
Published: (2025)

Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization
by: Cai, Zexin, et al.
Published: (2024)

HLTCOE JHU Submission to the Voice Privacy Challenge 2024
by: Xinyuan, Henry Li, et al.
Published: (2024)

Universal Speech Content Factorization
by: Xinyuan, Henry Li, et al.
Published: (2026)

Dimensionality-Aware Anomaly Detection in Learned Representations of Self-Supervised Speech Models
by: Arcos-Holzinger, Sandra, et al.
Published: (2026)