| Main Authors: | Rybakov, Oleg; Serdyuk, Dmitriy; Zheng, Chengjian |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.02887 |
Similar Items
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
by: Ding, Shaojin, et al.
Published: (2023)
USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
by: Zhao, Guanlong, et al.
Published: (2023)
USM-VC: Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
by: Li, Na, et al.
Published: (2025)
SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
by: Le, Khanh, et al.
Published: (2025)
GhostRNN: Reducing State Redundancy in RNN with Cheap Operations
by: Zhou, Hang, et al.
Published: (2024)
MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation
by: Zhao, Shengkui, et al.
Published: (2023)
Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis
by: Wang, Xintong, et al.
Published: (2024)
Onset and offset weighted loss function for sound event detection
by: Song, Tao
Published: (2024)
Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio
by: Ma, Yi, et al.
Published: (2024)
SimulTron: On-Device Simultaneous Speech to Speech Translation
by: Agranovich, Alex, et al.
Published: (2024)
BUT Systems and Analyses for the ASVspoof 5 Challenge
by: Rohdin, Johan, et al.
Published: (2024)
Token-Weighted RNN-T for Learning from Flawed Data
by: Keren, Gil, et al.
Published: (2024)
Improved direction of arrival estimations with a wearable microphone array for dynamic environments by reliability weighting
by: Mitchell, Daniel A., et al.
Published: (2024)
Resnet-conformer network with shared weights and attention mechanism for sound event localization, detection, and distance estimation
by: Vo, Quoc Thinh, et al.
Published: (2025)
DualSep: A Light-weight dual-encoder convolutional recurrent network for real-time in-car speech separation
by: Wang, Ziqian, et al.
Published: (2024)
Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition
by: Yang, Qingran, et al.
Published: (2026)
The ART of Conversation: Measuring Phonetic Convergence and Deliberate Imitation in L2-Speech with a Siamese RNN
by: Yuan, Zheng, et al.
Published: (2023)
STCON System for the CHiME-8 Challenge
by: Mitrofanov, Anton, et al.
Published: (2024)
A framework of text-dependent speaker verification for chinese numerical string corpus
by: Zheng, Litong, et al.
Published: (2024)
Acoustic Volume Rendering for Neural Impulse Response Fields
by: Lan, Zitong, et al.
Published: (2024)
Constraint Optimized Multichannel Mixer-limiter Design
by: Luo, Yuancheng, et al.
Published: (2025)
Robust fine-tuning of speech recognition models via model merging: application to disordered speech
by: Ducorroy, Alexandre, et al.
Published: (2025)
Towards zero-shot amplifier modeling: One-to-many amplifier modeling via tone embedding control
by: Chen, Yu-Hua, et al.
Published: (2024)
Multiple Mobile Target Detection and Tracking in Active Sonar Array Using a Track-Before-Detect Approach
by: Abu, Avi, et al.
Published: (2024)
Contrastive Loss Based Frame-wise Feature disentanglement for Polyphonic Sound Event Detection
by: Guan, Yadong, et al.
Published: (2024)
Towards audio language modeling -- an overview
by: Wu, Haibin, et al.
Published: (2024)
How phonemes contribute to deep speaker models?
by: Li, Pengqi, et al.
Published: (2024)
Are audio DeepFake detection models polyglots?
by: Marek, Bartłomiej, et al.
Published: (2024)
AxLSTMs: learning self-supervised audio representations with xLSTMs
by: Yadav, Sarthak, et al.
Published: (2024)
Fine-Tuning Automatic Speech Recognition for People with Parkinson's: An Effective Strategy for Enhancing Speech Technology Accessibility
by: Zheng, Xiuwen, et al.
Published: (2024)
A Joint Noise Disentanglement and Adversarial Training Framework for Robust Speaker Verification
by: Xing, Xujiang, et al.
Published: (2024)
How Much Does Machine Identity Matter in Anomalous Sound Detection at Test Time?
by: Wilkinghoff, Kevin, et al.
Published: (2026)
Temporal Pooling Strategies for Training-Free Anomalous Sound Detection with Self-Supervised Audio Embeddings
by: Wilkinghoff, Kevin, et al.
Published: (2026)
Probing mental health information in speech foundation models
by: de Gennes, Marc, et al.
Published: (2024)
WhisperFlow: speech foundation models in real time
by: Wang, Rongxiang, et al.
Published: (2024)
SponTTS: modeling and transferring spontaneous style for TTS
by: Li, Hanzhao, et al.
Published: (2023)
asr_eval: Algorithms and tools for multi-reference and streaming speech recognition evaluation
by: Sedukhin, Oleg, et al.
Published: (2026)
SuperCodec: A Neural Speech Codec with Selective Back-Projection Network
by: Zheng, Youqiang, et al.
Published: (2024)
Leveraging LLM for Stuttering Speech: A Unified Architecture Bridging Recognition and Event Detection
by: Huang, Shangkun, et al.
Published: (2025)
Speech Emotion Recognition Using Fine-Tuned DWFormer: A Study on Track 1 of the IERPChallenge 2024
by: Wang, Honghong, et al.
Published: (2025)