:: Library Catalog

Bibliographic Details
Main Authors:	Rybakov, Oleg; Serdyuk, Dmitriy; Zheng, Chengjian
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2406.02887

Similar Items

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
by: Ding, Shaojin, et al.
Published: (2023)

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
by: Zhao, Guanlong, et al.
Published: (2023)

USM-VC: Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
by: Li, Na, et al.
Published: (2025)

SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
by: Le, Khanh, et al.
Published: (2025)

GhostRNN: Reducing State Redundancy in RNN with Cheap Operations
by: Zhou, Hang, et al.
Published: (2024)

MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation
by: Zhao, Shengkui, et al.
Published: (2023)

Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis
by: Wang, Xintong, et al.
Published: (2024)

Onset and offset weighted loss function for sound event detection
by: Song, Tao
Published: (2024)

Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio
by: Ma, Yi, et al.
Published: (2024)

SimulTron: On-Device Simultaneous Speech to Speech Translation
by: Agranovich, Alex, et al.
Published: (2024)

BUT Systems and Analyses for the ASVspoof 5 Challenge
by: Rohdin, Johan, et al.
Published: (2024)

Token-Weighted RNN-T for Learning from Flawed Data
by: Keren, Gil, et al.
Published: (2024)

Improved direction of arrival estimations with a wearable microphone array for dynamic environments by reliability weighting
by: Mitchell, Daniel A., et al.
Published: (2024)

Resnet-conformer network with shared weights and attention mechanism for sound event localization, detection, and distance estimation
by: Vo, Quoc Thinh, et al.
Published: (2025)

DualSep: A Light-weight dual-encoder convolutional recurrent network for real-time in-car speech separation
by: Wang, Ziqian, et al.
Published: (2024)

Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition
by: Yang, Qingran, et al.
Published: (2026)

The ART of Conversation: Measuring Phonetic Convergence and Deliberate Imitation in L2-Speech with a Siamese RNN
by: Yuan, Zheng, et al.
Published: (2023)

STCON System for the CHiME-8 Challenge
by: Mitrofanov, Anton, et al.
Published: (2024)

A framework of text-dependent speaker verification for chinese numerical string corpus
by: Zheng, Litong, et al.
Published: (2024)

Acoustic Volume Rendering for Neural Impulse Response Fields
by: Lan, Zitong, et al.
Published: (2024)

Constraint Optimized Multichannel Mixer-limiter Design
by: Luo, Yuancheng, et al.
Published: (2025)

Robust fine-tuning of speech recognition models via model merging: application to disordered speech
by: Ducorroy, Alexandre, et al.
Published: (2025)

Towards zero-shot amplifier modeling: One-to-many amplifier modeling via tone embedding control
by: Chen, Yu-Hua, et al.
Published: (2024)

Multiple Mobile Target Detection and Tracking in Active Sonar Array Using a Track-Before-Detect Approach
by: Abu, Avi, et al.
Published: (2024)

Contrastive Loss Based Frame-wise Feature disentanglement for Polyphonic Sound Event Detection
by: Guan, Yadong, et al.
Published: (2024)

Towards audio language modeling -- an overview
by: Wu, Haibin, et al.
Published: (2024)

How phonemes contribute to deep speaker models?
by: Li, Pengqi, et al.
Published: (2024)

Are audio DeepFake detection models polyglots?
by: Marek, Bartłomiej, et al.
Published: (2024)

AxLSTMs: learning self-supervised audio representations with xLSTMs
by: Yadav, Sarthak, et al.
Published: (2024)

Fine-Tuning Automatic Speech Recognition for People with Parkinson's: An Effective Strategy for Enhancing Speech Technology Accessibility
by: Zheng, Xiuwen, et al.
Published: (2024)

A Joint Noise Disentanglement and Adversarial Training Framework for Robust Speaker Verification
by: Xing, Xujiang, et al.
Published: (2024)

How Much Does Machine Identity Matter in Anomalous Sound Detection at Test Time?
by: Wilkinghoff, Kevin, et al.
Published: (2026)

Temporal Pooling Strategies for Training-Free Anomalous Sound Detection with Self-Supervised Audio Embeddings
by: Wilkinghoff, Kevin, et al.
Published: (2026)

Probing mental health information in speech foundation models
by: de Gennes, Marc, et al.
Published: (2024)

WhisperFlow: speech foundation models in real time
by: Wang, Rongxiang, et al.
Published: (2024)

SponTTS: modeling and transferring spontaneous style for TTS
by: Li, Hanzhao, et al.
Published: (2023)

asr_eval: Algorithms and tools for multi-reference and streaming speech recognition evaluation
by: Sedukhin, Oleg, et al.
Published: (2026)

SuperCodec: A Neural Speech Codec with Selective Back-Projection Network
by: Zheng, Youqiang, et al.
Published: (2024)

Leveraging LLM for Stuttering Speech: A Unified Architecture Bridging Recognition and Event Detection
by: Huang, Shangkun, et al.
Published: (2025)

Speech Emotion Recognition Using Fine-Tuned DWFormer: A Study on Track 1 of the IERPChallenge 2024
by: Wang, Honghong, et al.
Published: (2025)