:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Bahadi, Soufiyan, Plourde, Eric, Rouat, Jean
Format:	Preprint
Published:	2025
Subjects:	Sound Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2502.06989
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Efficient Sparse Coding with the Adaptive Locally Competitive Algorithm for Speech Classification
by: Bahadi, Soufiyan, et al.
Published: (2024)

Dynamic Frequency-Adaptive Knowledge Distillation for Speech Enhancement
by: Yuan, Xihao, et al.
Published: (2025)

Leveraging Local and Global Knowledge Integration with Time-Frequency Calibrated Distillation for Speech Enhancement
by: Cheng, Jiaming, et al.
Published: (2025)

Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection
by: Mariotte, Théo, et al.
Published: (2024)

Frequency-mix Knowledge Distillation for Fake Speech Detection
by: Fan, Cunhang, et al.
Published: (2024)

URGENT-PK: Perceptually-Aligned Ranking Model Designed for Speech Enhancement Competition
by: Wang, Jiahe, et al.
Published: (2025)

NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping
by: Büthe, Jan, et al.
Published: (2023)

Asymmetric Encoder-Decoder Based on Time-Frequency Correlation for Speech Separation
by: Shin, Ui-Hyeop, et al.
Published: (2026)

FRCRN: Boosting Feature Representation using Frequency Recurrence for Monaural Speech Enhancement
by: Zhao, Shengkui, et al.
Published: (2022)

Temporal-Frequency State Space Duality: An Efficient Paradigm for Speech Emotion Recognition
by: Zhao, Jiaqi, et al.
Published: (2024)

Algorithms of Sampling-Frequency-Independent Layers for Non-integer Strides
by: Imamura, Kanami, et al.
Published: (2023)

Adaptive Convolution for CNN-based Speech Enhancement Models
by: Wang, Dahan, et al.
Published: (2025)

Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
by: Ahmad, Hawraz A., et al.
Published: (2024)

A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms
by: Monir, Nasser-Eddine, et al.
Published: (2024)

Adaptive Speech Emotion Representation Learning Based On Dynamic Graph
by: Gao, Yingxue, et al.
Published: (2024)

Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection
by: Nam, Hyeonuk, et al.
Published: (2024)

SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms
by: Li, Sirui, et al.
Published: (2025)

Evaluating Multichannel Speech Enhancement Algorithms at the Phoneme Scale Across Genders
by: Monir, Nasser-Eddine, et al.
Published: (2025)

TF-Mamba: A Time-Frequency Network for Sound Source Localization
by: Xiao, Yang, et al.
Published: (2024)

Combined Generative and Predictive Modeling for Speech Super-resolution
by: Wang, Heming, et al.
Published: (2024)

NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
by: Park, Nohil, et al.
Published: (2024)

Robust Localization of Partially Fake Speech: Metrics and Out-of-Domain Evaluation
by: Luong, Hieu-Thi, et al.
Published: (2025)

LORT: Locally Refined Convolution and Taylor Transformer for Monaural Speech Enhancement
by: Wang, Junyu, et al.
Published: (2025)

TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement
by: Saijo, Kohei, et al.
Published: (2024)

Can LLMs Help Localize Fake Words in Partially Fake Speech?
by: Zhang, Lin, et al.
Published: (2026)

DroFiT: A Lightweight Band-fused Frequency Attention Toward Real-time UAV Speech Enhancement
by: Lee, Jeongmin, et al.
Published: (2025)

Adaptive Data Augmentation with NaturalSpeech3 for Far-field Speaker Verification
by: Zhang, Li, et al.
Published: (2025)

RADE: A Neural Codec for Transmitting Speech over HF Radio Channels
by: Rowe, David, et al.
Published: (2025)

PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models
by: Yang, Runyan, et al.
Published: (2024)

Exploring Local Interpretable Model-Agnostic Explanations for Speech Emotion Recognition with Distribution-Shift
by: Hjuler, Maja J., et al.
Published: (2025)

Local Equivariance Error-Based Metrics for Evaluating Sampling-Frequency-Independent Property of Neural Network
by: Imamura, Kanami, et al.
Published: (2025)

SEF-PNet: Speaker Encoder-Free Personalized Speech Enhancement with Local and Global Contexts Aggregation
by: Huang, Ziling, et al.
Published: (2025)

SEMamba++: A General Speech Restoration Framework Leveraging Global, Local, and Periodic Spectral Patterns
by: Lee, Yongjoon, et al.
Published: (2026)

IKFST: IOO and KOO Algorithms for Accelerated and Precise WFST-based End-to-End Automatic Speech Recognition
by: Zhuang, Zhuoran, et al.
Published: (2026)

Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection
by: Fan, Cunhang, et al.
Published: (2023)

RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization
by: Yang, Bing, et al.
Published: (2024)

MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis
by: Yang, Qian, et al.
Published: (2024)

Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding
by: Zheng, Rui-Chen, et al.
Published: (2025)

VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance
by: Yeom, Jiheum, et al.
Published: (2024)

GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech
by: Wang, Wenbin, et al.
Published: (2024)