:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Yikang; Wang, Xingming; Nishizaki, Hiromitsu; Li, Ming
Format:	Preprint
Published:	2024
Subjects:	Sound; Audio and Speech Processing; Signal Processing
Online Access:	https://arxiv.org/abs/2407.20111

Similar Items

CompSpoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-spoofing Countermeasures
by: Zhang, Xueping, et al.
Published: (2025)

Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition
by: Wang, Kuan-Chen, et al.
Published: (2024)

Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization
by: Gao, Xiaoxue, et al.
Published: (2024)

Differentiable Acoustic Radiance Transfer
by: Lee, Sungho, et al.
Published: (2025)

Joint Fullband-Subband Modeling for High-Resolution SingFake Detection
by: Chen, Xuanjun, et al.
Published: (2026)

Amplifying Artifacts with Speech Enhancement in Voice Anti-spoofing
by: Trachu, Thanapat, et al.
Published: (2025)

Adaptive Per-Channel Energy Normalization Front-end for Robust Audio Signal Processing
by: Meng, Hanyu, et al.
Published: (2025)

Generative Deep Learning and Signal Processing for Data Augmentation of Cardiac Auscultation Signals: Improving Model Robustness Using Synthetic Audio
by: Abbott, Leigh, et al.
Published: (2024)

Time-of-arrival Estimation and Phase Unwrapping of Head-related Transfer Functions With Integer Linear Programming
by: Yu, Chin-Yun, et al.
Published: (2024)

Completing Sets of Prototype Transfer Functions for Subspace-based Direction of Arrival Estimation of Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2025)

Lessons Learned from the URGENT 2024 Speech Enhancement Challenge
by: Zhang, Wangyou, et al.
Published: (2025)

Adaptive Diagonal Loading using Krylov Subspaces for Robust Beamforming
by: Mittal, Manan, et al.
Published: (2026)

Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
by: Yuan, Ze, et al.
Published: (2024)

Align-ULCNet: Towards Low-Complexity and Robust Acoustic Echo and Noise Reduction
by: Shetu, Shrishti Saha, et al.
Published: (2024)

Confidence-Based Self-Training for EMG-to-Speech: Leveraging Synthetic EMG for Robust Modeling
by: Chen, Xiaodan, et al.
Published: (2025)

A Robust Method for Pitch Tracking in the Frequency Following Response using Harmonic Amplitude Summation Filterbank
by: Sadeghkhani, Sajad, et al.
Published: (2025)

Aliasing-Free Neural Audio Synthesis
by: Gu, Yicheng, et al.
Published: (2025)

A Machine Hearing System for Robust Cough Detection Based on a High-Level Representation of Band-Specific Audio Features
by: Monge-Alvarez, Jesús, et al.
Published: (2024)

RawTFNet: A Lightweight CNN Architecture for Speech Anti-spoofing
by: Xiao, Yang, et al.
Published: (2025)

SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling
by: Yao, Shengshi, et al.
Published: (2025)

Directional Selective Fixed-Filter Active Noise Control Based on a Convolutional Neural Network in Reverberant Environments
by: Wang, Boxiang, et al.
Published: (2026)

Cross-Talk Reduction
by: Wang, Zhong-Qiu, et al.
Published: (2024)

U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding
by: Wang, Ziqian, et al.
Published: (2025)

Learning Perceptually Relevant Temporal Envelope Morphing
by: Dixit, Satvik, et al.
Published: (2025)

BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM
by: Gong, Xun, et al.
Published: (2025)

Microphone Array Signal Processing and Deep Learning for Speech Enhancement
by: Haeb-Umbach, Reinhold, et al.
Published: (2025)

Machine Learning in Acoustics: A Review and Open-Source Repository
by: McCarthy, Ryan A., et al.
Published: (2025)

Toward Universal Speech Enhancement for Diverse Input Conditions
by: Zhang, Wangyou, et al.
Published: (2023)

AADNet: An End-to-End Deep Learning Model for Auditory Attention Decoding
by: Nguyen, Nhan Duc Thanh, et al.
Published: (2024)

LocaGen: Sub-Sample Time-Delay Learning for Beam Localization
by: Kunwar, Ishaan, et al.
Published: (2025)

Towards Realistic Emotional Voice Conversion using Controllable Emotional Intensity
by: Qi, Tianhua, et al.
Published: (2024)

Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning
by: Shi, Runwu, et al.
Published: (2024)

Blind Source Separation of Radar Signals in Time Domain Using Deep Learning
by: Hinderer, Sven
Published: (2025)

Optimal Scalogram for Computational Complexity Reduction in Acoustic Recognition Using Deep Learning
by: Phan, Dang Thoai, et al.
Published: (2025)

Soundscape Captioning using Sound Affective Quality Network and Large Language Model
by: Hou, Yuanbo, et al.
Published: (2024)

30+ Years of Source Separation Research: Achievements and Future Challenges
by: Araki, Shoko, et al.
Published: (2025)

PromptEVC: Controllable Emotional Voice Conversion with Natural Language Prompts
by: Qi, Tianhua, et al.
Published: (2025)

A Study on Speech Assessment with Visual Cues
by: Ahmed, Shafique, et al.
Published: (2025)

Can Emotion Fool Anti-spoofing?
by: Mahapatra, Aurosweta, et al.
Published: (2025)

Detecting Post-Stroke Aphasia Via Brain Responses to Speech in a Deep Learning Framework
by: De Clercq, Pieter, et al.
Published: (2024)