| Main Authors: | Wang, Yikang, Wang, Xingming, Nishizaki, Hiromitsu, Li, Ming |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.20111 |
Similar Items
CompSpoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-spoofing Countermeasures
by: Zhang, Xueping, et al.
Published: (2025)
Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition
by: Wang, Kuan-Chen, et al.
Published: (2024)
Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization
by: Gao, Xiaoxue, et al.
Published: (2024)
Differentiable Acoustic Radiance Transfer
by: Lee, Sungho, et al.
Published: (2025)
Joint Fullband-Subband Modeling for High-Resolution SingFake Detection
by: Chen, Xuanjun, et al.
Published: (2026)
Amplifying Artifacts with Speech Enhancement in Voice Anti-spoofing
by: Trachu, Thanapat, et al.
Published: (2025)
Adaptive Per-Channel Energy Normalization Front-end for Robust Audio Signal Processing
by: Meng, Hanyu, et al.
Published: (2025)
Generative Deep Learning and Signal Processing for Data Augmentation of Cardiac Auscultation Signals: Improving Model Robustness Using Synthetic Audio
by: Abbott, Leigh, et al.
Published: (2024)
Time-of-arrival Estimation and Phase Unwrapping of Head-related Transfer Functions With Integer Linear Programming
by: Yu, Chin-Yun, et al.
Published: (2024)
Completing Sets of Prototype Transfer Functions for Subspace-based Direction of Arrival Estimation of Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2025)
Lessons Learned from the URGENT 2024 Speech Enhancement Challenge
by: Zhang, Wangyou, et al.
Published: (2025)
Adaptive Diagonal Loading using Krylov Subspaces for Robust Beamforming
by: Mittal, Manan, et al.
Published: (2026)
Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
by: Yuan, Ze, et al.
Published: (2024)
Align-ULCNet: Towards Low-Complexity and Robust Acoustic Echo and Noise Reduction
by: Shetu, Shrishti Saha, et al.
Published: (2024)
Confidence-Based Self-Training for EMG-to-Speech: Leveraging Synthetic EMG for Robust Modeling
by: Chen, Xiaodan, et al.
Published: (2025)
A Robust Method for Pitch Tracking in the Frequency Following Response using Harmonic Amplitude Summation Filterbank
by: Sadeghkhani, Sajad, et al.
Published: (2025)
Aliasing-Free Neural Audio Synthesis
by: Gu, Yicheng, et al.
Published: (2025)
A Machine Hearing System for Robust Cough Detection Based on a High-Level Representation of Band-Specific Audio Features
by: Monge-Alvarez, Jesús, et al.
Published: (2024)
RawTFNet: A Lightweight CNN Architecture for Speech Anti-spoofing
by: Xiao, Yang, et al.
Published: (2025)
SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling
by: Yao, Shengshi, et al.
Published: (2025)
Directional Selective Fixed-Filter Active Noise Control Based on a Convolutional Neural Network in Reverberant Environments
by: Wang, Boxiang, et al.
Published: (2026)
Cross-Talk Reduction
by: Wang, Zhong-Qiu, et al.
Published: (2024)
U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding
by: Wang, Ziqian, et al.
Published: (2025)
Learning Perceptually Relevant Temporal Envelope Morphing
by: Dixit, Satvik, et al.
Published: (2025)
BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM
by: Gong, Xun, et al.
Published: (2025)
Microphone Array Signal Processing and Deep Learning for Speech Enhancement
by: Haeb-Umbach, Reinhold, et al.
Published: (2025)
Machine Learning in Acoustics: A Review and Open-Source Repository
by: McCarthy, Ryan A., et al.
Published: (2025)
Toward Universal Speech Enhancement for Diverse Input Conditions
by: Zhang, Wangyou, et al.
Published: (2023)
AADNet: An End-to-End Deep Learning Model for Auditory Attention Decoding
by: Nguyen, Nhan Duc Thanh, et al.
Published: (2024)
LocaGen: Sub-Sample Time-Delay Learning for Beam Localization
by: Kunwar, Ishaan, et al.
Published: (2025)
Towards Realistic Emotional Voice Conversion using Controllable Emotional Intensity
by: Qi, Tianhua, et al.
Published: (2024)
Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning
by: Shi, Runwu, et al.
Published: (2024)
Blind Source Separation of Radar Signals in Time Domain Using Deep Learning
by: Hinderer, Sven
Published: (2025)
Optimal Scalogram for Computational Complexity Reduction in Acoustic Recognition Using Deep Learning
by: Phan, Dang Thoai, et al.
Published: (2025)
Soundscape Captioning using Sound Affective Quality Network and Large Language Model
by: Hou, Yuanbo, et al.
Published: (2024)
30+ Years of Source Separation Research: Achievements and Future Challenges
by: Araki, Shoko, et al.
Published: (2025)
PromptEVC: Controllable Emotional Voice Conversion with Natural Language Prompts
by: Qi, Tianhua, et al.
Published: (2025)
A Study on Speech Assessment with Visual Cues
by: Ahmed, Shafique, et al.
Published: (2025)
Can Emotion Fool Anti-spoofing?
by: Mahapatra, Aurosweta, et al.
Published: (2025)
Detecting Post-Stroke Aphasia Via Brain Responses to Speech in a Deep Learning Framework
by: De Clercq, Pieter, et al.
Published: (2024)