| Main Authors: | Hong, Yerin, Lim, Juhwan, Min, Jinhong, Agarwal, Nishkarsh, Hovden, Robert, Bol, Ageeth A., Li, Yiyang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.10911 |
Similar Items
Controllable Singing Voice Synthesis using Phoneme-Level Energy Sequence
by: Ryu, Yerin, et al.
Published: (2025)
Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
by: Choi, Yerin, et al.
Published: (2024)
Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition
by: Lee, Jeehyun, et al.
Published: (2024)
Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR
by: Li, Longhao, et al.
Published: (2025)
Differentiable Grouped Feedback Delay Networks for Learning Coupled Volume Acoustics
by: Das, Orchisama, et al.
Published: (2025)
Learning Filters in Feedback Delay Networks from Noisy Room Impulse Responses
by: Santo, Gloria Dal, et al.
Published: (2025)
Matching Reverberant Speech Through Learned Acoustic Embeddings and Feedback Delay Networks
by: Götz, Philipp, et al.
Published: (2025)
Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning
by: Wan, Zixiang, et al.
Published: (2024)
Dereverberation in Acoustic Sensor Networks Using Weighted Prediction Error With Microphone-dependent Prediction Delays
by: Lohmann, Anselm, et al.
Published: (2023)
On Time Delay Interpolation for Improved Acoustic Reflector Localization
by: Rosseel, Hannes, et al.
Published: (2025)
Hierarchical Symbolic Pop Music Generation with Graph Neural Networks
by: Lim, Wen Qing, et al.
Published: (2024)
Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4
by: Park, Jongyeon, et al.
Published: (2025)
Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4
by: Son, Sang Won, et al.
Published: (2024)
LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting
by: Zhu, Pai, et al.
Published: (2025)
The Universal Personalizer: Few-Shot Dysarthric Speech Recognition via Meta-Learning
by: Agarwal, Dhruuv, et al.
Published: (2025)
MPIPN: A Multi Physics-Informed PointNet for solving parametric acoustic-structure systems
by: Wang, Chu, et al.
Published: (2024)
Streaming Endpointer for Spoken Dialogue using Neural Audio Codecs and Label-Delayed Training
by: Udupa, Sathvik, et al.
Published: (2025)
Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models
by: Zhao, Yiyang, et al.
Published: (2024)
A Phase Synthesizer for Decorrelation to Improve Acoustic Feedback Cancellation
by: Linhard, Klaus, et al.
Published: (2025)
An Investigation on Combining Geometry and Consistency Constraints into Phase Estimation for Speech Enhancement
by: Ho, Chun-Wei, et al.
Published: (2025)
S2S-Arena: Evaluating Paralinguistic Instruction Following in Speech-to-Speech Models
by: Jiang, Feng, et al.
Published: (2025)
Hybrid Decoding: Rapid Pass and Selective Detailed Correction for Sequence Models
by: Lim, Yunkyu, et al.
Published: (2025)
Room Impulse Response Synthesis via Differentiable Feedback Delay Networks for Efficient Spatial Audio Rendering
by: Gerami, Armin, et al.
Published: (2025)
Phase Aware Ear-Conditioned Learning for Multi-Channel Binaural Speaker Separation
by: Jeremiah, Ruben Johnson Robert, et al.
Published: (2025)
A Distilled Low-Latency Neural Vocoder with Explicit Amplitude and Phase Prediction
by: Du, Hui-Peng, et al.
Published: (2025)
Magnitude and Phase-based Feature Fusion Using Co-attention Mechanism for Speaker recognition
by: Su, Rongfeng, et al.
Published: (2025)
FA-GAN: Artifacts-free and Phase-aware High-fidelity GAN-based Vocoder
by: Shen, Rubing, et al.
Published: (2024)
Binaural Unmasking in Practical Use: Perceived Level of Phase-inverted Speech in Environmental Noise
by: Kotani, Rina, et al.
Published: (2025)
MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra
by: Lu, Ye-Xin, et al.
Published: (2023)
QiandaoEar22: A high quality noise dataset for identifying specific ship from multiple underwater acoustic targets using ship-radiated noise
by: Du, Xiaoyang, et al.
Published: (2024)
Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines
by: Mezza, Alessandro Ilic, et al.
Published: (2024)
Token-based Attractors and Cross-attention in Spoof Diarization
by: Koo, Kyo-Won, et al.
Published: (2025)
Chirp Group Delay based Onset Detection in Instruments with Fast Attack
by: Joysingh, S. Johanan, et al.
Published: (2024)
Hybrid Real- And Complex-Valued Neural Network Concept For Low-Complexity Phase-Aware Speech Enhancement
by: Fiorio, Luan Vinícius, et al.
Published: (2025)
StepAudio 2.5 Technical Report
by: Lin, Bin, et al.
Published: (2026)
H-QuEST: Accelerating Query-by-Example Spoken Term Detection with Hierarchical Indexing
by: Singh, Akanksha, et al.
Published: (2025)
ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR
by: Singh, Vishwanath Pratap, et al.
Published: (2024)
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
by: Huang, Chien-yu, et al.
Published: (2024)
MoE-TTS: Enhancing Out-of-Domain Text Understanding for Description-based TTS via Mixture-of-Experts
by: Xue, Heyang, et al.
Published: (2025)
On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition
by: HU, Shujie, et al.
Published: (2025)