:: Library Catalog

Bibliographic Details
Main Authors:	Hong, Yerin; Lim, Juhwan; Min, Jinhong; Agarwal, Nishkarsh; Hovden, Robert; Bol, Ageeth A.; Li, Yiyang
Format:	Preprint
Published:	2025
Subjects:	Materials Science; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2510.10911

Similar Items

Controllable Singing Voice Synthesis using Phoneme-Level Energy Sequence
by: Ryu, Yerin, et al.
Published: (2025)

Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
by: Choi, Yerin, et al.
Published: (2024)

Inappropriate Pause Detection In Dysarthric Speech Using Large-Scale Speech Recognition
by: Lee, Jeehyun, et al.
Published: (2024)

Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR
by: Li, Longhao, et al.
Published: (2025)

Differentiable Grouped Feedback Delay Networks for Learning Coupled Volume Acoustics
by: Das, Orchisama, et al.
Published: (2025)

Learning Filters in Feedback Delay Networks from Noisy Room Impulse Responses
by: Dal Santo, Gloria, et al.
Published: (2025)

Matching Reverberant Speech Through Learned Acoustic Embeddings and Feedback Delay Networks
by: Götz, Philipp, et al.
Published: (2025)

Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning
by: Wan, Zixiang, et al.
Published: (2024)

Dereverberation in Acoustic Sensor Networks Using Weighted Prediction Error With Microphone-dependent Prediction Delays
by: Lohmann, Anselm, et al.
Published: (2023)

On Time Delay Interpolation for Improved Acoustic Reflector Localization
by: Rosseel, Hannes, et al.
Published: (2025)

Hierarchical Symbolic Pop Music Generation with Graph Neural Networks
by: Lim, Wen Qing, et al.
Published: (2024)

Performance improvement of spatial semantic segmentation with enriched audio features and agent-based error correction for DCASE 2025 Challenge Task 4
by: Park, Jongyeon, et al.
Published: (2025)

Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4
by: Son, Sang Won, et al.
Published: (2024)

LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting
by: Zhu, Pai, et al.
Published: (2025)

The Universal Personalizer: Few-Shot Dysarthric Speech Recognition via Meta-Learning
by: Agarwal, Dhruuv, et al.
Published: (2025)

MPIPN: A Multi Physics-Informed PointNet for solving parametric acoustic-structure systems
by: Wang, Chu, et al.
Published: (2024)

Streaming Endpointer for Spoken Dialogue using Neural Audio Codecs and Label-Delayed Training
by: Udupa, Sathvik, et al.
Published: (2025)

Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models
by: Zhao, Yiyang, et al.
Published: (2024)

A Phase Synthesizer for Decorrelation to Improve Acoustic Feedback Cancellation
by: Linhard, Klaus, et al.
Published: (2025)

An Investigation on Combining Geometry and Consistency Constraints into Phase Estimation for Speech Enhancement
by: Ho, Chun-Wei, et al.
Published: (2025)

S2S-Arena: Evaluating Paralinguistic Instruction Following in Speech-to-Speech Models
by: Jiang, Feng, et al.
Published: (2025)

Hybrid Decoding: Rapid Pass and Selective Detailed Correction for Sequence Models
by: Lim, Yunkyu, et al.
Published: (2025)

Room Impulse Response Synthesis via Differentiable Feedback Delay Networks for Efficient Spatial Audio Rendering
by: Gerami, Armin, et al.
Published: (2025)

Phase Aware Ear-Conditioned Learning for Multi-Channel Binaural Speaker Separation
by: Jeremiah, Ruben Johnson Robert, et al.
Published: (2025)

A Distilled Low-Latency Neural Vocoder with Explicit Amplitude and Phase Prediction
by: Du, Hui-Peng, et al.
Published: (2025)

Magnitude and Phase-based Feature Fusion Using Co-attention Mechanism for Speaker recognition
by: Su, Rongfeng, et al.
Published: (2025)

FA-GAN: Artifacts-free and Phase-aware High-fidelity GAN-based Vocoder
by: Shen, Rubing, et al.
Published: (2024)

Binaural Unmasking in Practical Use: Perceived Level of Phase-inverted Speech in Environmental Noise
by: Kotani, Rina, et al.
Published: (2025)

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra
by: Lu, Ye-Xin, et al.
Published: (2023)

QiandaoEar22: A high quality noise dataset for identifying specific ship from multiple underwater acoustic targets using ship-radiated noise
by: Du, Xiaoyang, et al.
Published: (2024)

Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines
by: Mezza, Alessandro Ilic, et al.
Published: (2024)

Token-based Attractors and Cross-attention in Spoof Diarization
by: Koo, Kyo-Won, et al.
Published: (2025)

Chirp Group Delay based Onset Detection in Instruments with Fast Attack
by: Joysingh, S. Johanan, et al.
Published: (2024)

Hybrid Real- And Complex-Valued Neural Network Concept For Low-Complexity Phase-Aware Speech Enhancement
by: Fiorio, Luan Vinícius, et al.
Published: (2025)

StepAudio 2.5 Technical Report
by: Lin, Bin, et al.
Published: (2026)

H-QuEST: Accelerating Query-by-Example Spoken Term Detection with Hierarchical Indexing
by: Singh, Akanksha, et al.
Published: (2025)

ROAR: Reinforcing Original to Augmented Data Ratio Dynamics for Wav2Vec2.0 Based ASR
by: Singh, Vishwanath Pratap, et al.
Published: (2024)

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
by: Huang, Chien-yu, et al.
Published: (2024)

MoE-TTS: Enhancing Out-of-Domain Text Understanding for Description-based TTS via Mixture-of-Experts
by: Xue, Heyang, et al.
Published: (2025)

On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition
by: HU, Shujie, et al.
Published: (2025)