Library Catalog

Bibliographic Details
Main Authors:	Ratnarajah, Anton; Ergezer, Mehmet; Nair, Arun; Athi, Mrudula
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing; Signal Processing
Online Access:	https://arxiv.org/abs/2605.00721

Similar Items

Dependence on Early and Late Reverberation of Single-Channel Speaker Distance Estimation
by: Neri, Michael, et al.
Published: (2026)

Towards Low-Latency Tracking of Multiple Speakers With Short-Context Speaker Embeddings
by: Iatariene, Taous, et al.
Published: (2025)

CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
by: Kim, Ji-Hoon, et al.
Published: (2024)

Mind the Prompt: Prompting Strategies in Audio Generations for Improving Sound Classification
by: Ronchini, Francesca, et al.
Published: (2025)

Velocity Potential Neural Field for Efficient Ambisonics Impulse Response Modeling
by: Masuyama, Yoshiki, et al.
Published: (2026)

BRUDEX Database: Binaural Room Impulse Responses with Uniformly Distributed External Microphones
by: Fejgin, Daniel, et al.
Published: (2023)

Perceptual Noise-Masking with Music through Deep Spectral Envelope Shaping
by: Berger, Clémentine, et al.
Published: (2025)

Acoustivision Pro: An Open-Source Interactive Platform for Room Impulse Response Analysis and Acoustic Characterization
by: Goswami, Mandip
Published: (2026)

Toward Fully-End-to-End Listened Speech Decoding from EEG Signals
by: Lee, Jihwan, et al.
Published: (2024)

Joint Semantic Knowledge Distillation and Masked Acoustic Modeling for Full-band Speech Restoration with Improved Intelligibility
by: Liu, Xiaoyu, et al.
Published: (2024)

AI-Generated Music Detection in Broadcast Monitoring
by: López-Ayala, David, et al.
Published: (2026)

Robust Generative Audio Quality Assessment: Disentangling Quality from Spurious Correlations
by: Huang, Kuan-Tang, et al.
Published: (2026)

IS³: Generic Impulsive-Stationary Sound Separation in Acoustic Scenes using Deep Filtering
by: Berger, Clémentine, et al.
Published: (2025)

Dynamic Multi-Species Bird Soundscape Generation with Acoustic Patterning and 3D Spatialization
by: Zhang, Ellie L., et al.
Published: (2025)

CSL-L2M: Controllable Song-Level Lyric-to-Melody Generation Based on Conditional Transformer with Fine-Grained Lyric and Musical Controls
by: Chai, Li, et al.
Published: (2024)

Comparison of Frequency-Fusion Mechanisms for Binaural Direction-of-Arrival Estimation for Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2024)

Improving snore detection under limited dataset through harmonic/percussive source separation and convolutional neural networks
by: Gonzalez-Martinez, F. D., et al.
Published: (2024)

Exploiting an External Microphone for Binaural RTF-Vector-Based Direction of Arrival Estimation for Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2023)

Completing Sets of Prototype Transfer Functions for Subspace-based Direction of Arrival Estimation of Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2025)

Compressing Quaternion Convolutional Neural Networks for Audio Classification
by: Singh, Arshdeep, et al.
Published: (2025)

Construction and Evaluation of Mandarin Multimodal Emotional Speech Database
by: Ting, Zhu, et al.
Published: (2024)

Neural Speech and Audio Coding: Modern AI Technology Meets Traditional Codecs
by: Kim, Minje, et al.
Published: (2024)

Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation
by: Lee, Jin Woo, et al.
Published: (2024)

Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey
by: Kheddar, Hamza, et al.
Published: (2024)

U-DREAM: Unsupervised Dereverberation guided by a Reverberation Model
by: Bahrman, Louis, et al.
Published: (2025)

Speech Enhancement Based on Drifting Models
by: Xu, Liang, et al.
Published: (2026)

A Hybrid Model for Weakly-Supervised Speech Dereverberation
by: Bahrman, Louis, et al.
Published: (2025)

SWIM: Short-Window CNN Integrated with Mamba for EEG-Based Auditory Spatial Attention Decoding
by: Zhang, Ziyang, et al.
Published: (2024)

Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label
by: Yutani, Tsugumasa, et al.
Published: (2024)

Speech Boosting: Low-Latency Live Speech Enhancement for TWS Earbuds
by: Bae, Hanbin, et al.
Published: (2024)

Single-stage TTS with Masked Audio Token Modeling and Semantic Knowledge Distillation
by: Gállego, Gerard I., et al.
Published: (2024)

Tool Wear Prediction in CNC Turning Operations using Ultrasonic Microphone Arrays and CNNs
by: Steckel, Jan, et al.
Published: (2024)

Classification of Heart Sounds Using Multi-Branch Deep Convolutional Network and LSTM-CNN
by: Latifi, Seyed Amir, et al.
Published: (2024)

VoicePrompter: Robust Zero-Shot Voice Conversion with Voice Prompt and Conditional Flow Matching
by: Choi, Ha-Yeong, et al.
Published: (2025)

A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling
by: Guo, Z., et al.
Published: (2022)

Resounding Acoustic Fields with Reciprocity
by: Lan, Zitong, et al.
Published: (2025)

Discriminating real and synthetic super-resolved audio samples using embedding-based classifiers
by: Silaev, Mikhail, et al.
Published: (2026)

UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching
by: Choi, Woongjib, et al.
Published: (2025)

JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis
by: Cho, Hyunjae, et al.
Published: (2024)

Acoustic Imaging for UAV Detection: Dense Beamformed Energy Maps and U-Net SELD
by: Rodriguez, Belman Jahir, et al.
Published: (2025)