| Main Authors: | Ratnarajah, Anton, Ergezer, Mehmet, Nair, Arun, Athi, Mrudula |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.00721 |
Similar Items
Dependence on Early and Late Reverberation of Single-Channel Speaker Distance Estimation
by: Neri, Michael, et al.
Published: (2026)
Towards Low-Latency Tracking of Multiple Speakers With Short-Context Speaker Embeddings
by: Iatariene, Taous, et al.
Published: (2025)
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
by: Kim, Ji-Hoon, et al.
Published: (2024)
Mind the Prompt: Prompting Strategies in Audio Generations for Improving Sound Classification
by: Ronchini, Francesca, et al.
Published: (2025)
Velocity Potential Neural Field for Efficient Ambisonics Impulse Response Modeling
by: Masuyama, Yoshiki, et al.
Published: (2026)
BRUDEX Database: Binaural Room Impulse Responses with Uniformly Distributed External Microphones
by: Fejgin, Daniel, et al.
Published: (2023)
Perceptual Noise-Masking with Music through Deep Spectral Envelope Shaping
by: Berger, Clémentine, et al.
Published: (2025)
Acoustivision Pro: An Open-Source Interactive Platform for Room Impulse Response Analysis and Acoustic Characterization
by: Goswami, Mandip
Published: (2026)
Toward Fully-End-to-End Listened Speech Decoding from EEG Signals
by: Lee, Jihwan, et al.
Published: (2024)
Joint Semantic Knowledge Distillation and Masked Acoustic Modeling for Full-band Speech Restoration with Improved Intelligibility
by: Liu, Xiaoyu, et al.
Published: (2024)
AI-Generated Music Detection in Broadcast Monitoring
by: López-Ayala, David, et al.
Published: (2026)
Robust Generative Audio Quality Assessment: Disentangling Quality from Spurious Correlations
by: Huang, Kuan-Tang, et al.
Published: (2026)
IS³: Generic Impulsive–Stationary Sound Separation in Acoustic Scenes using Deep Filtering
by: Berger, Clémentine, et al.
Published: (2025)
Dynamic Multi-Species Bird Soundscape Generation with Acoustic Patterning and 3D Spatialization
by: Zhang, Ellie L., et al.
Published: (2025)
CSL-L2M: Controllable Song-Level Lyric-to-Melody Generation Based on Conditional Transformer with Fine-Grained Lyric and Musical Controls
by: Chai, Li, et al.
Published: (2024)
Comparison of Frequency-Fusion Mechanisms for Binaural Direction-of-Arrival Estimation for Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2024)
Improving snore detection under limited dataset through harmonic/percussive source separation and convolutional neural networks
by: Gonzalez-Martinez, F. D., et al.
Published: (2024)
Exploiting an External Microphone for Binaural RTF-Vector-Based Direction of Arrival Estimation for Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2023)
Completing Sets of Prototype Transfer Functions for Subspace-based Direction of Arrival Estimation of Multiple Speakers
by: Fejgin, Daniel, et al.
Published: (2025)
Compressing Quaternion Convolutional Neural Networks for Audio Classification
by: Singh, Arshdeep, et al.
Published: (2025)
Construction and Evaluation of Mandarin Multimodal Emotional Speech Database
by: Ting, Zhu, et al.
Published: (2024)
Neural Speech and Audio Coding: Modern AI Technology Meets Traditional Codecs
by: Kim, Minje, et al.
Published: (2024)
Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation
by: Lee, Jin Woo, et al.
Published: (2024)
Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey
by: Kheddar, Hamza, et al.
Published: (2024)
U-DREAM: Unsupervised Dereverberation guided by a Reverberation Model
by: Bahrman, Louis, et al.
Published: (2025)
Speech Enhancement Based on Drifting Models
by: Xu, Liang, et al.
Published: (2026)
A Hybrid Model for Weakly-Supervised Speech Dereverberation
by: Bahrman, Louis, et al.
Published: (2025)
SWIM: Short-Window CNN Integrated with Mamba for EEG-Based Auditory Spatial Attention Decoding
by: Zhang, Ziyang, et al.
Published: (2024)
Wavetable Synthesis Using CVAE for Timbre Control Based on Semantic Label
by: Yutani, Tsugumasa, et al.
Published: (2024)
Speech Boosting: Low-Latency Live Speech Enhancement for TWS Earbuds
by: Bae, Hanbin, et al.
Published: (2024)
Single-stage TTS with Masked Audio Token Modeling and Semantic Knowledge Distillation
by: Gállego, Gerard I., et al.
Published: (2024)
Tool Wear Prediction in CNC Turning Operations using Ultrasonic Microphone Arrays and CNNs
by: Steckel, Jan, et al.
Published: (2024)
Classification of Heart Sounds Using Multi-Branch Deep Convolutional Network and LSTM-CNN
by: Latifi, Seyed Amir, et al.
Published: (2024)
VoicePrompter: Robust Zero-Shot Voice Conversion with Voice Prompt and Conditional Flow Matching
by: Choi, Ha-Yeong, et al.
Published: (2025)
A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling
by: Guo, Z., et al.
Published: (2022)
Resounding Acoustic Fields with Reciprocity
by: Lan, Zitong, et al.
Published: (2025)
Discriminating real and synthetic super-resolved audio samples using embedding-based classifiers
by: Silaev, Mikhail, et al.
Published: (2026)
UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching
by: Choi, Woongjib, et al.
Published: (2025)
JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis
by: Cho, Hyunjae, et al.
Published: (2024)
Acoustic Imaging for UAV Detection: Dense Beamformed Energy Maps and U-Net SELD
by: Rodriguez, Belman Jahir, et al.
Published: (2025)