| Main Authors: | Aihara, Ryo; Masuyama, Yoshiki; Wichern, Gordon; Germain, François G.; Le Roux, Jonathan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.08399 |
Similar Items
SUNAC: Source-aware Unified Neural Audio Codec
by: Aihara, Ryo, et al.
Published: (2025)
FasTUSS: Faster Task-Aware Unified Source Separation
by: Paissan, Francesco, et al.
Published: (2025)
Physics-Informed Direction-Aware Neural Acoustic Fields
by: Masuyama, Yoshiki, et al.
Published: (2025)
Velocity Potential Neural Field for Efficient Ambisonics Impulse Response Modeling
by: Masuyama, Yoshiki, et al.
Published: (2026)
FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement
by: Masuyama, Yoshiki, et al.
Published: (2025)
Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization
by: Masuyama, Yoshiki, et al.
Published: (2025)
NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization
by: Masuyama, Yoshiki, et al.
Published: (2024)
HASRD: Hierarchical Acoustic and Semantic Representation Disentanglement
by: Hussein, Amir, et al.
Published: (2025)
Enhanced Reverberation as Supervision for Unsupervised Speech Separation
by: Saijo, Kohei, et al.
Published: (2024)
Direction-Aware Neural Acoustic Fields for Few-Shot Interpolation of Ambisonic Impulse Responses
by: Ick, Christopher, et al.
Published: (2025)
Data Augmentation Using Neural Acoustic Fields With Retrieval-Augmented Pre-training
by: Ick, Christopher, et al.
Published: (2025)
Factorized RVQ-GAN For Disentangled Speech Tokenization
by: Khurana, Sameer, et al.
Published: (2025)
Predictive-Generative Drift Decomposition for Speech Enhancement and Separation
by: Richter, Julius, et al.
Published: (2026)
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement
by: Saijo, Kohei, et al.
Published: (2024)
Why does music source separation benefit from cacophony?
by: Jeon, Chang-Bin, et al.
Published: (2024)
Sound Event Bounding Boxes
by: Ebbers, Janek, et al.
Published: (2024)
SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
by: Koo, Junghyun, et al.
Published: (2024)
Generic Speech Enhancement with Self-Supervised Representation Space Loss
by: Sato, Hiroshi, et al.
Published: (2025)
Task-Aware Unified Source Separation
by: Saijo, Kohei, et al.
Published: (2024)
Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec
by: Ren, Yanzhou, et al.
Published: (2026)
Speech dereverberation constrained on room impulse response characteristics
by: Bahrman, Louis, et al.
Published: (2024)
Local Density-Based Anomaly Score Normalization for Domain Generalization
by: Wilkinghoff, Kevin, et al.
Published: (2025)
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
by: Saijo, Kohei, et al.
Published: (2024)
Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
by: Wu, Shih-Lun, et al.
Published: (2023)
Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning
by: Shi, Runwu, et al.
Published: (2024)
Exploring the Capability of Mamba in Speech Applications
by: Miyazaki, Koichi, et al.
Published: (2024)
SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis
by: Baoueb, Teysir, et al.
Published: (2024)
30+ Years of Source Separation Research: Achievements and Future Challenges
by: Araki, Shoko, et al.
Published: (2025)
Neural Speech and Audio Coding: Modern AI Technology Meets Traditional Codecs
by: Kim, Minje, et al.
Published: (2024)
Relating the Neural Representations of Vocalized, Mimed, and Imagined Speech
by: Maghsoudi, Maryam, et al.
Published: (2026)
Self-supervised Multimodal Speech Representations for the Assessment of Schizophrenia Symptoms
by: Premananth, Gowtham, et al.
Published: (2024)
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
by: Zaiem, Salah, et al.
Published: (2023)
Codec2Vec: Self-Supervised Speech Representation Learning Using Neural Speech Codecs
by: Tseng, Wei-Cheng, et al.
Published: (2025)
Speaker and Style Disentanglement of Speech Based on Contrastive Predictive Coding Supported Factorized Variational Autoencoder
by: Xie, Yuying, et al.
Published: (2024)
Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
by: Masuyama, Yoshiki, et al.
Published: (2024)
Mind the Gap: Detecting Cluster Exits for Robust Local Density-Based Score Normalization in Anomalous Sound Detection
by: Wilkinghoff, Kevin, et al.
Published: (2026)
Perceptually-motivated Spatial Audio Codec for Higher-Order Ambisonics Compression
by: Hold, Christoph, et al.
Published: (2024)
Spatially Aware Self-Supervised Models for Multi-Channel Neural Speaker Diarization
by: Han, Jiangyu, et al.
Published: (2025)
ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech
by: Shi, Jiatong, et al.
Published: (2024)
USDnet: Unsupervised Speech Dereverberation via Neural Forward Filtering
by: Wang, Zhong-Qiu
Published: (2024)