:: Library Catalog

Bibliographic Details
Main Authors:	Aihara, Ryo; Masuyama, Yoshiki; Wichern, Gordon; Germain, François G.; Le Roux, Jonathan
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Signal Processing
Online Access:	https://arxiv.org/abs/2508.08399

Similar Items

SUNAC: Source-aware Unified Neural Audio Codec
by: Aihara, Ryo, et al.
Published: (2025)

FasTUSS: Faster Task-Aware Unified Source Separation
by: Paissan, Francesco, et al.
Published: (2025)

Physics-Informed Direction-Aware Neural Acoustic Fields
by: Masuyama, Yoshiki, et al.
Published: (2025)

Velocity Potential Neural Field for Efficient Ambisonics Impulse Response Modeling
by: Masuyama, Yoshiki, et al.
Published: (2026)

FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement
by: Masuyama, Yoshiki, et al.
Published: (2025)

Retrieval-Augmented Neural Field for HRTF Upsampling and Personalization
by: Masuyama, Yoshiki, et al.
Published: (2025)

NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization
by: Masuyama, Yoshiki, et al.
Published: (2024)

HASRD: Hierarchical Acoustic and Semantic Representation Disentanglement
by: Hussein, Amir, et al.
Published: (2025)

Enhanced Reverberation as Supervision for Unsupervised Speech Separation
by: Saijo, Kohei, et al.
Published: (2024)

Direction-Aware Neural Acoustic Fields for Few-Shot Interpolation of Ambisonic Impulse Responses
by: Ick, Christopher, et al.
Published: (2025)

Data Augmentation Using Neural Acoustic Fields With Retrieval-Augmented Pre-training
by: Ick, Christopher, et al.
Published: (2025)

Factorized RVQ-GAN For Disentangled Speech Tokenization
by: Khurana, Sameer, et al.
Published: (2025)

Predictive-Generative Drift Decomposition for Speech Enhancement and Separation
by: Richter, Julius, et al.
Published: (2026)

TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement
by: Saijo, Kohei, et al.
Published: (2024)

Why does music source separation benefit from cacophony?
by: Jeon, Chang-Bin, et al.
Published: (2024)

Sound Event Bounding Boxes
by: Ebbers, Janek, et al.
Published: (2024)

SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
by: Koo, Junghyun, et al.
Published: (2024)

Generic Speech Enhancement with Self-Supervised Representation Space Loss
by: Sato, Hiroshi, et al.
Published: (2025)

Task-Aware Unified Source Separation
by: Saijo, Kohei, et al.
Published: (2024)

Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec
by: Ren, Yanzhou, et al.
Published: (2026)

Speech dereverberation constrained on room impulse response characteristics
by: Bahrman, Louis, et al.
Published: (2024)

Local Density-Based Anomaly Score Normalization for Domain Generalization
by: Wilkinghoff, Kevin, et al.
Published: (2025)

Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
by: Saijo, Kohei, et al.
Published: (2024)

Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
by: Wu, Shih-Lun, et al.
Published: (2023)

Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning
by: Shi, Runwu, et al.
Published: (2024)

Exploring the Capability of Mamba in Speech Applications
by: Miyazaki, Koichi, et al.
Published: (2024)

SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis
by: Baoueb, Teysir, et al.
Published: (2024)

30+ Years of Source Separation Research: Achievements and Future Challenges
by: Araki, Shoko, et al.
Published: (2025)

Neural Speech and Audio Coding: Modern AI Technology Meets Traditional Codecs
by: Kim, Minje, et al.
Published: (2024)

Relating the Neural Representations of Vocalized, Mimed, and Imagined Speech
by: Maghsoudi, Maryam, et al.
Published: (2026)

Self-supervised Multimodal Speech Representations for the Assessment of Schizophrenia Symptoms
by: Premananth, Gowtham, et al.
Published: (2024)

Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
by: Zaiem, Salah, et al.
Published: (2023)

Codec2Vec: Self-Supervised Speech Representation Learning Using Neural Speech Codecs
by: Tseng, Wei-Cheng, et al.
Published: (2025)

Speaker and Style Disentanglement of Speech Based on Contrastive Predictive Coding Supported Factorized Variational Autoencoder
by: Xie, Yuying, et al.
Published: (2024)

Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
by: Masuyama, Yoshiki, et al.
Published: (2024)

Mind the Gap: Detecting Cluster Exits for Robust Local Density-Based Score Normalization in Anomalous Sound Detection
by: Wilkinghoff, Kevin, et al.
Published: (2026)

Perceptually-motivated Spatial Audio Codec for Higher-Order Ambisonics Compression
by: Hold, Christoph, et al.
Published: (2024)

Spatially Aware Self-Supervised Models for Multi-Channel Neural Speaker Diarization
by: Han, Jiangyu, et al.
Published: (2025)

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech
by: Shi, Jiatong, et al.
Published: (2024)

USDnet: Unsupervised Speech Dereverberation via Neural Forward Filtering
by: Wang, Zhong-Qiu
Published: (2024)