:: Library Catalog

Bibliographic Details
Main Authors:	Simionato, Riccardo; Fasciani, Stefano
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2408.12549

Similar Items

Sines, Transient, Noise Neural Modeling of Piano Notes
by: Simionato, Riccardo, et al.
Published: (2024)

SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation
by: Mu, Da, et al.
Published: (2024)

Exploring State-Space-Model based Language Model in Music Generation
by: Lee, Wei-Jaw, et al.
Published: (2025)

Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
by: Yadav, Sarthak, et al.
Published: (2024)

Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
by: Erol, Mehmet Hamza, et al.
Published: (2024)

Audio Mamba: Pretrained Audio State Space Model For Audio Tagging
by: Lin, Jiaju, et al.
Published: (2024)

Comparative Study of State-based Neural Networks for Virtual Analog Audio Effects Modeling
by: Simionato, Riccardo, et al.
Published: (2024)

SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering
by: Yang, Zhe, et al.
Published: (2024)

Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
by: Passoni, Riccardo, et al.
Published: (2025)

DiffusionRIR: Room Impulse Response Interpolation using Diffusion Models
by: Della Torre, Sagi, et al.
Published: (2025)

Estimating Musical Surprisal from Audio in Autoregressive Diffusion Model Noise Spaces
by: Bjare, Mathias Rose, et al.
Published: (2025)

Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation
by: Chen, Guo, et al.
Published: (2025)

ES4R: Speech Encoding Based on Prepositive Affective Modeling for Empathetic Response Generation
by: Gao, Zhuoyue, et al.
Published: (2026)

Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models
by: Shakhadri, Syed Abdul Gaffar, et al.
Published: (2025)

Notochord: a Flexible Probabilistic Model for Real-Time MIDI Performance
by: Shepardson, Victor, et al.
Published: (2024)

Decoding Ambiguous Emotions with Test-Time Scaling in Audio-Language Models
by: Jia, Hong, et al.
Published: (2026)

Scaling Auditory Cognition via Test-Time Compute in Audio Language Models
by: Dang, Ting, et al.
Published: (2025)

Selective Attention System (SAS): Device-Addressed Speech Detection for Real-Time On-Device Voice AI
by: Kim, David Joohun, et al.
Published: (2026)

A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation
by: Wang, Jingyuan, et al.
Published: (2024)

Differentiable Time-Varying IIR Filtering for Real-Time Speech Denoising
by: Rota, Riccardo, et al.
Published: (2026)

Music Consistency Models
by: Fei, Zhengcong, et al.
Published: (2024)

SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces
by: Vallés-Pérez, Ivan, et al.
Published: (2023)

Parameter Selection for Analyzing Conversations with Autism Spectrum Disorder
by: Chowdhury, Tahiya, et al.
Published: (2024)

Deep Space Separable Distillation for Lightweight Acoustic Scene Classification
by: Ye, ShuQi, et al.
Published: (2024)

Selective Classifier-free Guidance for Zero-shot Text-to-speech
by: Zheng, John, et al.
Published: (2025)

A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling
by: Guo, Z., et al.
Published: (2022)

ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings
by: Mariotte, Theo, et al.
Published: (2024)

Diff-V2M: A Hierarchical Conditional Diffusion Model with Explicit Rhythmic Modeling for Video-to-Music Generation
by: Ji, Shulei, et al.
Published: (2025)

The Interpretation Gap in Text-to-Music Generation Models
by: Zang, Yongyi, et al.
Published: (2024)

Certification of Speaker Recognition Models to Additive Perturbations
by: Korzh, Dmitrii, et al.
Published: (2024)

Audio Explanation Synthesis with Generative Foundation Models
by: Akman, Alican, et al.
Published: (2024)

Abstract Sound Fusion with Unconditional Inversion Models
by: Liu, Jing, et al.
Published: (2025)

Adaptive Duration Model for Text Speech Alignment
by: Cao, Junjie
Published: (2025)

TISDiSS: A Training-Time and Inference-Time Scalable Framework for Discriminative Source Separation
by: Feng, Yongsheng, et al.
Published: (2025)

Leveraging Mixture of Experts for Improved Speech Deepfake Detection
by: Negroni, Viola, et al.
Published: (2024)

Structuring Concept Space with the Musical Circle of Fifths by Utilizing Music Grammar Based Activations
by: Moyo, Tofara, et al.
Published: (2024)

I Can Hear You: Selective Robust Training for Deepfake Audio Detection
by: Zhang, Zirui, et al.
Published: (2024)

AND: Audio Network Dissection for Interpreting Deep Acoustic Models
by: Wu, Tung-Yu, et al.
Published: (2024)

ASD-Diffusion: Anomalous Sound Detection with Diffusion Models
by: Zhang, Fengrun, et al.
Published: (2024)

FoleyBench: A Benchmark For Video-to-Audio Models
by: Dixit, Satvik, et al.
Published: (2025)