| Main Authors: | Simionato, Riccardo, Fasciani, Stefano |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.12549 |
Similar Items
Sines, Transient, Noise Neural Modeling of Piano Notes
by: Simionato, Riccardo, et al.
Published: (2024)
SELD-Mamba: Selective State-Space Model for Sound Event Localization and Detection with Source Distance Estimation
by: Mu, Da, et al.
Published: (2024)
Exploring State-Space-Model based Language Model in Music Generation
by: Lee, Wei-Jaw, et al.
Published: (2025)
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
by: Yadav, Sarthak, et al.
Published: (2024)
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
by: Erol, Mehmet Hamza, et al.
Published: (2024)
Audio Mamba: Pretrained Audio State Space Model For Audio Tagging
by: Lin, Jiaju, et al.
Published: (2024)
Comparative Study of State-based Neural Networks for Virtual Analog Audio Effects Modeling
by: Simionato, Riccardo, et al.
Published: (2024)
SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering
by: Yang, Zhe, et al.
Published: (2024)
Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
by: Passoni, Riccardo, et al.
Published: (2025)
DiffusionRIR: Room Impulse Response Interpolation using Diffusion Models
by: Della Torre, Sagi, et al.
Published: (2025)
Estimating Musical Surprisal from Audio in Autoregressive Diffusion Model Noise Spaces
by: Bjare, Mathias Rose, et al.
Published: (2025)
Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation
by: Chen, Guo, et al.
Published: (2025)
ES4R: Speech Encoding Based on Prepositive Affective Modeling for Empathetic Response Generation
by: Gao, Zhuoyue, et al.
Published: (2026)
Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models
by: Shakhadri, Syed Abdul Gaffar, et al.
Published: (2025)
Notochord: a Flexible Probabilistic Model for Real-Time MIDI Performance
by: Shepardson, Victor, et al.
Published: (2024)
Decoding Ambiguous Emotions with Test-Time Scaling in Audio-Language Models
by: Jia, Hong, et al.
Published: (2026)
Scaling Auditory Cognition via Test-Time Compute in Audio Language Models
by: Dang, Ting, et al.
Published: (2025)
Selective Attention System (SAS): Device-Addressed Speech Detection for Real-Time On-Device Voice AI
by: Kim, David Joohun, et al.
Published: (2026)
A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation
by: Wang, Jingyuan, et al.
Published: (2024)
Differentiable Time-Varying IIR Filtering for Real-Time Speech Denoising
by: Rota, Riccardo, et al.
Published: (2026)
Music Consistency Models
by: Fei, Zhengcong, et al.
Published: (2024)
SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces
by: Vallés-Pérez, Ivan, et al.
Published: (2023)
Parameter Selection for Analyzing Conversations with Autism Spectrum Disorder
by: Chowdhury, Tahiya, et al.
Published: (2024)
Deep Space Separable Distillation for Lightweight Acoustic Scene Classification
by: Ye, ShuQi, et al.
Published: (2024)
Selective Classifier-free Guidance for Zero-shot Text-to-speech
by: Zheng, John, et al.
Published: (2025)
A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling
by: Guo, Z., et al.
Published: (2022)
ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings
by: Mariotte, Theo, et al.
Published: (2024)
Diff-V2M: A Hierarchical Conditional Diffusion Model with Explicit Rhythmic Modeling for Video-to-Music Generation
by: Ji, Shulei, et al.
Published: (2025)
The Interpretation Gap in Text-to-Music Generation Models
by: Zang, Yongyi, et al.
Published: (2024)
Certification of Speaker Recognition Models to Additive Perturbations
by: Korzh, Dmitrii, et al.
Published: (2024)
Audio Explanation Synthesis with Generative Foundation Models
by: Akman, Alican, et al.
Published: (2024)
Abstract Sound Fusion with Unconditional Inversion Models
by: Liu, Jing, et al.
Published: (2025)
Adaptive Duration Model for Text Speech Alignment
by: Cao, Junjie
Published: (2025)
TISDiSS: A Training-Time and Inference-Time Scalable Framework for Discriminative Source Separation
by: Feng, Yongsheng, et al.
Published: (2025)
Leveraging Mixture of Experts for Improved Speech Deepfake Detection
by: Negroni, Viola, et al.
Published: (2024)
Structuring Concept Space with the Musical Circle of Fifths by Utilizing Music Grammar Based Activations
by: Moyo, Tofara, et al.
Published: (2024)
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
by: Zhang, Zirui, et al.
Published: (2024)
AND: Audio Network Dissection for Interpreting Deep Acoustic Models
by: Wu, Tung-Yu, et al.
Published: (2024)
ASD-Diffusion: Anomalous Sound Detection with Diffusion Models
by: Zhang, Fengrun, et al.
Published: (2024)
FoleyBench: A Benchmark For Video-to-Audio Models
by: Dixit, Satvik, et al.
Published: (2025)