| Main Authors: | Cervera, Matthieu, Paissan, Francesco, Ravanelli, Mirco, Subakan, Cem |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.17219 |
Similar Items
Listenable Maps for Audio Classifiers
by: Paissan, Francesco, et al.
Published: (2024)
Exploring Token-Space Manipulation in Latent Audio Tokenizers
by: Paissan, Francesco, et al.
Published: (2026)
LMAC-TD: Producing Time Domain Explanations for Audio Classifiers
by: Mancini, Eleonora, et al.
Published: (2024)
Listenable Maps for Zero-Shot Audio Classifiers
by: Paissan, Francesco, et al.
Published: (2024)
Audio Editing with Non-Rigid Text Prompts
by: Paissan, Francesco, et al.
Published: (2023)
FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
by: Della Libera, Luca, et al.
Published: (2025)
Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
by: Mancini, Eleonora, et al.
Published: (2024)
LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs
by: Mousavi, Pooneh, et al.
Published: (2025)
Beyond Fixed Frames: Dynamic Character-Aligned Speech Tokenization
by: Della Libera, Luca, et al.
Published: (2026)
Autoregressive Speech Enhancement via Acoustic Tokens
by: Della Libera, Luca, et al.
Published: (2025)
Focal Modulation Networks for Interpretable Sound Classification
by: Della Libera, Luca, et al.
Published: (2024)
WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation
by: Della Libera, Luca, et al.
Published: (2026)
FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation
by: Della Libera, Luca, et al.
Published: (2025)
Toward Faithful Explanations in Acoustic Anomaly Detection
by: Elrashid, Maab, et al.
Published: (2026)
Resource-Efficient Separation Transformer
by: Della Libera, Luca, et al.
Published: (2022)
Dynamic HumTrans: Humming Transcription Using CNNs and Dynamic Programming
by: Gupta, Shubham, et al.
Published: (2024)
Listen First, Then Answer: Timestamp-Grounded Speech Reasoning
by: Jeong, Jihoon, et al.
Published: (2026)
Investigating Faithfulness in Large Audio Language Models
by: Mousavi, Pooneh, et al.
Published: (2025)
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs
by: Mousavi, Pooneh, et al.
Published: (2025)
Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice
by: Gupta, Shubham, et al.
Published: (2024)
LL-SDR: Low-Latency Speech enhancement through Discrete Representations
by: Li, Jingyi, et al.
Published: (2026)
tinyCLAP: Distilling Constrastive Language-Audio Pretrained Models
by: Paissan, Francesco, et al.
Published: (2023)
Knowing When to Answer: Adaptive Confidence Refinement for Reliable Audio-Visual Question Answering
by: Tran, Dinh Phu, et al.
Published: (2026)
DASB - Discrete Audio and Speech Benchmark
by: Mousavi, Pooneh, et al.
Published: (2024)
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
by: Mousavi, Pooneh, et al.
Published: (2024)
SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning
by: Zampierin, Luca, et al.
Published: (2024)
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
by: Zaiem, Salah, et al.
Published: (2023)
Adaptation Odyssey in LLMs: Why Does Additional Pretraining Sometimes Fail to Improve?
by: Öncel, Fırat, et al.
Published: (2024)
Coherent Audio-Visual Editing via Conditional Audio Generation Following Video Edits
by: Ishii, Masato, et al.
Published: (2025)
What Are They Doing? Joint Audio-Speech Co-Reasoning
by: Wang, Yingzhi, et al.
Published: (2024)
SAO-Instruct: Free-form Audio Editing using Natural Language Instructions
by: Ungersböck, Michael, et al.
Published: (2025)
Guiding Audio Editing with Audio Language Model
by: Lan, Zitong, et al.
Published: (2025)
Audio-to-Image Bird Species Retrieval without Audio-Image Pairs via Text Distillation
by: Moummad, Ilyass, et al.
Published: (2026)
Audio Simulation for Sound Source Localization in Virtual Evironment
by: Di Yuan, Yi, et al.
Published: (2024)
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
by: Bai, Yatong, et al.
Published: (2023)
Music2Latent: Consistency Autoencoders for Latent Audio Compression
by: Pasini, Marco, et al.
Published: (2024)
Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
by: Manor, Hila, et al.
Published: (2024)
How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection
by: Xiao, Yixuan, et al.
Published: (2026)
ADNAC: Audio Denoiser using Neural Audio Codec
by: Jimon, Daniel, et al.
Published: (2025)
Prompt-guided Precise Audio Editing with Diffusion Models
by: Xu, Manjie, et al.
Published: (2024)