| Main Authors: | Gupta, Shubham, Gomez-Sarmiento, Isaac Neri, Mezdari, Faez Amjed, Ravanelli, Mirco, Subakan, Cem |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2410.05455 |
Similar Items
Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice
by: Gupta, Shubham, et al.
Published: (2024)
LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs
by: Mousavi, Pooneh, et al.
Published: (2025)
Listen First, Then Answer: Timestamp-Grounded Speech Reasoning
by: Jeong, Jihoon, et al.
Published: (2026)
Listenable Maps for Audio Classifiers
by: Paissan, Francesco, et al.
Published: (2024)
Autoregressive Speech Enhancement via Acoustic Tokens
by: Della Libera, Luca, et al.
Published: (2025)
Focal Modulation Networks for Interpretable Sound Classification
by: Della Libera, Luca, et al.
Published: (2024)
LL-SDR: Low-Latency Speech enhancement through Discrete Representations
by: Li, Jingyi, et al.
Published: (2026)
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs
by: Mousavi, Pooneh, et al.
Published: (2025)
FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation
by: Della Libera, Luca, et al.
Published: (2025)
Listenable Maps for Zero-Shot Audio Classifiers
by: Paissan, Francesco, et al.
Published: (2024)
LMAC-TD: Producing Time Domain Explanations for Audio Classifiers
by: Mancini, Eleonora, et al.
Published: (2024)
FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
by: Della Libera, Luca, et al.
Published: (2025)
Toward Faithful Explanations in Acoustic Anomaly Detection
by: Elrashid, Maab, et al.
Published: (2026)
Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
by: Mancini, Eleonora, et al.
Published: (2024)
Audio Editing with Non-Rigid Text Prompts
by: Paissan, Francesco, et al.
Published: (2023)
Resource-Efficient Separation Transformer
by: Della Libera, Luca, et al.
Published: (2022)
DASB - Discrete Audio and Speech Benchmark
by: Mousavi, Pooneh, et al.
Published: (2024)
HumDial-EIBench: A Human-Recorded Multi-Turn Emotional Intelligence Benchmark for Audio Language Models
by: Wang, Shuiyuan, et al.
Published: (2026)
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
by: Mousavi, Pooneh, et al.
Published: (2024)
Investigating Faithfulness in Large Audio Language Models
by: Mousavi, Pooneh, et al.
Published: (2025)
Audio Prototypical Network For Controllable Music Recommendation
by: Öncel, Fırat, et al.
Published: (2025)
ProGRes: Prompted Generative Rescoring on ASR n-Best
by: Tur, Ada Defne, et al.
Published: (2024)
What Are They Doing? Joint Audio-Speech Co-Reasoning
by: Wang, Yingzhi, et al.
Published: (2024)
The ICASSP 2026 HumDial Challenge: Benchmarking Human-like Spoken Dialogue Systems in the LLM Era
by: Zhao, Zhixian, et al.
Published: (2026)
Planing It by Ear: Convolutional Neural Networks for Acoustic Anomaly Detection in Industrial Wood Planers
by: Deschênes, Anthony, et al.
Published: (2025)
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
by: Zaiem, Salah, et al.
Published: (2023)
SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning
by: Zampierin, Luca, et al.
Published: (2024)
Comparison of Speech Tasks in Human Expert and Machine Detection of Parkinson's Disease
by: Plantinga, Peter, et al.
Published: (2025)
CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds
by: Budaghyan, David, et al.
Published: (2023)
mmWave-Whisper: Phone Call Eavesdropping and Transcription Using Millimeter-Wave Radar
by: Basak, Suryoday, et al.
Published: (2024)
Is Transfer Learning Necessary for Violin Transcription?
by: Peng, Yueh-Po, et al.
Published: (2025)
Low-Rank Adaptation of Deep Prior Neural Networks For Room Impulse Response Reconstruction
by: Pezzoli, Mirco, et al.
Published: (2025)
Acoustic source localization in the spherical harmonics domain exploiting low-rank approximations
by: Cobos, Maximo, et al.
Published: (2023)
Phase-Retrieval-Based Physics-Informed Neural Networks For Acoustic Magnitude Field Reconstruction
by: Schrader, Karl, et al.
Published: (2026)
Physics-Informed Transfer Learning for Data-Driven Sound Source Reconstruction in Near-Field Acoustic Holography
by: Luan, Xinmeng, et al.
Published: (2025)
Discrete Audio Tokens: More Than a Survey!
by: Mousavi, Pooneh, et al.
Published: (2025)
Prompting Whisper for Joint Speech Transcription and Diarization
by: Zamyrova, Mariia, et al.
Published: (2026)
Robust Singing Voice Transcription Serves Synthesis
by: Li, Ruiqi, et al.
Published: (2024)
TF-SepNet: An Efficient 1D Kernel Design in CNNs for Low-Complexity Acoustic Scene Classification
by: Cai, Yiqiang, et al.
Published: (2023)
Enhancing Lyrics Transcription on Music Mixtures with Consistency Loss
by: Huang, Jiawen, et al.
Published: (2025)