:: Library Catalog

Bibliographic Details
Main Authors:	Gupta, Shubham; Gomez-Sarmiento, Isaac Neri; Mezdari, Faez Amjed; Ravanelli, Mirco; Subakan, Cem
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2410.05455

Similar Items

Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice
by: Gupta, Shubham, et al.
Published: (2024)

LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs
by: Mousavi, Pooneh, et al.
Published: (2025)

Listen First, Then Answer: Timestamp-Grounded Speech Reasoning
by: Jeong, Jihoon, et al.
Published: (2026)

Listenable Maps for Audio Classifiers
by: Paissan, Francesco, et al.
Published: (2024)

Autoregressive Speech Enhancement via Acoustic Tokens
by: Della Libera, Luca, et al.
Published: (2025)

Focal Modulation Networks for Interpretable Sound Classification
by: Della Libera, Luca, et al.
Published: (2024)

LL-SDR: Low-Latency Speech enhancement through Discrete Representations
by: Li, Jingyi, et al.
Published: (2026)

ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs
by: Mousavi, Pooneh, et al.
Published: (2025)

FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation
by: Della Libera, Luca, et al.
Published: (2025)

Listenable Maps for Zero-Shot Audio Classifiers
by: Paissan, Francesco, et al.
Published: (2024)

LMAC-TD: Producing Time Domain Explanations for Audio Classifiers
by: Mancini, Eleonora, et al.
Published: (2024)

FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
by: Della Libera, Luca, et al.
Published: (2025)

Toward Faithful Explanations in Acoustic Anomaly Detection
by: Elrashid, Maab, et al.
Published: (2026)

Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
by: Mancini, Eleonora, et al.
Published: (2024)

Audio Editing with Non-Rigid Text Prompts
by: Paissan, Francesco, et al.
Published: (2023)

Resource-Efficient Separation Transformer
by: Della Libera, Luca, et al.
Published: (2022)

DASB - Discrete Audio and Speech Benchmark
by: Mousavi, Pooneh, et al.
Published: (2024)

HumDial-EIBench: A Human-Recorded Multi-Turn Emotional Intelligence Benchmark for Audio Language Models
by: Wang, Shuiyuan, et al.
Published: (2026)

How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
by: Mousavi, Pooneh, et al.
Published: (2024)

Investigating Faithfulness in Large Audio Language Models
by: Mousavi, Pooneh, et al.
Published: (2025)

Audio Prototypical Network For Controllable Music Recommendation
by: Öncel, Fırat, et al.
Published: (2025)

ProGRes: Prompted Generative Rescoring on ASR n-Best
by: Tur, Ada Defne, et al.
Published: (2024)

What Are They Doing? Joint Audio-Speech Co-Reasoning
by: Wang, Yingzhi, et al.
Published: (2024)

The ICASSP 2026 HumDial Challenge: Benchmarking Human-like Spoken Dialogue Systems in the LLM Era
by: Zhao, Zhixian, et al.
Published: (2026)

Planing It by Ear: Convolutional Neural Networks for Acoustic Anomaly Detection in Industrial Wood Planers
by: Deschênes, Anthony, et al.
Published: (2025)

Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
by: Zaiem, Salah, et al.
Published: (2023)

SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning
by: Zampierin, Luca, et al.
Published: (2024)

Comparison of Speech Tasks in Human Expert and Machine Detection of Parkinson's Disease
by: Plantinga, Peter, et al.
Published: (2025)

CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds
by: Budaghyan, David, et al.
Published: (2023)

mmWave-Whisper: Phone Call Eavesdropping and Transcription Using Millimeter-Wave Radar
by: Basak, Suryoday, et al.
Published: (2024)

Is Transfer Learning Necessary for Violin Transcription?
by: Peng, Yueh-Po, et al.
Published: (2025)

Low-Rank Adaptation of Deep Prior Neural Networks For Room Impulse Response Reconstruction
by: Pezzoli, Mirco, et al.
Published: (2025)

Acoustic source localization in the spherical harmonics domain exploiting low-rank approximations
by: Cobos, Maximo, et al.
Published: (2023)

Phase-Retrieval-Based Physics-Informed Neural Networks For Acoustic Magnitude Field Reconstruction
by: Schrader, Karl, et al.
Published: (2026)

Physics-Informed Transfer Learning for Data-Driven Sound Source Reconstruction in Near-Field Acoustic Holography
by: Luan, Xinmeng, et al.
Published: (2025)

Discrete Audio Tokens: More Than a Survey!
by: Mousavi, Pooneh, et al.
Published: (2025)

Prompting Whisper for Joint Speech Transcription and Diarization
by: Zamyrova, Mariia, et al.
Published: (2026)

Robust Singing Voice Transcription Serves Synthesis
by: Li, Ruiqi, et al.
Published: (2024)

TF-SepNet: An Efficient 1D Kernel Design in CNNs for Low-Complexity Acoustic Scene Classification
by: Cai, Yiqiang, et al.
Published: (2023)

Enhancing Lyrics Transcription on Music Mixtures with Consistency Loss
by: Huang, Jiawen, et al.
Published: (2025)