:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Aite; Liu, Yongcan; Yu, Xinglin; Xing, Xinyue
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Sound
Online Access:	https://arxiv.org/abs/2502.10703

Similar Items

SleepGMUformer: A gated multimodal temporal neural network for sleep staging
by: Zhao, Chenjun, et al.
Published: (2025)

Synthetic data enables context-aware bioacoustic sound event detection
by: Hoffman, Benjamin, et al.
Published: (2025)

Determining the severity of Parkinson's disease in patients using a multi task neural network
by: García-Ordás, María Teresa, et al.
Published: (2024)

Optimising MFCC parameters for the automatic detection of respiratory diseases
by: Yan, Yuyang, et al.
Published: (2024)

Evaluating Echo State Network for Parkinson's Disease Prediction using Voice Features
by: Hosseininian, Seyedeh Zahra Seyedi, et al.
Published: (2024)

A multimodal Bayesian Network for symptom-level depression and anxiety prediction from voice and speech data
by: Norbury, Agnes, et al.
Published: (2025)

Acoustic evaluation of a neural network dedicated to the detection of animal vocalisations
by: Rouch, Jérémy, et al.
Published: (2025)

An AI-enabled Bias-Free Respiratory Disease Diagnosis Model using Cough Audio: A Case Study for COVID-19
by: Saeed, Tabish, et al.
Published: (2024)

A multimodal dynamical variational autoencoder for audiovisual speech representation learning
by: Sadok, Samir, et al.
Published: (2023)

Robust detection of overlapping bioacoustic sound events
by: Mahon, Louis, et al.
Published: (2025)

Speech foundation models on intelligibility prediction for hearing-impaired listeners
by: Cuervo, Santiago, et al.
Published: (2024)

Adaptive vector steering: A training-free, layer-wise intervention for hallucination mitigation in large audio and multimodal models
by: Lin, Tsung-En, et al.
Published: (2025)

Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning
by: Wu, Daiqing, et al.
Published: (2026)

Sparse deepfake detection promotes better disentanglement
by: Teissier, Antoine, et al.
Published: (2025)

SpikCommander: A High-performance Spiking Transformer with Multi-view Learning for Efficient Speech Command Recognition
by: Wang, Jiaqi, et al.
Published: (2025)

A Semi-Supervised Framework for Speech Confidence Detection using Whisper
by: Wynn, Adam, et al.
Published: (2026)

An Attention Long Short-Term Memory based system for automatic classification of speech intelligibility
by: Fernández-Díaz, Miguel, et al.
Published: (2024)

A contrastive-learning approach for auditory attention detection
by: Bajestan, Seyed Ali Alavi, et al.
Published: (2024)

Decodable but not structured: linear probing enables Underwater Acoustic Target Recognition with pretrained audio embeddings
by: Hummel, Hilde I., et al.
Published: (2026)

BenSParX: A Robust Explainable Machine Learning Framework for Parkinson's Disease Detection from Bengali Conversational Speech
by: Hossain, Riad, et al.
Published: (2025)

ADNAC: Audio Denoiser using Neural Audio Codec
by: Jimon, Daniel, et al.
Published: (2025)

Self-supervised learning for pathological speech detection
by: Sheikh, Shakeel Ahmad
Published: (2024)

High-Fidelity Music Vocoder using Neural Audio Codecs
by: Lanzendörfer, Luca A., et al.
Published: (2025)

Efficient Continual Learning in Keyword Spotting using Binary Neural Networks
by: Vu, Quynh Nguyen-Phuong, et al.
Published: (2025)

Denoising by neural network for muzzle blast detection
by: Pujol, Hadrien, et al.
Published: (2025)

Cough activity detection for automatic tuberculosis screening
by: van Vüren, Joshua Jansen, et al.
Published: (2026)

SAO-Instruct: Free-form Audio Editing using Natural Language Instructions
by: Ungersböck, Michael, et al.
Published: (2025)

Multi-Task Learning for Lung sound & Lung disease classification
by: K V, Suma, et al.
Published: (2024)

Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
by: Mancini, Eleonora, et al.
Published: (2024)

voice2mode: Phonation Mode Classification in Singing using Self-Supervised Speech Models
by: Justus, Aju Ani, et al.
Published: (2026)

Towards generalizing deep-audio fake detection networks
by: Gasenzer, Konstantin, et al.
Published: (2023)

A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes
by: Ronchini, Francesca, et al.
Published: (2022)

Surface impedance inference via neural fields and sparse acoustic data obtained by a compact array
by: Xia, Yuanxin, et al.
Published: (2026)

Unsupervised outlier detection to improve bird audio dataset labels
by: Collins, Bruce
Published: (2025)

Learning to rumble: Automated elephant call classification, detection and endpointing using deep architectures
by: Geldenhuys, Christiaan M., et al.
Published: (2024)

A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings
by: Adnan, Tariq, et al.
Published: (2024)

Unleashing the Power of Natural Audio Featuring Multiple Sound Sources
by: Cheng, Xize, et al.
Published: (2025)

Generalizable speech deepfake detection via meta-learned LoRA
by: Laakkonen, Janne, et al.
Published: (2025)

The impact of non-target events in synthetic soundscapes for sound event detection
by: Ronchini, Francesca, et al.
Published: (2021)

Voxceleb-ESP: preliminary experiments detecting Spanish celebrities from their voices
by: Labrador, Beltrán, et al.
Published: (2023)