:: Library Catalog

Bibliographic Details
Main Authors:	Impraimakis, Marios; Smyth, Andrew W.
Format:	Preprint
Published:	2025
Subjects:	Signal Processing; Artificial Intelligence; Computer Vision and Pattern Recognition; Systems and Control; Audio and Speech Processing; 68T05 (Learning and adaptive systems); I.2.6; I.2.8
Online Access:	https://arxiv.org/abs/2511.02717

Similar Items

A Kullback-Leibler divergence method for input-system-state identification
by: Impraimakis, Marios
Published: (2025)

Deep recurrent-convolutional neural network learning and physics Kalman filtering comparison in dynamic load identification
by: Impraimakis, Marios
Published: (2025)

A convolutional neural network deep learning method for model class selection
by: Impraimakis, Marios
Published: (2025)

A generative adversarial network optimization method for damage detection and digital twinning by deep AI fault learning: Z24 Bridge structural health monitoring benchmark validation
by: Impraimakis, Marios, et al.
Published: (2025)

An Information-Theoretic Method for Dynamic System Identification With Output-Only Damping Estimation
by: Impraimakis, Marios, et al.
Published: (2026)

Beyond Speech and More: Investigating the Emergent Ability of Speech Foundation Models for Classifying Physiological Time-Series Signals
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Rethinking Masking Strategies for Masked Prediction-based Audio Self-supervised Learning
by: Niizumi, Daisuke, et al.
Published: (2026)

Noise-Robust Keyword Spotting through Self-supervised Pretraining
by: Mørk, Jacob, et al.
Published: (2024)

Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions
by: Bovbjerg, Holger Severin, et al.
Published: (2023)

Learning Robust Spatial Representations from Binaural Audio through Feature Distillation
by: Bovbjerg, Holger Severin, et al.
Published: (2025)

Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining
by: Bovbjerg, Holger Severin, et al.
Published: (2025)

Audio-based Kinship Verification Using Age Domain Conversion
by: Sun, Qiyang, et al.
Published: (2024)

Passive Underwater Acoustic Signal Separation based on Feature Decoupling Dual-path Network
by: Liu, Yucheng, et al.
Published: (2025)

Exploring rhythm formant analysis for Indic language classification
by: Gogoi, Parismita, et al.
Published: (2024)

Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2024)

Iterative Feature Boosting for Explainable Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2024)

Joint Feature and Output Distillation for Low-complexity Acoustic Scene Classification
by: Li, Haowen, et al.
Published: (2025)

Beyond Deep Learning: Speech Segmentation and Phone Classification with Neural Assemblies
by: Adelson, Trevor, et al.
Published: (2026)

Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
by: Mehta, Shivam, et al.
Published: (2025)

Local Diagnostics of Continuous Normalizing Flow for Out-of-Distribution Detection
by: Cao, Xinwei, et al.
Published: (2026)

SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
by: Mehta, Shivam, et al.
Published: (2025)

DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids
by: Tsangko, Iosif, et al.
Published: (2025)

ChordSync: Conformer-Based Alignment of Chord Annotations to Music Audio
by: Poltronieri, Andrea, et al.
Published: (2024)

Symbolic Audio Classification via Modal Decision Tree Learning
by: Marzano, Enrico, et al.
Published: (2025)

Prevailing Research Areas for Music AI in the Era of Foundation Models
by: Wei, Megan, et al.
Published: (2024)

KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods
by: Nzeyimana, Antoine
Published: (2023)

Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
by: Mehta, Shivam, et al.
Published: (2024)

Energy-based features and bi-LSTM neural network for EEG-based music and voice classification
by: Ariza, Isaac, et al.
Published: (2024)

Computational modeling of early language learning from acoustic speech and audiovisual input without linguistic priors
by: Räsänen, Okko
Published: (2026)

STAR: Speech-to-Audio Generation via Representation Learning
by: Xie, Zeyu, et al.
Published: (2025)

M2D-CLAP: Exploring General-purpose Audio-Language Representations Beyond CLAP
by: Niizumi, Daisuke, et al.
Published: (2025)

FakeSound2: A Benchmark for Explainable and Generalizable Deepfake Sound Detection
by: Xie, Zeyu, et al.
Published: (2025)

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
by: Xie, Zeyu, et al.
Published: (2024)

AudioTime: A Temporally-aligned Audio-text Benchmark Dataset
by: Xie, Zeyu, et al.
Published: (2024)

CAST-TTS: A Simple Cross-Attention Framework for Unified Timbre Control in TTS
by: Zheng, Zihao, et al.
Published: (2026)

PicoAudio2: Temporal Controllable Text-to-Audio Generation with Natural Language Description
by: Zheng, Zihao, et al.
Published: (2025)

FakeSound: Deepfake General Audio Detection
by: Xie, Zeyu, et al.
Published: (2024)

TuneGenie: Reasoning-based LLM agents for preferential music generation
by: Pandey, Amitesh, et al.
Published: (2025)

Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)