:: Library Catalog

Bibliographic Details
Main Authors:	Mayrhofer, Benedikt; Pernkopf, Franz; Aichinger, Philipp; Hagmüller, Martin
Format:	Preprint
Published:	2026
Subjects:	Sound; Machine Learning
Online Access:	https://arxiv.org/abs/2601.03892

Similar Items

A multimodal Bayesian Network for symptom-level depression and anxiety prediction from voice and speech data
by: Norbury, Agnes, et al.
Published: (2025)

CognoSpeak: an automatic, remote assessment of early cognitive decline in real-world conversational speech
by: Pahar, Madhurananda, et al.
Published: (2025)

Acoustic and perceptual differences between standard and accented speech and their voice clones
by: Yang, Tianle, et al.
Published: (2026)

Online speaker diarization of meetings guided by speech separation
by: Gruttadauria, Elio, et al.
Published: (2024)

voice2mode: Phonation Mode Classification in Singing using Self-Supervised Speech Models
by: Justus, Aju Ani, et al.
Published: (2026)

Resource-constrained stereo singing voice cancellation
by: Borrelli, Clara, et al.
Published: (2024)

Throat and acoustic paired speech dataset for deep learning-based speech enhancement
by: Kim, Yunsik, et al.
Published: (2025)

IsoNet: Spatially-aware audio-visual target speech extraction in complex acoustic environments
by: Padhya, Dinanath, et al.
Published: (2026)

Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
by: Maiti, Soumi, et al.
Published: (2023)

Selfsupervised learning for pathological speech detection
by: Sheikh, Shakeel Ahmad
Published: (2024)

Towards the Synthesis of Non-speech Vocalizations
by: Hoq, Enjamamul, et al.
Published: (2024)

Voxceleb-ESP: preliminary experiments detecting Spanish celebrities from their voices
by: Labrador, Beltrán, et al.
Published: (2023)

Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)

EEG-to-Voice Decoding of Spoken and Imagined speech Using Non-Invasive EEG
by: Park, Hanbeot, et al.
Published: (2025)

Single-channel speech enhancement using learnable loss mixup
by: Chang, Oscar, et al.
Published: (2023)

Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)

CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)

Robustifying automatic speech recognition by extracting slowly varying features
by: Pizarro, Matías, et al.
Published: (2021)

Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues
by: Nasr, Seham, et al.
Published: (2025)

FlowDec: A flow-based full-band general audio codec with high perceptual quality
by: Welker, Simon, et al.
Published: (2025)

SpectroFusion-ViT: A Lightweight Transformer for Speech Emotion Recognition Using Harmonic Mel-Chroma Fusion
by: Ahmed, Faria, et al.
Published: (2026)

Generalizable speech deepfake detection via meta-learned LoRA
by: Laakkonen, Janne, et al.
Published: (2025)

Late fusion ensembles for speech recognition on diverse input audio representations
by: Jezidžić, Marin, et al.
Published: (2024)

Boosting keyword spotting through on-device learnable user speech characteristics
by: Cioflan, Cristian, et al.
Published: (2024)

Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
by: Fujita, Kenichi, et al.
Published: (2024)

Context-aware child-directed speech detection from long-form recordings
by: Charlot, Théo, et al.
Published: (2026)

Dementia classification from spontaneous speech using wrapper-based feature selection
by: Niemelä, Marko, et al.
Published: (2025)

Acoustic characterization of speech rhythm: going beyond metrics with recurrent neural networks
by: Deloche, François, et al.
Published: (2024)

Screening method for early dementia using sound objects as voice biomarkers
by: Pluta, Adam, et al.
Published: (2024)

Fusion approaches for emotion recognition from speech using acoustic and text-based features
by: Pepino, Leonardo, et al.
Published: (2024)

An Attention Long Short-Term Memory based system for automatic classification of speech intelligibility
by: Fernández-Díaz, Miguel, et al.
Published: (2024)

SeMaScore: a new evaluation metric for automatic speech recognition tasks
by: Sasindran, Zitha, et al.
Published: (2024)

Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?
by: Maharana, Sarthak Kumar, et al.
Published: (2023)

Analyzing the Importance of Blank for CTC-Based Knowledge Distillation
by: Hilmes, Benedikt, et al.
Published: (2025)

Adaptive Variational Inference in Probabilistic Graphical Models: Beyond Bethe, Tree-Reweighted, and Convex Free Energies
by: Leisenberger, Harald, et al.
Published: (2025)

A multimodal dynamical variational autoencoder for audiovisual speech representation learning
by: Sadok, Samir, et al.
Published: (2023)

A vector quantized masked autoencoder for audiovisual speech emotion recognition
by: Sadok, Samir, et al.
Published: (2023)

Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models
by: Maisonneuve, Malo, et al.
Published: (2024)

Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge
by: Leglaive, Simon, et al.
Published: (2024)

Self-supervised learning of speech representations with Dutch archival data
by: Vaessen, Nik, et al.
Published: (2025)