| Main Authors: | Mayrhofer, Benedikt, Pernkopf, Franz, Aichinger, Philipp, Hagmüller, Martin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.03892 |
Similar Items
A multimodal Bayesian Network for symptom-level depression and anxiety prediction from voice and speech data
by: Norbury, Agnes, et al.
Published: (2025)
CognoSpeak: an automatic, remote assessment of early cognitive decline in real-world conversational speech
by: Pahar, Madhurananda, et al.
Published: (2025)
Acoustic and perceptual differences between standard and accented speech and their voice clones
by: Yang, Tianle, et al.
Published: (2026)
Online speaker diarization of meetings guided by speech separation
by: Gruttadauria, Elio, et al.
Published: (2024)
voice2mode: Phonation Mode Classification in Singing using Self-Supervised Speech Models
by: Justus, Aju Ani, et al.
Published: (2026)
Resource-constrained stereo singing voice cancellation
by: Borrelli, Clara, et al.
Published: (2024)
Throat and acoustic paired speech dataset for deep learning-based speech enhancement
by: Kim, Yunsik, et al.
Published: (2025)
IsoNet: Spatially-aware audio-visual target speech extraction in complex acoustic environments
by: Padhya, Dinanath, et al.
Published: (2026)
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
by: Maiti, Soumi, et al.
Published: (2023)
Selfsupervised learning for pathological speech detection
by: Sheikh, Shakeel Ahmad
Published: (2024)
Towards the Synthesis of Non-speech Vocalizations
by: Hoq, Enjamamul, et al.
Published: (2024)
Voxceleb-ESP: preliminary experiments detecting Spanish celebrities from their voices
by: Labrador, Beltrán, et al.
Published: (2023)
Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)
EEG-to-Voice Decoding of Spoken and Imagined speech Using Non-Invasive EEG
by: Park, Hanbeot, et al.
Published: (2025)
Single-channel speech enhancement using learnable loss mixup
by: Chang, Oscar, et al.
Published: (2023)
Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)
CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)
Robustifying automatic speech recognition by extracting slowly varying features
by: Pizarro, Matías, et al.
Published: (2021)
Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues
by: Nasr, Seham, et al.
Published: (2025)
FlowDec: A flow-based full-band general audio codec with high perceptual quality
by: Welker, Simon, et al.
Published: (2025)
SpectroFusion-ViT: A Lightweight Transformer for Speech Emotion Recognition Using Harmonic Mel-Chroma Fusion
by: Ahmed, Faria, et al.
Published: (2026)
Generalizable speech deepfake detection via meta-learned LoRA
by: Laakkonen, Janne, et al.
Published: (2025)
Late fusion ensembles for speech recognition on diverse input audio representations
by: Jezidžić, Marin, et al.
Published: (2024)
Boosting keyword spotting through on-device learnable user speech characteristics
by: Cioflan, Cristian, et al.
Published: (2024)
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters
by: Fujita, Kenichi, et al.
Published: (2024)
Context-aware child-directed speech detection from long-form recordings
by: Charlot, Théo, et al.
Published: (2026)
Dementia classification from spontaneous speech using wrapper-based feature selection
by: Niemelä, Marko, et al.
Published: (2025)
Acoustic characterization of speech rhythm: going beyond metrics with recurrent neural networks
by: Deloche, François, et al.
Published: (2024)
Screening method for early dementia using sound objects as voice biomarkers
by: Pluta, Adam, et al.
Published: (2024)
Fusion approaches for emotion recognition from speech using acoustic and text-based features
by: Pepino, Leonardo, et al.
Published: (2024)
An Attention Long Short-Term Memory based system for automatic classification of speech intelligibility
by: Fernández-Díaz, Miguel, et al.
Published: (2024)
SeMaScore : a new evaluation metric for automatic speech recognition tasks
by: Sasindran, Zitha, et al.
Published: (2024)
Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?
by: Maharana, Sarthak Kumar, et al.
Published: (2023)
Analyzing the Importance of Blank for CTC-Based Knowledge Distillation
by: Hilmes, Benedikt, et al.
Published: (2025)
Adaptive Variational Inference in Probabilistic Graphical Models: Beyond Bethe, Tree-Reweighted, and Convex Free Energies
by: Leisenberger, Harald, et al.
Published: (2025)
A multimodal dynamical variational autoencoder for audiovisual speech representation learning
by: Sadok, Samir, et al.
Published: (2023)
A vector quantized masked autoencoder for audiovisual speech emotion recognition
by: Sadok, Samir, et al.
Published: (2023)
Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models
by: Maisonneuve, Malo, et al.
Published: (2024)
Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge
by: Leglaive, Simon, et al.
Published: (2024)
Self-supervised learning of speech representations with Dutch archival data
by: Vaessen, Nik, et al.
Published: (2025)