:: Library Catalog

Bibliographic Details
Main Authors:	Yeste, Víctor; Rivas-Arévalo, Rodrigo
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence; Computers and Society; Sound; Audio and Speech Processing; I.2.7; I.2.6; J.4
Online Access:	https://arxiv.org/abs/2602.00914

Similar Items

SigWavNet: Learning Multiresolution Signal Wavelet Network for Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2025)

Beyond Deep Learning: Speech Segmentation and Phone Classification with Neural Assemblies
by: Adelson, Trevor, et al.
Published: (2026)

Computational modeling of early language learning from acoustic speech and audiovisual input without linguistic priors
by: Räsänen, Okko
Published: (2026)

Iterative Feature Boosting for Explainable Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2024)

Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2024)

Enhancing Speech Emotion Recognition Leveraging Aligning Timestamps of ASR Transcripts and Speaker Diarization
by: Wang, Hsuan-Yu, et al.
Published: (2025)

Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
by: Mehta, Shivam, et al.
Published: (2024)

Thaka at KSAA-2026 Task 2: Regularized Fine-Tuning for Arabic Speech Diacritization
by: Alamr, Meshal, et al.
Published: (2026)

Measuring the Accuracy of Automatic Speech Recognition Solutions
by: Kuhn, Korbinian, et al.
Published: (2024)

Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
by: Hori, Takaaki, et al.
Published: (2025)

Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children
by: Ahn, Taekyung, et al.
Published: (2024)

SW-ASR: A Context-Aware Hybrid ASR Pipeline for Robust Single Word Speech Recognition
by: Sharma, Manali, et al.
Published: (2026)

SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech
by: Cheng, Zhuangfei, et al.
Published: (2025)

Improving Cross-Lingual Phonetic Representation of Low-Resource Languages Through Language Similarity Analysis
by: Kim, Minu, et al.
Published: (2025)

Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
by: Mehta, Shivam, et al.
Published: (2025)

SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
by: Mehta, Shivam, et al.
Published: (2025)

Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
by: Zhang, Pengfei, et al.
Published: (2026)

Matcha-TTS: A fast TTS architecture with conditional flow matching
by: Mehta, Shivam, et al.
Published: (2023)

Emotional Voice Messages (EMOVOME) database: emotion recognition in spontaneous voice messages
by: Zaragozá, Lucía Gómez, et al.
Published: (2024)

Fine-tuning Pre-trained Audio Models for COVID-19 Detection: A Technical Report
by: de Brito, Daniel Oliveira, et al.
Published: (2025)

Communication Access Real-Time Translation Through Collaborative Correction of Automatic Speech Recognition
by: Kuhn, Korbinian, et al.
Published: (2025)

An accurate and revised version of optical character recognition-based speech synthesis using LabVIEW
by: Mehta, Prateek, et al.
Published: (2025)

Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation
by: Khorrami, Khazar, et al.
Published: (2021)

MuTox: Universal MUltilingual Audio-based TOXicity Dataset and Zero-shot Detector
by: Costa-jussà, Marta R., et al.
Published: (2024)

Everyday Speech in the Indian Subcontinent
by: P, Utkarsh
Published: (2024)

Empathy Omni: Enabling Empathetic Speech Response Generation through Large Language Models
by: Wang, Haoyu, et al.
Published: (2025)

Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
by: Robertson, Sean, et al.
Published: (2023)

Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications
by: Kuhn, Korbinian, et al.
Published: (2024)

A Robust Classification Method using Hybrid Word Embedding for Early Diagnosis of Alzheimer's Disease
by: Li, Yangyang
Published: (2025)

SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning
by: Jain, Sarthak, et al.
Published: (2024)

Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Prevailing Research Areas for Music AI in the Era of Foundation Models
by: Wei, Megan, et al.
Published: (2024)

Passive Underwater Acoustic Signal Separation based on Feature Decoupling Dual-path Network
by: Liu, Yucheng, et al.
Published: (2025)

Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device
by: Kozak, Nazar
Published: (2026)

Quantifying the effect of speech pathology on automatic and human speaker verification
by: Halpern, Bence Mark, et al.
Published: (2024)

Less Stress, More Privacy: Stress Detection on Anonymized Speech of Air Traffic Controllers
by: Viswanathan, Janaki, et al.
Published: (2025)

ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction
by: Kim, Minu, et al.
Published: (2025)

Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices
by: Lasbordes, Maxence, et al.
Published: (2025)

A Voice-based Triage for Type 2 Diabetes using a Conversational Virtual Assistant in the Home Environment
by: Summoogum, Kelvin, et al.
Published: (2024)

Joint Feature and Output Distillation for Low-complexity Acoustic Scene Classification
by: Li, Haowen, et al.
Published: (2025)