| Main Authors: | Yeste, Víctor, Rivas-Arévalo, Rodrigo |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.00914 |
Similar Items
SigWavNet: Learning Multiresolution Signal Wavelet Network for Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2025)
Beyond Deep Learning: Speech Segmentation and Phone Classification with Neural Assemblies
by: Adelson, Trevor, et al.
Published: (2026)
Computational modeling of early language learning from acoustic speech and audiovisual input without linguistic priors
by: Räsänen, Okko
Published: (2026)
Iterative Feature Boosting for Explainable Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2024)
Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2024)
Enhancing Speech Emotion Recognition Leveraging Aligning Timestamps of ASR Transcripts and Speaker Diarization
by: Wang, Hsuan-Yu, et al.
Published: (2025)
Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
by: Mehta, Shivam, et al.
Published: (2024)
Thaka at KSAA-2026 Task 2: Regularized Fine-Tuning for Arabic Speech Diacritization
by: Alamr, Meshal, et al.
Published: (2026)
Measuring the Accuracy of Automatic Speech Recognition Solutions
by: Kuhn, Korbinian, et al.
Published: (2024)
Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
by: Hori, Takaaki, et al.
Published: (2025)
Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children
by: Ahn, Taekyung, et al.
Published: (2024)
SW-ASR: A Context-Aware Hybrid ASR Pipeline for Robust Single Word Speech Recognition
by: Sharma, Manali, et al.
Published: (2026)
SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech
by: Cheng, Zhuangfei, et al.
Published: (2025)
Improving Cross-Lingual Phonetic Representation of Low-Resource Languages Through Language Similarity Analysis
by: Kim, Minu, et al.
Published: (2025)
Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
by: Mehta, Shivam, et al.
Published: (2025)
SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
by: Mehta, Shivam, et al.
Published: (2025)
Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
by: Zhang, Pengfei, et al.
Published: (2026)
Matcha-TTS: A fast TTS architecture with conditional flow matching
by: Mehta, Shivam, et al.
Published: (2023)
Emotional Voice Messages (EMOVOME) database: emotion recognition in spontaneous voice messages
by: Zaragozá, Lucía Gómez, et al.
Published: (2024)
Fine-tuning Pre-trained Audio Models for COVID-19 Detection: A Technical Report
by: de Brito, Daniel Oliveira, et al.
Published: (2025)
Communication Access Real-Time Translation Through Collaborative Correction of Automatic Speech Recognition
by: Kuhn, Korbinian, et al.
Published: (2025)
An accurate and revised version of optical character recognition-based speech synthesis using LabVIEW
by: Mehta, Prateek, et al.
Published: (2025)
Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation
by: Khorrami, Khazar, et al.
Published: (2021)
MuTox: Universal MUltilingual Audio-based TOXicity Dataset and Zero-shot Detector
by: Costa-jussà, Marta R., et al.
Published: (2024)
Everyday Speech in the Indian Subcontinent
by: P, Utkarsh
Published: (2024)
Empathy Omni: Enabling Empathetic Speech Response Generation through Large Language Models
by: Wang, Haoyu, et al.
Published: (2025)
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
by: Robertson, Sean, et al.
Published: (2023)
Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications
by: Kuhn, Korbinian, et al.
Published: (2024)
A Robust Classification Method using Hybrid Word Embedding for Early Diagnosis of Alzheimer's Disease
by: Li, Yangyang
Published: (2025)
SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning
by: Jain, Sarthak, et al.
Published: (2024)
Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Prevailing Research Areas for Music AI in the Era of Foundation Models
by: Wei, Megan, et al.
Published: (2024)
Passive Underwater Acoustic Signal Separation based on Feature Decoupling Dual-path Network
by: Liu, Yucheng, et al.
Published: (2025)
Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device
by: Kozak, Nazar
Published: (2026)
Quantifying the effect of speech pathology on automatic and human speaker verification
by: Halpern, Bence Mark, et al.
Published: (2024)
Less Stress, More Privacy: Stress Detection on Anonymized Speech of Air Traffic Controllers
by: Viswanathan, Janaki, et al.
Published: (2025)
ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction
by: Kim, Minu, et al.
Published: (2025)
Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices
by: Lasbordes, Maxence, et al.
Published: (2025)
A Voice-based Triage for Type 2 Diabetes using a Conversational Virtual Assistant in the Home Environment
by: Summoogum, Kelvin, et al.
Published: (2024)
Joint Feature and Output Distillation for Low-complexity Acoustic Scene Classification
by: Li, Haowen, et al.
Published: (2025)