:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Haoyuan; Zhang, Yue; Jing, Liqiang; Hansen, John H. L.
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2506.07323

Similar Items

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge
by: Wang, He, et al.
Published: (2024)

Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition
by: Hu, Shujie, et al.
Published: (2024)

Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches
by: Aboeitta, Ahmed, et al.
Published: (2025)

CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
by: Shao, Nian, et al.
Published: (2025)

Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio
by: Barański, Mateusz, et al.
Published: (2025)

Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM
by: Zhang, Fengrun, et al.
Published: (2024)

LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
by: Kamahori, Keisuke, et al.
Published: (2025)

Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER
by: Zheng, Xiuwen, et al.
Published: (2026)

FlanEC: Exploring Flan-T5 for Post-ASR Error Correction
by: La Quatra, Moreno, et al.
Published: (2025)

Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples
by: Wang, Zhenyu, et al.
Published: (2024)

Articulation-Informed ASR: Integrating Articulatory Features into ASR via Auxiliary Speech Inversion and Cross-Attention Fusion
by: Attia, Ahmed Adel, et al.
Published: (2025)

Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition
by: Kim, Jaeyoung, et al.
Published: (2024)

Learning Physiology-Informed Vocal Spectrotemporal Representations for Speech Emotion Recognition
by: Zhang, Xu, et al.
Published: (2026)

Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models
by: Shakhadri, Syed Abdul Gaffar, et al.
Published: (2025)

TinyML for Speech Recognition
by: Barovic, Andrew, et al.
Published: (2025)

GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems
by: Robatian, Amin, et al.
Published: (2025)

TG-ASR: Translation-Guided Learning with Parallel Gated Cross Attention for Low-Resource Automatic Speech Recognition
by: Yang, Cheng-Yeh, et al.
Published: (2026)

Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation
by: Srivastav, Vaibhav, et al.
Published: (2025)

Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
by: Liu, Rui, et al.
Published: (2025)

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
by: Du, Zhihao, et al.
Published: (2025)

VIBEVOICE-ASR Technical Report
by: Peng, Zhiliang, et al.
Published: (2026)

Data-Efficient ASR Personalization for Non-Normative Speech Using an Uncertainty-Based Phoneme Difficulty Score for Guided Sampling
by: Pokel, Niclas, et al.
Published: (2025)

Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory Models
by: Mogridge, Rhiannon, et al.
Published: (2024)

Linear Time Complexity Conformers with SummaryMixing for Streaming Speech Recognition
by: Parcollet, Titouan, et al.
Published: (2024)

Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition
by: Li, Dongyuan, et al.
Published: (2024)

Improvement and Implementation of a Speech Emotion Recognition Model Based on Dual-Layer LSTM
by: Yang, Xiaoran, et al.
Published: (2024)

Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
by: Choi, Yerin, et al.
Published: (2024)

VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining
by: Zhuo, Jianheng, et al.
Published: (2025)

Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)

Color-based Emotion Representation for Speech Emotion Recognition
by: Nagase, Ryotaro, et al.
Published: (2026)

Unifying Speech Recognition, Synthesis and Conversion with Autoregressive Transformers
by: Cai, Runyuan, et al.
Published: (2026)

Persian Speech Emotion Recognition by Fine-Tuning Transformers
by: Shayaninasab, Minoo, et al.
Published: (2024)

Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition
by: Shi, Hao, et al.
Published: (2024)

Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing
by: Wang, Mengqi, et al.
Published: (2025)

Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition
by: Yao, Wenhan, et al.
Published: (2024)

Automatic Speech Recognition in the Modern Era: Architectures, Training, and Evaluation
by: Nayeem, Md., et al.
Published: (2025)

Efficient Finetuning for Dimensional Speech Emotion Recognition in the Age of Transformers
by: Sampath, Aneesha, et al.
Published: (2025)

Multi-Channel Differential ASR for Robust Wearer Speech Recognition on Smart Glasses
by: Yang, Yufeng, et al.
Published: (2025)

MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech
by: Mai, Jialong, et al.
Published: (2025)

Deploying UDM Series in Real-Life Stuttered Speech Applications: A Clinical Evaluation Framework
by: Zhang, Eric, et al.
Published: (2025)