:: Library Catalog

Bibliographic Details
Main Authors:	Huang, Guanjie; Tsang, Danny Hin Kwok; Liu, Li
Type:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2503.21785

Similar Items

Speech Recognition for Analysis of Police Radio Communication
by: Srivastava, Tejes, et al.
Published: (2024)

Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Immersive Audiobook Generation
by: Rong, Yan, et al.
Published: (2025)

Semantic Communications for Speech Recognition
by: Weng, Zhenzi, et al.
Published: (2021)

Rehearsal-Free Online Continual Learning for Automatic Speech Recognition
by: Eeckt, Steven Vander, et al.
Published: (2023)

Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis
by: Leung, Wing-Zin, et al.
Published: (2024)

Reducing the Gap Between Pretrained Speech Enhancement and Recognition Models Using a Real Speech-Trained Bridging Module
by: Cui, Zhongjian, et al.
Published: (2025)

MLLM-based Speech Recognition: When and How is Multimodality Beneficial?
by: Guan, Yiwen, et al.
Published: (2025)

From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition
by: Huang, Mengcheng, et al.
Published: (2026)

Dataset-Distillation Generative Model for Speech Emotion Recognition
by: Ritter-Gutierrez, Fabian, et al.
Published: (2024)

Noisy Disentanglement with Tri-stage Training for Noise-Robust Speech Recognition
by: Chen, Shuangyuan, et al.
Published: (2025)

Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications
by: de Groot, Dimme, et al.
Published: (2025)

Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
by: Chang, Yi, et al.
Published: (2024)

EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model
by: Yang, Yiqing, et al.
Published: (2025)

Semantic-Emotional Resonance Embedding: A Semi-Supervised Paradigm for Cross-Lingual Speech Emotion Recognition
by: Zhao, Ya, et al.
Published: (2026)

Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)

A Self-Training Approach for Whisper to Enhance Long Dysarthric Speech Recognition
by: Wang, Shiyao, et al.
Published: (2025)

Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models
by: Alsayegh, Ali, et al.
Published: (2025)

The TEA-ASLP System for Multilingual Conversational Speech Recognition and Speech Diarization in MLC-SLM 2025 Challenge
by: Xue, Hongfei, et al.
Published: (2025)

PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2025)

SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding
by: Wei, Linye, et al.
Published: (2025)

In-Materia Speech Recognition
by: Zolfagharinejad, Mohamadreza, et al.
Published: (2024)

Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech Recognition
by: Huang, Guanjie, et al.
Published: (2025)

Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
by: Masuyama, Yoshiki, et al.
Published: (2024)

Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models
by: Gao, Xiaoxue, et al.
Published: (2024)

DCIM-AVSR: Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module
by: Wang, Xinyu, et al.
Published: (2024)

PCQ: Emotion Recognition in Speech via Progressive Channel Querying
by: Wang, Xincheng, et al.
Published: (2024)

Sequential Editing for Lifelong Training of Speech Recognition Models
by: Kulshreshtha, Devang, et al.
Published: (2024)

Emotion-Aware Contrastive Adaptation Network for Source-Free Cross-Corpus Speech Emotion Recognition
by: Zhao, Yan, et al.
Published: (2024)

Unified Architecture and Unsupervised Speech Disentanglement for Speaker Embedding-Free Enrollment in Personalized Speech Enhancement
by: Huang, Ziling, et al.
Published: (2025)

Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models
by: Jing, Ruihao, et al.
Published: (2025)

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
by: Bai, Ye, et al.
Published: (2024)

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
by: Ding, Shaojin, et al.
Published: (2023)

Machine Unlearning in Speech Emotion Recognition via Forget Set Alone
by: Ren, Zhao, et al.
Published: (2025)

Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding
by: Zhou, Haoran, et al.
Published: (2025)

Testing Correctness, Fairness, and Robustness of Speech Emotion Recognition Models
by: Derington, Anna, et al.
Published: (2023)

Leveraging Self-Supervised Models for Automatic Whispered Speech Recognition
by: Farhadipour, Aref, et al.
Published: (2024)

EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
by: Wu, Haibin, et al.
Published: (2024)

Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement
by: Nasretdinov, Rauf, et al.
Published: (2025)

AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition
by: Lin, Ju, et al.
Published: (2024)

Erasing Your Voice Before It's Heard: Training-free Speaker Unlearning for Zero-shot Text-to-Speech
by: Lee, Myungjin, et al.
Published: (2026)