| Main Authors: | Liu, Zefang, Zhu, Chenyang, Cho, Sangwoo, Zhang, Shi-Xiong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.18721 |
Similar Items
Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition
by: Sun, Haoqin, et al.
Published: (2024)
Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization
by: Guo, Yuxin, et al.
Published: (2024)
A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition
by: Li, Yangze, et al.
Published: (2024)
Large Language Model Guided Decoding for Self-Supervised Speech Recognition
by: Cohen, Eyal, et al.
Published: (2025)
Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition
by: Lin, Yi-Cheng, et al.
Published: (2025)
Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models
by: Zezario, Ryandhimas E., et al.
Published: (2026)
The TMU System for the XACLE Challenge: Training Large Audio Language Models with CLAP Pseudo-Labels
by: Tsutsumi, Ayuto, et al.
Published: (2026)
Audio-Visual Feature Synchronization for Robust Speech Enhancement in Hearing Aids
by: Saleem, Nasir, et al.
Published: (2025)
Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling
by: Li, Yuanchao, et al.
Published: (2024)
Unified Semi-Supervised Pipeline for Automatic Speech Recognition
by: Tadevosyan, Nune, et al.
Published: (2025)
A Semi-spontaneous Dutch Speech Dataset for Speech Enhancement and Speech Recognition
by: de Groot, Dimme, et al.
Published: (2026)
Interpreting the Role of Visemes in Audio-Visual Speech Recognition
by: Papadopoulos, Aristeidis, et al.
Published: (2025)
Audio-Visual Speech Separation via Bottleneck Iterative Network
by: Zhang, Sidong, et al.
Published: (2025)
SuPseudo: A Pseudo-supervised Learning Method for Neural Speech Enhancement in Far-field Speech Recognition
by: Luo, Longjie, et al.
Published: (2025)
BANC: Towards Efficient Binaural Audio Neural Codec for Overlapping Speech
by: Ratnarajah, Anton, et al.
Published: (2023)
Non-Intrusive Automatic Speech Recognition Refinement: A Survey
by: Peyghan, Mohammad Reza, et al.
Published: (2025)
Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization
by: Wan, Genshun, et al.
Published: (2026)
Semantic-Emotional Resonance Embedding: A Semi-Supervised Paradigm for Cross-Lingual Speech Emotion Recognition
by: Zhao, Ya, et al.
Published: (2026)
Diarization-Aware Multi-Speaker Automatic Speech Recognition via Large Language Models
by: Lin, Yuke, et al.
Published: (2025)
Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach
by: Lin, Yi-Cheng, et al.
Published: (2025)
Pseudo Strong Labels from Frame-Level Predictions for Weakly Supervised Sound Event Detection
by: Zhang, Yuliang, et al.
Published: (2025)
Adapting Speech Foundation Models for Unified Multimodal Speech Recognition with Large Language Models
by: Zhang, Jing-Xuan, et al.
Published: (2025)
FairASR: Fair Audio Contrastive Learning for Automatic Speech Recognition
by: Kim, Jongsuk, et al.
Published: (2025)
Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders
by: Shan, Weiqiao, et al.
Published: (2025)
Uncovering the Visual Contribution in Audio-Visual Speech Recognition
by: Lin, Zhaofeng, et al.
Published: (2024)
Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition
by: Yang, Qingran, et al.
Published: (2026)
LCB-net: Long-Context Biasing for Audio-Visual Speech Recognition
by: Yu, Fan, et al.
Published: (2024)
Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio
by: Shi, Mohan, et al.
Published: (2025)
EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model
by: Yang, Yiqing, et al.
Published: (2025)
Cross-Modal Bottleneck Fusion For Noise Robust Audio-Visual Speech Recognition
by: Ok, Seaone, et al.
Published: (2026)
Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement
by: Su, Fei, et al.
Published: (2026)
Empowering Low-Resource Language ASR via Large-Scale Pseudo Labeling
by: Bhogale, Kaushal Santosh, et al.
Published: (2024)
Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge
by: Luo, Longjie, et al.
Published: (2025)
Generalizable Audio Deepfake Detection via Latent Space Refinement and Augmentation
by: Huang, Wen, et al.
Published: (2025)
Refining Self-Supervised Learnt Speech Representation using Brain Activations
by: Li, Hengyu, et al.
Published: (2024)
Aligning Speech to Languages to Enhance Code-switching Speech Recognition
by: Liu, Hexin, et al.
Published: (2024)
LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models
by: Zhao, Xiaohan, et al.
Published: (2025)
Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models
by: Alsayegh, Ali, et al.
Published: (2025)
Identifying Hearing Difficulty Moments in Conversational Audio
by: Collins, Jack, et al.
Published: (2025)
Few-shot Personalization via In-Context Learning for Speech Emotion Recognition based on Speech-Language Model
by: Ihori, Mana, et al.
Published: (2025)