:: Library Catalog

Bibliographic Details
Main authors:	Luo, Longjie; Li, Lin; Hong, Qingyang
Format:	Preprint
Published:	2025
Subjects:	Sound; Audio and Speech Processing
Online access:	https://arxiv.org/abs/2505.24450

Similar Items

Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge
by: Luo, Longjie, et al.
Published: (2025)

ctPuLSE: Close-Talk, and Pseudo-Label Based Far-Field, Speech Enhancement
by: Wang, Zhong-Qiu
Published: (2024)

Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation
by: Lin, Zhaofeng, et al.
Published: (2023)

Mitigating Subgroup Disparities in Multi-Label Speech Emotion Recognition: A Pseudo-Labeling and Unsupervised Learning Approach
by: Lin, Yi-Cheng, et al.
Published: (2025)

Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement
by: Nasretdinov, Rauf, et al.
Published: (2025)

A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement
by: Lu, Shenghui, et al.
Published: (2025)

Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network
by: Chen, Yanan, et al.
Published: (2024)

Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model
by: Huang, Hukai, et al.
Published: (2024)

Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models
by: Zezario, Ryandhimas E., et al.
Published: (2026)

Adaptive Data Augmentation with NaturalSpeech3 for Far-field Speaker Verification
by: Zhang, Li, et al.
Published: (2025)

Causal Speech Enhancement with Predicting Semantics based on Quantized Self-supervised Learning Features
by: Tsunoo, Emiru, et al.
Published: (2024)

Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance
by: Ochiai, Tsubasa, et al.
Published: (2024)

ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech
by: Guan, Wenhao, et al.
Published: (2023)

Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition
by: Wang, Kuan-Chen, et al.
Published: (2024)

Advancing Electrolaryngeal Speech Enhancement Through Speech-Text Representation Learning
by: Ma, Ding, et al.
Published: (2026)

GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning for Speech Emotion Recognition
by: Pan, Yu, et al.
Published: (2024)

DS-Codec: Dual-Stage Training with Mirror-to-NonMirror Architecture Switching for Speech Codec
by: Chen, Peijie, et al.
Published: (2025)

MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
by: Guan, Wenhao, et al.
Published: (2023)

Phoenix-VAD: Streaming Semantic Endpoint Detection for Full-Duplex Speech Interaction
by: Wu, Weijie, et al.
Published: (2025)

WhispEar: A Bi-directional Framework for Scaling Whispered Speech Conversion via Pseudo-Parallel Whisper Generation
by: Fang, Zihao, et al.
Published: (2026)

Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model
by: Zezario, Ryandhimas E., et al.
Published: (2023)

Latent-Level Enhancement with Flow Matching for Robust Automatic Speech Recognition
by: Yang, Da-Hee, et al.
Published: (2026)

Efficient Long-Form Speech Recognition for General Speech In-Context Learning
by: Yen, Hao, et al.
Published: (2024)

Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm
by: Li, Zhaoyang, et al.
Published: (2025)

PseudoVC: Improving One-shot Voice Conversion with Pseudo Paired Data
by: Cao, Songjun, et al.
Published: (2025)

Position-invariant Fine-tuning of Speech Enhancement Models with Self-supervised Speech Representations
by: Meghanani, Amit, et al.
Published: (2026)

Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)

Reducing the Gap Between Pretrained Speech Enhancement and Recognition Models Using a Real Speech-Trained Bridging Module
by: Cui, Zhongjian, et al.
Published: (2025)

Advanced Long-Content Speech Recognition With Factorized Neural Transducer
by: Gong, Xun, et al.
Published: (2024)

Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers
by: Quan, Changsheng, et al.
Published: (2024)

Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement
by: Ren, Wenze, et al.
Published: (2024)

Probing Self-supervised Learning Models with Target Speech Extraction
by: Peng, Junyi, et al.
Published: (2024)

In-Materia Speech Recognition
by: Zolfagharinejad, Mohamadreza, et al.
Published: (2024)

Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)

Neural Directed Speech Enhancement with Dual Microphone Array in High Noise Scenario
by: Wen, Wen, et al.
Published: (2024)

Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning
by: Zhu, Xinfa, et al.
Published: (2023)

Objective and Subjective Evaluation of Diffusion-Based Speech Enhancement for Dysarthric Speech
by: de Groot, Dimme, et al.
Published: (2025)

Uncovering the Visual Contribution in Audio-Visual Speech Recognition
by: Lin, Zhaofeng, et al.
Published: (2024)

On the Importance of Neural Wiener Filter for Resource Efficient Multichannel Speech Enhancement
by: Hsieh, Tsun-An, et al.
Published: (2024)

Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
by: Shen, Siyuan, et al.
Published: (2024)