:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Wang, Taihui, Zhao, Jinzheng, Chen, Rilin, Lei, Tong, Wang, Wenwu, Yu, Dong
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2601.20573
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Target matching based generative model for speech enhancement
by: Wang, Taihui, et al.
Published: (2025)

AudioRAG+: Feedback-driven Retrieval-augmented Audio Generation with Large Audio Language Models
by: Zhao, Junqi, et al.
Published: (2025)

Heterogeneous bimodal attention fusion for speech emotion recognition
by: Luo, Jiachen, et al.
Published: (2025)

Graph-based multi-Feature fusion method for speech emotion recognition
by: Liu, Xueyu, et al.
Published: (2024)

Scalable Neural Vocoder from Range-Null Space Decomposition
by: Li, Andong, et al.
Published: (2026)

Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues
by: Nasr, Seham, et al.
Published: (2025)

BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective
by: Li, Andong, et al.
Published: (2025)

A vector quantized masked autoencoder for audiovisual speech emotion recognition
by: Sadok, Samir, et al.
Published: (2023)

Charting 15 years of progress in deep learning for speech emotion recognition: A replication study
by: Triantafyllopoulos, Andreas, et al.
Published: (2025)

Enhancing CTC-based speech recognition with diverse modeling units
by: Han, Shiyi, et al.
Published: (2024)

Fusion approaches for emotion recognition from speech using acoustic and text-based features
by: Pepino, Leonardo, et al.
Published: (2024)

learning discriminative features from spectrograms using center loss for speech emotion recognition
by: Dai, Dongyang, et al.
Published: (2025)

Explainable speech emotion recognition through attentive pooling: insights from attention-based temporal localization
by: Leygue, Tahitoa, et al.
Published: (2025)

Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Survey
by: Cui, Meng, et al.
Published: (2024)

Robust fine-tuning of speech recognition models via model merging: application to disordered speech
by: Ducorroy, Alexandre, et al.
Published: (2025)

Learning Neural Vocoder from Range-Null Space Decomposition
by: Li, Andong, et al.
Published: (2025)

Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition
by: Dong, Lukuang, et al.
Published: (2026)

Video-to-Audio Generation with Fine-grained Temporal Semantics
by: Hu, Yuchen, et al.
Published: (2024)

AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion
by: Zhao, Junqi, et al.
Published: (2025)

Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
by: Gong, Rong, et al.
Published: (2024)

Universal Sound Separation with Self-Supervised Audio Masked Autoencoder
by: Zhao, Junqi, et al.
Published: (2024)

Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)

From Continuous to Discrete: Cross-Domain Collaborative General Speech Enhancement via Hierarchical Language Models
by: Mu, Zhaoxi, et al.
Published: (2025)

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
by: Wang, Helin, et al.
Published: (2024)

TTS-CtrlNet: Time varying emotion aligned text-to-speech generation with ControlNet
by: Jeong, Jaeseok, et al.
Published: (2025)

Multi-channel multi-speaker transformer for speech recognition
by: Yifan, Guo, et al.
Published: (2026)

Language model integration based on memory control for sequence to sequence speech recognition
by: Cho, Jaejin, et al.
Published: (2018)

Phoneme-based speech recognition driven by large language models and sampling marginalization
by: Ma, Te, et al.
Published: (2025)

Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
by: Maiti, Soumi, et al.
Published: (2023)

Index-MSR: A high-efficiency multimodal fusion framework for speech recognition
by: Chen, Jinming, et al.
Published: (2025)

Region-Specific Audio Tagging for Spatial Sound
by: Zhao, Jinzheng, et al.
Published: (2025)

SMRU: Split-and-Merge Recurrent-based UNet for Acoustic Echo Cancellation and Noise Suppression
by: Sun, Zhihang, et al.
Published: (2024)

Training chord recognition models on artificially generated audio
by: Majchrzak, Martyna, et al.
Published: (2025)

A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
by: Zhao, Dongdi, et al.
Published: (2024)

Keyword spotting using convolutional neural network for speech recognition in Hindi
by: Bharti, Saru, et al.
Published: (2026)

Exploring the limits of decoder-only models trained on public speech recognition corpora
by: Gupta, Ankit, et al.
Published: (2024)

STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
by: Ren, Yong, et al.
Published: (2024)

StemGen: A music generation model that listens
by: Parker, Julian D., et al.
Published: (2023)

Versatile audio-visual learning for emotion recognition
by: Goncalves, Lucas, et al.
Published: (2023)