| Main Authors: | Wang, Taihui, Zhao, Jinzheng, Chen, Rilin, Lei, Tong, Wang, Wenwu, Yu, Dong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.20573 |
Similar Items
Target matching based generative model for speech enhancement
by: Wang, Taihui, et al.
Published: (2025)
AudioRAG+: Feedback-driven Retrieval-augmented Audio Generation with Large Audio Language Models
by: Zhao, Junqi, et al.
Published: (2025)
Heterogeneous bimodal attention fusion for speech emotion recognition
by: Luo, Jiachen, et al.
Published: (2025)
Graph-based multi-Feature fusion method for speech emotion recognition
by: Liu, Xueyu, et al.
Published: (2024)
Scalable Neural Vocoder from Range-Null Space Decomposition
by: Li, Andong, et al.
Published: (2026)
Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues
by: Nasr, Seham, et al.
Published: (2025)
BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective
by: Li, Andong, et al.
Published: (2025)
A vector quantized masked autoencoder for audiovisual speech emotion recognition
by: Sadok, Samir, et al.
Published: (2023)
Charting 15 years of progress in deep learning for speech emotion recognition: A replication study
by: Triantafyllopoulos, Andreas, et al.
Published: (2025)
Enhancing CTC-based speech recognition with diverse modeling units
by: Han, Shiyi, et al.
Published: (2024)
Fusion approaches for emotion recognition from speech using acoustic and text-based features
by: Pepino, Leonardo, et al.
Published: (2024)
Learning discriminative features from spectrograms using center loss for speech emotion recognition
by: Dai, Dongyang, et al.
Published: (2025)
Explainable speech emotion recognition through attentive pooling: insights from attention-based temporal localization
by: Leygue, Tahitoa, et al.
Published: (2025)
Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Survey
by: Cui, Meng, et al.
Published: (2024)
Robust fine-tuning of speech recognition models via model merging: application to disordered speech
by: Ducorroy, Alexandre, et al.
Published: (2025)
Learning Neural Vocoder from Range-Null Space Decomposition
by: Li, Andong, et al.
Published: (2025)
Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition
by: Dong, Lukuang, et al.
Published: (2026)
Video-to-Audio Generation with Fine-grained Temporal Semantics
by: Hu, Yuchen, et al.
Published: (2024)
AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion
by: Zhao, Junqi, et al.
Published: (2025)
Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
by: Gong, Rong, et al.
Published: (2024)
Universal Sound Separation with Self-Supervised Audio Masked Autoencoder
by: Zhao, Junqi, et al.
Published: (2024)
Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)
From Continuous to Discrete: Cross-Domain Collaborative General Speech Enhancement via Hierarchical Language Models
by: Mu, Zhaoxi, et al.
Published: (2025)
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
by: Wang, Helin, et al.
Published: (2024)
TTS-CtrlNet: Time varying emotion aligned text-to-speech generation with ControlNet
by: Jeong, Jaeseok, et al.
Published: (2025)
Multi-channel multi-speaker transformer for speech recognition
by: Yifan, Guo, et al.
Published: (2026)
Language model integration based on memory control for sequence to sequence speech recognition
by: Cho, Jaejin, et al.
Published: (2018)
Phoneme-based speech recognition driven by large language models and sampling marginalization
by: Ma, Te, et al.
Published: (2025)
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
by: Maiti, Soumi, et al.
Published: (2023)
Index-MSR: A high-efficiency multimodal fusion framework for speech recognition
by: Chen, Jinming, et al.
Published: (2025)
Region-Specific Audio Tagging for Spatial Sound
by: Zhao, Jinzheng, et al.
Published: (2025)
SMRU: Split-and-Merge Recurrent-based UNet for Acoustic Echo Cancellation and Noise Suppression
by: Sun, Zhihang, et al.
Published: (2024)
Training chord recognition models on artificially generated audio
by: Majchrzak, Martyna, et al.
Published: (2025)
A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
by: Zhao, Dongdi, et al.
Published: (2024)
Keyword spotting using convolutional neural network for speech recognition in Hindi
by: Bharti, Saru, et al.
Published: (2026)
Exploring the limits of decoder-only models trained on public speech recognition corpora
by: Gupta, Ankit, et al.
Published: (2024)
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
by: Ren, Yong, et al.
Published: (2024)
StemGen: A music generation model that listens
by: Parker, Julian D., et al.
Published: (2023)
Versatile audio-visual learning for emotion recognition
by: Goncalves, Lucas, et al.
Published: (2023)