| Main Authors: | Hinrichs, Reemt, Damara, Muhamad Fadli, Preihs, Stephan, Ostermann, Jörn |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.09461 |
Similar Items
Pruning-aware Loss Functions for STOI-Optimized Pruned Recurrent Autoencoders for the Compression of the Stimulation Patterns of Cochlear Implants at Zero Delay
by: Hinrichs, Reemt, et al.
Published: (2025)
A Dataset for Automatic Vocal Mode Classification
by: Hinrichs, Reemt, et al.
Published: (2026)
LSTMSE-Net: Long Short Term Speech Enhancement Network for Audio-visual Speech Enhancement
by: Jain, Arnav, et al.
Published: (2024)
Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach
by: Abdullah, Abdulhady Abas, et al.
Published: (2024)
Future Full-Ocean Deep SSPs Prediction based on Hierarchical Long Short-Term Memory Neural Networks
by: Lu, Jiajun, et al.
Published: (2023)
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis
by: Inoue, Sho, et al.
Published: (2024)
Combined Generative and Predictive Modeling for Speech Super-resolution
by: Wang, Heming, et al.
Published: (2024)
Non-Invasive Suicide Risk Prediction Through Speech Analysis
by: Amiriparian, Shahin, et al.
Published: (2024)
Perceived Femininity in Singing Voice: Analysis and Prediction
by: Kong, Yuexuan, et al.
Published: (2025)
Harmonic Detection from Noisy Speech with Auditory Frame Gain for Intelligibility Enhancement
by: Queiroz, A., et al.
Published: (2024)
Investigation of Zero-shot Text-to-Speech Models for Enhancing Short-Utterance Speaker Verification
by: Zhao, Yiyang, et al.
Published: (2025)
Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction
by: Zhang, Leying, et al.
Published: (2025)
The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
by: Huang, Wen-Chin, et al.
Published: (2024)
Faster Speech-LLaMA Inference with Multi-token Prediction
by: Raj, Desh, et al.
Published: (2024)
Latent-Domain Predictive Neural Speech Coding
by: Jiang, Xue, et al.
Published: (2022)
A Multi-decoder Neural Tracking Method for Accurately Predicting Speech Intelligibility
by: Sonck, Rien, et al.
Published: (2026)
Time vs. Layer: Locating Predictive Cues for Dysarthric Speech Descriptors in wav2vec 2.0
by: Engert, Natalie, et al.
Published: (2026)
DeepGESI: A Non-Intrusive Objective Evaluation Model for Predicting Speech Intelligibility in Hearing-Impaired Listeners
by: Luo, Wenyu, et al.
Published: (2025)
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
by: Chen, Li-Wei, et al.
Published: (2024)
Representing Speech Through Autoregressive Prediction of Cochlear Tokens
by: Tuckute, Greta, et al.
Published: (2025)
Multi-Utterance Speech Separation and Association Trained on Short Segments
by: Wang, Yuzhu, et al.
Published: (2025)
MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-token Prediction
by: Wang, Jianjin, et al.
Published: (2025)
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
by: Xu, Mohan, et al.
Published: (2024)
GAP-URGENet: A Generative-Predictive Fusion Framework for Universal Speech Enhancement
by: Rong, Xiaobin, et al.
Published: (2026)
Multi-Step Prediction and Control of Hierarchical Emotion Distribution in Text-to-Speech Synthesis
by: Inoue, Sho, et al.
Published: (2025)
KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction
by: Xia, Kangxiang, et al.
Published: (2024)
Stage-Wise and Prior-Aware Neural Speech Phase Prediction
by: Liu, Fei, et al.
Published: (2024)
Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders
by: Shi, Hao, et al.
Published: (2023)
Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation
by: Lin, Zhennan, et al.
Published: (2025)
No Audiogram: Leveraging Existing Scores for Personalized Speech Intelligibility Prediction
by: Zhou, Haoshuai, et al.
Published: (2025)
Low-Latency Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses for Speech Generation Tasks
by: Ai, Yang, et al.
Published: (2024)
Dual-View Predictive Diffusion: Lightweight Speech Enhancement via Spectrogram-Image Synergy
by: Xue, Ke, et al.
Published: (2026)
Non-Intrusive Binaural Speech Intelligibility Prediction Using Mamba for Hearing-Impaired Listeners
by: Yamamoto, Katsuhiko, et al.
Published: (2025)
Causal Speech Enhancement with Predicting Semantics based on Quantized Self-supervised Learning Features
by: Tsunoo, Emiru, et al.
Published: (2024)
Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People
by: Zhou, Haoshuai, et al.
Published: (2025)
Articulatory Feature Prediction from Surface EMG during Speech Production
by: Lee, Jihwan, et al.
Published: (2025)
Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation
by: Wang, Wupeng, et al.
Published: (2025)
Selection of Layers from Self-supervised Learning Models for Predicting Mean-Opinion-Score of Speech
by: Liang, Xinyu, et al.
Published: (2025)
Feature Importance across Domains for Improving Non-Intrusive Speech Intelligibility Prediction in Hearing Aids
by: Zezario, Ryandhimas E., et al.
Published: (2025)
Speech Enhancement with Dual-path Multi-Channel Linear Prediction Filter and Multi-norm Beamforming
by: Qin, Chengyuan, et al.
Published: (2025)