:: Library Catalog

Bibliographic Details
Main Authors:	Liang, Yayun; Zhang, Yuanming; Chen, Fei; Lu, Jing; Lin, Zhibin
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2601.20542

Similar Items

Decoding Stimulus Reconstruction-Based Auditory Attention Robustly in Unbalanced EEG Datasets
by: Zhang, Yuanming, et al.
Published: (2026)

Multi-class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum
by: Zhang, Yuanming, et al.
Published: (2024)

Auditory Attention Decoding from Ear-EEG Signals: A Dataset with Dynamic Attention Switching and Rigorous Cross-Validation
by: Zhang, Yuanming, et al.
Published: (2025)

A Lightweight Hybrid Dual Channel Speech Enhancement System under Low-SNR Conditions
by: Wang, Zheng, et al.
Published: (2025)

Comparator Loss: An Ordinal Contrastive Loss to Derive a Severity Score for Speech-based Health Monitoring
by: Webber, Jacob J, et al.
Published: (2025)

Asymmetric Encoder-Decoder Based on Time-Frequency Correlation for Speech Separation
by: Shin, Ui-Hyeop, et al.
Published: (2026)

Noise-Aware Speech Separation with Contrastive Learning
by: Zhang, Zizheng, et al.
Published: (2023)

Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization
by: Wan, Genshun, et al.
Published: (2026)

Exploiting Consistency-Preserving Loss and Perceptual Contrast Stretching to Boost SSL-based Speech Enhancement
by: Khan, Muhammad Salman, et al.
Published: (2024)

Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval
by: Deng, Yimin, et al.
Published: (2024)

Reverberation-Robust Localization of Speakers Using Distinct Speech Onsets and Multi-channel Cross-Correlations
by: Lin, Shoufeng
Published: (2026)

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
by: Du, Chenpeng, et al.
Published: (2024)

DeCRED: Decoder-Centric Regularization for Encoder-Decoder Based Speech Recognition
by: Polok, Alexander, et al.
Published: (2025)

Improving Automatic Speech Recognition with Decoder-Centric Regularisation in Encoder-Decoder Models
by: Polok, Alexander, et al.
Published: (2024)

GESI: Gammachirp Envelope Similarity Index for Predicting Intelligibility of Simulated Hearing Loss Sounds
by: Yamamoto, Ayako, et al.
Published: (2023)

Enhancement of Dysarthric Speech Reconstruction by Contrastive Learning
by: Fatemeh, Keshvari, et al.
Published: (2024)

Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation
by: Shin, Ui-Hyeop, et al.
Published: (2024)

FNSE-SBGAN: Far-field Speech Enhancement with Schrodinger Bridge and Generative Adversarial Networks
by: Lei, Tong, et al.
Published: (2025)

EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition
by: Li, Pengcheng, et al.
Published: (2025)

Rethinking Flow and Diffusion Bridge Models for Speech Enhancement
by: Wang, Dahan, et al.
Published: (2026)

VoCodec: An Efficient Lightweight Low-Bitrate Speech Codec
by: Yang, Leyan, et al.
Published: (2026)

Modeling Multi-Level Hearing Loss for Speech Intelligibility Prediction
by: Zhou, Xiajie, et al.
Published: (2025)

Speech Separation using Neural Audio Codecs with Embedding Loss
by: Yip, Jia Qi, et al.
Published: (2024)

Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
by: Chen, Peikun, et al.
Published: (2024)

Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
by: Yang, Yifan, et al.
Published: (2026)

Quartered Chirp Spectral Envelope for Whispered vs Normal Speech Classification
by: Joysingh, S. Johanan, et al.
Published: (2024)

Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding
by: Lin, Zijian, et al.
Published: (2025)

Large Language Model Guided Decoding for Self-Supervised Speech Recognition
by: Cohen, Eyal, et al.
Published: (2025)

Speech-Omni-Lite: Portable Speech Interfaces for Vision-Language Models
by: Tao, Dehua, et al.
Published: (2026)

Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
by: Wang, Hankun, et al.
Published: (2024)

Audiobook-CC: Controllable Long-context Speech Generation for Multicast Audiobook
by: Liu, Min, et al.
Published: (2025)

Subject Disentanglement Neural Network for Speech Envelope Reconstruction from EEG
by: Zhang, Li, et al.
Published: (2025)

Performance Modeling for Correlation-based Neural Decoding of Auditory Attention to Speech
by: Geirnaert, Simon, et al.
Published: (2025)

SACM: SEEG-Audio Contrastive Matching for Chinese Speech Decoding
by: Wang, Hongbin, et al.
Published: (2025)

Deep Filter Estimation from Inter-Frame Correlations for Monaural Speech Dereverberation
by: Shin, Ui-Hyeop, et al.
Published: (2026)

Attention-Based Beamformer For Multi-Channel Speech Enhancement
by: Bai, Jinglin, et al.
Published: (2024)

FairASR: Fair Audio Contrastive Learning for Automatic Speech Recognition
by: Kim, Jongsuk, et al.
Published: (2025)

Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition
by: Yang, Qingran, et al.
Published: (2026)

Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models
by: Jing, Xin, et al.
Published: (2024)

Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
by: Masuyama, Yoshiki, et al.
Published: (2024)