:: Library Catalog

Bibliographic Details
Main Authors:	Songyi, Li; Linze, Zheng; Jinghua, Liang; Zifeng, Zhang
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2603.02255

Similar Items

MEBM-Phoneme: Multi-scale Enhanced BrainMagic for End-to-End MEG Phoneme Classification
by: Jinghua, Liang, et al.
Published: (2026)

GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning for Speech Emotion Recognition
by: Pan, Yu, et al.
Published: (2024)

HoliTok: A Continuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding
by: Li, Bohan, et al.
Published: (2026)

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
by: Wang, Xinsheng, et al.
Published: (2025)

Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
by: Choi, Yerin, et al.
Published: (2024)

SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline
by: Wang, Helin, et al.
Published: (2025)

ECTSpeech: Enhancing Efficient Speech Synthesis via Easy Consistency Tuning
by: Zhu, Tao, et al.
Published: (2025)

Improving Pretrained YAMNet for Enhanced Speech Command Detection via Transfer Learning
by: Lachenani, Sidahmed, et al.
Published: (2025)

Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection
by: Zhang, Jinming, et al.
Published: (2025)

Imagined Speech State Classification for Robust Brain-Computer Interface
by: Ko, Byung-Kwan, et al.
Published: (2024)

Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection
by: Truong, Duc-Tuan, et al.
Published: (2024)

Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition
by: Kim, Jaeyoung, et al.
Published: (2024)

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
by: Wang, Yongqi, et al.
Published: (2023)

Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition
by: Shi, Hao, et al.
Published: (2024)

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
by: Zhang, Shaolei, et al.
Published: (2024)

AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis
by: Cao, Yubing, et al.
Published: (2025)

Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding
by: Lin, Zijian, et al.
Published: (2025)

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
by: Neekhara, Paarth, et al.
Published: (2024)

SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
by: Jung, Jee-weon, et al.
Published: (2024)

EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations
by: Bian, Weizhen, et al.
Published: (2024)

Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis
by: Juvela, Lauri, et al.
Published: (2024)

MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement
by: Kühne, Nikolai Lund, et al.
Published: (2025)

Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)

Lina-Speech: Gated Linear Attention and Initial-State Tuning for Multi-Sample Prompting Text-To-Speech Synthesis
by: Lemerle, Théodor, et al.
Published: (2024)

Adaptive Knowledge Distillation for Device-Directed Speech Detection
by: Chi, Hyung Gun, et al.
Published: (2025)

Temporal-Aware Iterative Speech Model for Dementia Detection
by: Ugwu, Chukwuemeka, et al.
Published: (2025)

Leveraging Mixture of Experts for Improved Speech Deepfake Detection
by: Negroni, Viola, et al.
Published: (2024)

Phoneme-Level Feature Discrepancies: A Key to Detecting Sophisticated Speech Deepfakes
by: Zhang, Kuiyuan, et al.
Published: (2024)

MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention
by: Jiao, Xinxin, et al.
Published: (2024)

Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition
by: Wu, Linzhi, et al.
Published: (2026)

HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization
by: Ahn, Hyebin, et al.
Published: (2025)

EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection
by: Zhang, Tong, et al.
Published: (2025)

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge
by: Wang, He, et al.
Published: (2024)

VoiceBridge: General Speech Restoration with One-step Latent Bridge Models
by: Zhang, Chi, et al.
Published: (2025)

Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM
by: Zhang, Fengrun, et al.
Published: (2024)

Enhancing Speech Quality through the Integration of BGRU and Transformer Architectures
by: Alghnam, Souliman, et al.
Published: (2025)

Speech-based Clinical Depression Screening: An Empirical Study
by: Chen, Yangbin, et al.
Published: (2024)

Audio Deepfake Detection in the Age of Advanced Text-to-Speech models
by: Singh, Robin, et al.
Published: (2026)

Meta-Learning Approaches for Improving Detection of Unseen Speech Deepfakes
by: Kukanov, Ivan, et al.
Published: (2024)

HASS: Hierarchical Simulation of Logopenic Aphasic Speech for Scalable PPA Detection
by: Li, Harrison, et al.
Published: (2026)