| Main Authors: | Songyi, Li, Linze, Zheng, Jinghua, Liang, Zifeng, Zhang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.02255 |
Similar Items
MEBM-Phoneme: Multi-scale Enhanced BrainMagic for End-to-End MEG Phoneme Classification
by: Jinghua, Liang, et al.
Published: (2026)
GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning for Speech Emotion Recognition
by: Pan, Yu, et al.
Published: (2024)
HoliTok:A Coutinuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding
by: Li, Bohan, et al.
Published: (2026)
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
by: Wang, Xinsheng, et al.
Published: (2025)
Speech Recognition-based Feature Extraction for Enhanced Automatic Severity Classification in Dysarthric Speech
by: Choi, Yerin, et al.
Published: (2024)
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline
by: Wang, Helin, et al.
Published: (2025)
ECTSpeech: Enhancing Efficient Speech Synthesis via Easy Consistency Tuning
by: Zhu, Tao, et al.
Published: (2025)
Improving Pretrained YAMNet for Enhanced Speech Command Detection via Transfer Learning
by: Lachenani, Sidahmed, et al.
Published: (2025)
Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection
by: Zhang, Jinming, et al.
Published: (2025)
Imagined Speech State Classification for Robust Brain-Computer Interface
by: Ko, Byung-Kwan, et al.
Published: (2024)
Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection
by: Truong, Duc-Tuan, et al.
Published: (2024)
Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition
by: Kim, Jaeyoung, et al.
Published: (2024)
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
by: Wang, Yongqi, et al.
Published: (2023)
Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition
by: Shi, Hao, et al.
Published: (2024)
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
by: Zhang, Shaolei, et al.
Published: (2024)
AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis
by: Cao, Yubing, et al.
Published: (2025)
Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding
by: Lin, Zijian, et al.
Published: (2025)
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
by: Neekhara, Paarth, et al.
Published: (2024)
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
by: Jung, Jee-weon, et al.
Published: (2024)
EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations
by: Bian, Weizhen, et al.
Published: (2024)
Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis
by: Juvela, Lauri, et al.
Published: (2024)
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement
by: Kühne, Nikolai Lund, et al.
Published: (2025)
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)
Lina-Speech: Gated Linear Attention and Initial-State Tuning for Multi-Sample Prompting Text-To-Speech Synthesis
by: Lemerle, Théodor, et al.
Published: (2024)
Adaptive Knowledge Distillation for Device-Directed Speech Detection
by: Chi, Hyung Gun, et al.
Published: (2025)
Temporal-Aware Iterative Speech Model for Dementia Detection
by: Ugwu, Chukwuemeka, et al.
Published: (2025)
Leveraging Mixture of Experts for Improved Speech Deepfake Detection
by: Negroni, Viola, et al.
Published: (2024)
Phoneme-Level Feature Discrepancies: A Key to Detecting Sophisticated Speech Deepfakes
by: Zhang, Kuiyuan, et al.
Published: (2024)
MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention
by: Jiao, Xinxin, et al.
Published: (2024)
Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition
by: Wu, Linzhi, et al.
Published: (2026)
HuBERT-VIC: Improving Noise-Robust Automatic Speech Recognition of Speech Foundation Model via Variance-Invariance-Covariance Regularization
by: Ahn, Hyebin, et al.
Published: (2025)
EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection
by: Zhang, Tong, et al.
Published: (2025)
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge
by: Wang, He, et al.
Published: (2024)
VoiceBridge: General Speech Restoration with One-step Latent Bridge Models
by: Zhang, Chi, et al.
Published: (2025)
Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM
by: Zhang, Fengrun, et al.
Published: (2024)
Enhancing Speech Quality through the Integration of BGRU and Transformer Architectures
by: Alghnam, Souliman, et al.
Published: (2025)
Speech-based Clinical Depression Screening: An Empirical Study
by: Chen, Yangbin, et al.
Published: (2024)
Audio Deepfake Detection in the Age of Advanced Text-to-Speech models
by: Singh, Robin, et al.
Published: (2026)
Meta-Learning Approaches for Improving Detection of Unseen Speech Deepfakes
by: Kukanov, Ivan, et al.
Published: (2024)
HASS: Hierarchical Simulation of Logopenic Aphasic Speech for Scalable PPA Detection
by: Li, Harrison, et al.
Published: (2026)