:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Jinghua, Liang, Zifeng, Zhang, Songyi, Li, Linze, Zheng
Format:	Preprint
Published:	2026
Subjects:	Sound Artificial Intelligence Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2603.02254
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

MEBM-Speech: Multi-scale Enhanced BrainMagic for Robust MEG Speech Detection
by: Songyi, Li, et al.
Published: (2026)

Phoneme-Level Feature Discrepancies: A Key to Detecting Sophisticated Speech Deepfakes
by: Zhang, Kuiyuan, et al.
Published: (2024)

Controllable Singing Voice Synthesis using Phoneme-Level Energy Sequence
by: Ryu, Yerin, et al.
Published: (2025)

Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
by: Medin, Lucas Block, et al.
Published: (2025)

Frequency-Weighted Training Losses for Phoneme-Level DNN-based Speech Enhancement
by: Monir, Nasser-Eddine, et al.
Published: (2025)

ProKWS: Personalized Keyword Spotting via Collaborative Learning of Phonemes and Prosody
by: Pan, Jianan, et al.
Published: (2026)

Data-Efficient ASR Personalization for Non-Normative Speech Using an Uncertainty-Based Phoneme Difficulty Score for Guided Sampling
by: Pokel, Niclas, et al.
Published: (2025)

Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition
by: Chen, Jinming, et al.
Published: (2024)

Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment
by: Azad, Asif, et al.
Published: (2026)

The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset
by: Landau, Gilad, et al.
Published: (2025)

From Modular to End-to-End Speaker Diarization
by: Landini, Federico
Published: (2024)

TSPC: A Two-Stage Phoneme-Centric Architecture for code-switching Vietnamese-English Speech Recognition
by: Anh, Tran Nguyen, et al.
Published: (2025)

Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)

An End-to-End Approach for Chord-Conditioned Song Generation
by: Gao, Shuochen, et al.
Published: (2024)

End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)

Prosody Labeling with Phoneme-BERT and Speech Foundation Models
by: Koriyama, Tomoki
Published: (2025)

Neuro-MSBG: An End-to-End Neural Model for Hearing Loss Simulation
by: Yuan, Hui-Guan, et al.
Published: (2025)

PMF-CEC: Phoneme-augmented Multimodal Fusion for Context-aware ASR Error Correction with Error-specific Selective Decoding
by: He, Jiajun, et al.
Published: (2025)

Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2024)

End-to-End User-Defined Keyword Spotting using Shifted Delta Coefficients
by: V, Kesavaraj, et al.
Published: (2024)

End-to-end multi-channel speaker extraction and binaural speech synthesis
by: Chi, Cheng, et al.
Published: (2024)

Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech
by: Weise, Tobias, et al.
Published: (2024)

Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)

End-to-End Real-World Polyphonic Piano Audio-to-Score Transcription with Hierarchical Decoding
by: Zeng, Wei, et al.
Published: (2024)

A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms
by: Monir, Nasser-Eddine, et al.
Published: (2024)

Phoneme-Level Analysis for Person-of-Interest Speech Deepfake Detection
by: Salvi, Davide, et al.
Published: (2025)

Towards Accurate Phonetic Error Detection Through Phoneme Similarity Modeling
by: Zhou, Xuanru, et al.
Published: (2025)

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
by: Dai, Dongyang, et al.
Published: (2025)

Wave-U-Mamba: An End-To-End Framework For High-Quality And Efficient Speech Super Resolution
by: Lee, Yongjoon, et al.
Published: (2024)

Phoneme-based speech recognition driven by large language models and sampling marginalization
by: Ma, Te, et al.
Published: (2025)

Phoneme-Level Contrastive Learning for User-Defined Keyword Spotting with Flexible Enrollment
by: Kewei, Li, et al.
Published: (2024)

Evaluating Multichannel Speech Enhancement Algorithms at the Phoneme Scale Across Genders
by: Monir, Nasser-Eddine, et al.
Published: (2025)

Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
by: He, Xinlu, et al.
Published: (2025)

Toward Fully-End-to-End Listened Speech Decoding from EEG Signals
by: Lee, Jihwan, et al.
Published: (2024)

TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)

VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features
by: Koriyama, Tomoki
Published: (2024)

MUDAS: Mote-scale Unsupervised Domain Adaptation in Multi-label Sound Classification
by: Yun, Jihoon, et al.
Published: (2025)

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
by: Fujita, Kenichi, et al.
Published: (2024)

Recent Advances in End-to-End Simultaneous Speech Translation
by: Liu, Xiaoqian, et al.
Published: (2024)

Retrieval Augmented End-to-End Spoken Dialog Models
by: Wang, Mingqiu, et al.
Published: (2024)