| Main Authors: | Jinghua, Liang; Zifeng, Zhang; Songyi, Li; Linze, Zheng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.02254 |
Similar Items
MEBM-Speech: Multi-scale Enhanced BrainMagic for Robust MEG Speech Detection
by: Songyi, Li, et al.
Published: (2026)
Phoneme-Level Feature Discrepancies: A Key to Detecting Sophisticated Speech Deepfakes
by: Zhang, Kuiyuan, et al.
Published: (2024)
Controllable Singing Voice Synthesis using Phoneme-Level Energy Sequence
by: Ryu, Yerin, et al.
Published: (2025)
Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
by: Medin, Lucas Block, et al.
Published: (2025)
Frequency-Weighted Training Losses for Phoneme-Level DNN-based Speech Enhancement
by: Monir, Nasser-Eddine, et al.
Published: (2025)
ProKWS: Personalized Keyword Spotting via Collaborative Learning of Phonemes and Prosody
by: Pan, Jianan, et al.
Published: (2026)
Data-Efficient ASR Personalization for Non-Normative Speech Using an Uncertainty-Based Phoneme Difficulty Score for Guided Sampling
by: Pokel, Niclas, et al.
Published: (2025)
Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition
by: Chen, Jinming, et al.
Published: (2024)
Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment
by: Azad, Asif, et al.
Published: (2026)
The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset
by: Landau, Gilad, et al.
Published: (2025)
From Modular to End-to-End Speaker Diarization
by: Landini, Federico
Published: (2024)
TSPC: A Two-Stage Phoneme-Centric Architecture for code-switching Vietnamese-English Speech Recognition
by: Anh, Tran Nguyen, et al.
Published: (2025)
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)
An End-to-End Approach for Chord-Conditioned Song Generation
by: Gao, Shuochen, et al.
Published: (2024)
End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)
Prosody Labeling with Phoneme-BERT and Speech Foundation Models
by: Koriyama, Tomoki
Published: (2025)
Neuro-MSBG: An End-to-End Neural Model for Hearing Loss Simulation
by: Yuan, Hui-Guan, et al.
Published: (2025)
PMF-CEC: Phoneme-augmented Multimodal Fusion for Context-aware ASR Error Correction with Error-specific Selective Decoding
by: He, Jiajun, et al.
Published: (2025)
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2024)
End-to-End User-Defined Keyword Spotting using Shifted Delta Coefficients
by: V, Kesavaraj, et al.
Published: (2024)
End-to-end multi-channel speaker extraction and binaural speech synthesis
by: Chi, Cheng, et al.
Published: (2024)
Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech
by: Weise, Tobias, et al.
Published: (2024)
Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)
End-to-End Real-World Polyphonic Piano Audio-to-Score Transcription with Hierarchical Decoding
by: Zeng, Wei, et al.
Published: (2024)
A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms
by: Monir, Nasser-Eddine, et al.
Published: (2024)
Phoneme-Level Analysis for Person-of-Interest Speech Deepfake Detection
by: Salvi, Davide, et al.
Published: (2025)
Towards Accurate Phonetic Error Detection Through Phoneme Similarity Modeling
by: Zhou, Xuanru, et al.
Published: (2025)
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
by: Dai, Dongyang, et al.
Published: (2025)
Wave-U-Mamba: An End-To-End Framework For High-Quality And Efficient Speech Super Resolution
by: Lee, Yongjoon, et al.
Published: (2024)
Phoneme-based speech recognition driven by large language models and sampling marginalization
by: Ma, Te, et al.
Published: (2025)
Phoneme-Level Contrastive Learning for User-Defined Keyword Spotting with Flexible Enrollment
by: Kewei, Li, et al.
Published: (2024)
Evaluating Multichannel Speech Enhancement Algorithms at the Phoneme Scale Across Genders
by: Monir, Nasser-Eddine, et al.
Published: (2025)
Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
by: He, Xinlu, et al.
Published: (2025)
Toward Fully-End-to-End Listened Speech Decoding from EEG Signals
by: Lee, Jihwan, et al.
Published: (2024)
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
by: Bataev, Vladimir, et al.
Published: (2025)
VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features
by: Koriyama, Tomoki
Published: (2024)
MUDAS: Mote-scale Unsupervised Domain Adaptation in Multi-label Sound Classification
by: Yun, Jihoon, et al.
Published: (2025)
Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
by: Fujita, Kenichi, et al.
Published: (2024)
Recent Advances in End-to-End Simultaneous Speech Translation
by: Liu, Xiaoqian, et al.
Published: (2024)
Retrieval Augmented End-to-End Spoken Dialog Models
by: Wang, Mingqiu, et al.
Published: (2024)