| Main authors: | Yan, Bi-Cheng; Tsai, Ming-Kang; Chen, Berlin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2510.04956 |
Similar documents
Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment
by: Lo, Tien-Hong, et al.
Published: (2024)
Towards Efficient and Multifaceted Computer-assisted Pronunciation Training Leveraging Hierarchical Selective State Space Model and Decoupled Cross-entropy Loss
by: Chao, Fu-An, et al.
Published: (2025)
Probing the Hidden Talent of ASR Foundation Models for L2 English Oral Assessment
by: Chao, Fu-An, et al.
Published: (2025)
HiPPO: Exploring A Novel Hierarchical Pronunciation Assessment Approach for Spoken Languages
by: Yan, Bi-Cheng, et al.
Published: (2025)
ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization
by: Yan, Bi-Cheng, et al.
Published: (2024)
K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function
by: Li, Shuhe, et al.
Published: (2025)
JCAPT: A Joint Modeling Approach for CAPT
by: Yang, Tzu-Hsuan, et al.
Published: (2025)
Segmentation-free Goodness of Pronunciation
by: Cao, Xinwei, et al.
Published: (2025)
An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition
by: Wang, Yi-Cheng, et al.
Published: (2024)
Pronunciation Deviation Analysis Through Voice Cloning and Acoustic Comparison
by: Valdivia, Andrew, et al.
Published: (2025)
MuSpike: A Benchmark and Evaluation Framework for Symbolic Music Generation with Spiking Neural Networks
by: Liang, Qian, et al.
Published: (2025)
SonoEdit: Null-Space Constrained Knowledge Editing for Pronunciation Correction in LLM-Based TTS
by: Singh, Ayush Pratap, et al.
Published: (2026)
Optimizing Automatic Speech Assessment: W-RankSim Regularization and Hybrid Feature Fusion Strategies
by: Wu, Chung-Wen, et al.
Published: (2024)
AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models
by: Li, Kai, et al.
Published: (2025)
Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS
by: Nguyen, Tuan Nam, et al.
Published: (2024)
Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment
by: Choi, Kwanghee, et al.
Published: (2025)
MuPT: A Generative Symbolic Music Pretrained Transformer
by: Qu, Xingwei, et al.
Published: (2024)
Exploring State-Space-Model based Language Model in Music Generation
by: Lee, Wei-Jaw, et al.
Published: (2025)
An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution
by: Lo, Tien-Hong, et al.
Published: (2024)
Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition
by: Wang, Chien-Chun, et al.
Published: (2024)
ALICE: A Multifaceted Evaluation Framework of Large Audio-Language Models' In-Context Learning Ability
by: Piao, Yen-Ting, et al.
Published: (2026)
Acoustic Feature Mixup for Balanced Multi-aspect Pronunciation Assessment
by: Do, Heejin, et al.
Published: (2024)
Pronunciation-Lexicon Free Training for Phoneme-based Crosslingual ASR via Joint Stochastic Approximation
by: Yusuyin, Saierdaer, et al.
Published: (2025)
Fine-Tuning Text-to-Speech Diffusion Models Using Reinforcement Learning with Human Feedback
by: Chen, Jingyi, et al.
Published: (2025)
Robust Generative Audio Quality Assessment: Disentangling Quality from Spurious Correlations
by: Huang, Kuan-Tang, et al.
Published: (2026)
Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing
by: Peng, An-Ci, et al.
Published: (2026)
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models
by: Hong, Yeona, et al.
Published: (2025)
An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement
by: Yang, Tzu-Ting, et al.
Published: (2024)
VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models
by: Wang, Yuxiang, et al.
Published: (2026)
CrossMuSim: A Cross-Modal Framework for Music Similarity Retrieval with LLM-Powered Text Description Sourcing and Mining
by: Tsoi, Tristan, et al.
Published: (2025)
Quantizer-Aware Hierarchical Neural Codec Modeling for Speech Deepfake Detection
by: Wu, Jinyang, et al.
Published: (2026)
Direction of Arrival Correction through Speech Quality Feedback
by: Rascon, Caleb
Published: (2024)
Diff-V2M: A Hierarchical Conditional Diffusion Model with Explicit Rhythmic Modeling for Video-to-Music Generation
by: Ji, Shulei, et al.
Published: (2025)
Towards Unified Music Emotion Recognition across Dimensional and Categorical Models
by: Kang, Jaeyong, et al.
Published: (2025)
GeHirNet: A Gender-Aware Hierarchical Model for Voice Pathology Classification
by: Wu, Fan, et al.
Published: (2025)
GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions
by: Zuo, Heda, et al.
Published: (2025)
TG-ASR: Translation-Guided Learning with Parallel Gated Cross Attention for Low-Resource Automatic Speech Recognition
by: Yang, Cheng-Yeh, et al.
Published: (2026)
Text-Queried Audio Source Separation via Hierarchical Modeling
by: Yin, Xinlei, et al.
Published: (2025)
Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges
by: Kang, Jaeyong, et al.
Published: (2024)
Stream-Voice-Anon: Enhancing Utility of Real-Time Speaker Anonymization via Neural Audio Codec and Language Models
by: Kuzmin, Nikita, et al.
Published: (2026)