| Main authors: | Wang, Chien-Chun; Yu, En-Lun; Hung, Jeih-Weih; Huang, Shih-Chieh; Chen, Berlin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2508.20885 |
Similar items
What do neural networks listen to? Exploring the crucial bands in Speech Enhancement using Sinc-convolution
by: Ho, Kuan-Hsun, et al.
Published: (2024)
ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning
by: Ho, Kuan-Hsun, et al.
Published: (2024)
Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation
by: Wang, Chien-Chun, et al.
Published: (2024)
MALEFA: Multi-grAnularity Learning and Effective False Alarm Suppression for Zero-shot Keyword Spotting
by: Li, Lo-Ya, et al.
Published: (2026)
QAMRO: Quality-aware Adaptive Margin Ranking Optimization for Human-aligned Assessment of Audio Generation Systems
by: Wang, Chien-Chun, et al.
Published: (2025)
Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)
Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition
by: Wang, Chien-Chun, et al.
Published: (2024)
LibriVAD: A Scalable Open Dataset with Deep Learning Benchmarks for Voice Activity Detection
by: Stylianou, Ioannis, et al.
Published: (2025)
sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks
by: Yang, Qu, et al.
Published: (2024)
Advancing Automated Speaking Assessment Leveraging Multifaceted Relevance and Grammar Information
by: Lu, Hao-Chien, et al.
Published: (2025)
Robust Generative Audio Quality Assessment: Disentangling Quality from Spurious Correlations
by: Huang, Kuan-Tang, et al.
Published: (2026)
DRASP: A Dual-Resolution Attentive Statistics Pooling Framework for Automatic MOS Prediction
by: Yang, Cheng-Yeh, et al.
Published: (2025)
A Holistic Framework for Robust Bangla ASR and Speaker Diarization with Optimized VAD and CTC Alignment
by: Ishmam, Zarif, et al.
Published: (2026)
Optimizing Automatic Speech Assessment: W-RankSim Regularization and Hybrid Feature Fusion Strategies
by: Wu, Chung-Wen, et al.
Published: (2024)
CLiFT-ASR: A Cross-Lingual Fine-Tuning Framework for Low-Resource Taiwanese Hokkien Speech Recognition
by: Sung, Hung-Yang, et al.
Published: (2025)
Multimodal Emotion Regression with Multi-Objective Optimization and VAD-Aware Audio Modeling for the 10th ABAW EMI Track
by: Huang, Jiawen, et al.
Published: (2026)
TG-ASR: Translation-Guided Learning with Parallel Gated Cross Attention for Low-Resource Automatic Speech Recognition
by: Yang, Cheng-Yeh, et al.
Published: (2026)
Timed text extraction from Taiwanese Kua-á-hì TV series
by: Huang, Tzu-Hung, et al.
Published: (2026)
A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions
by: Wang, Chung-Chun, et al.
Published: (2025)
Acoustically Precise Hesitation Tagging Is Essential for End-to-End Verbatim Transcription Systems
by: Lin, Jhen-Ke, et al.
Published: (2025)
Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining
by: Bovbjerg, Holger Severin, et al.
Published: (2025)
Research on Piano Timbre Transformation System Based on Diffusion Model
by: Hsu, Chun-Chieh, et al.
Published: (2026)
Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints
by: Lee, PeiYing, et al.
Published: (2024)
SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models
by: Yin, Chun, et al.
Published: (2024)
Noise-Aware Speech Separation with Contrastive Learning
by: Zhang, Zizheng, et al.
Published: (2023)
Spectral-Aware Low-Rank Adaptation for Speaker Verification
by: Li, Zhe, et al.
Published: (2025)
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
by: Wang, Shih-heng, et al.
Published: (2024)
Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection
by: Mariotte, Théo, et al.
Published: (2024)
Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection
by: Yin, Han, et al.
Published: (2024)
InconVAD: A Two-Stage Dual-Tower Framework for Multimodal Emotion Inconsistency Detection
by: Li, Zongyi, et al.
Published: (2025)
Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment
by: Igarashi, Takuto, et al.
Published: (2024)
Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning
by: He, Haorui, et al.
Published: (2024)
Phoenix-VAD: Streaming Semantic Endpoint Detection for Full-Duplex Speech Interaction
by: Wu, Weijie, et al.
Published: (2025)
SceneGuard: Training-Time Voice Protection with Scene-Consistent Audible Background Noise
by: Sang, Rui, et al.
Published: (2025)
A Preliminary Exploration with GPT-4o Voice Mode
by: Lin, Yu-Xiang, et al.
Published: (2025)
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
by: Ruggiero, Giuseppe, et al.
Published: (2024)
Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing
by: Peng, An-Ci, et al.
Published: (2026)
Improving Underwater Acoustic Classification Through Learnable Gabor Filter Convolution and Attention Mechanisms
by: Domingos, Lucas Cesar Ferreira, et al.
Published: (2025)
MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation
by: Wu, Shih-Lun, et al.
Published: (2025)
Poly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modeling
by: Geng, Chen, et al.
Published: (2026)