| Main Authors: | Liu, Wei, Li, Jiahong, Shao, Yiwen, Yu, Dong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.14410 |
Similar Items
AzeroS: Extending LLM to Speech with Self-Generated Instruction-Free Tuning
by: Shao, Yiwen, et al.
Published: (2025)
Transcribing and Translating, Fast and Slow: Joint Speech Translation and Recognition
by: Moritz, Niko, et al.
Published: (2024)
WhisperVC: Decoupled Cross-Domain Alignment and Speech Generation for Low-Resource Whisper-to-Normal Conversion
by: Liu, Dong, et al.
Published: (2025)
Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding
by: Huo, Mingyue, et al.
Published: (2025)
TOGGL: Transcribing Overlapping Speech with Staggered Labeling
by: Li, Chak-Fai, et al.
Published: (2024)
Adapting Self-Supervised Speech Representations for Cross-lingual Dysarthria Detection in Parkinson's Disease
by: Hernandez, Abner, et al.
Published: (2026)
RIR-SF: Room Impulse Response Based Spatial Feature for Target Speech Recognition in Multi-Channel Multi-Speaker Scenarios
by: Shao, Yiwen, et al.
Published: (2023)
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data
by: Saeki, Takaaki, et al.
Published: (2024)
SoulX-Transcriber: A Robust End-to-End Framework for Multi-Speaker Speech Transcription
by: Dai, Yuhang, et al.
Published: (2026)
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
by: Kim, Ji-Hoon, et al.
Published: (2024)
Efficient Multilingual ASR Finetuning via LoRA Language Experts
by: Li, Jiahong, et al.
Published: (2025)
Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition
by: Yang, Zhengdong, et al.
Published: (2025)
Cross-lingual Data Selection Using Clip-level Acoustic Similarity for Enhancing Low-resource Automatic Speech Recognition
by: Mitsumori, Shunsuke, et al.
Published: (2025)
Revisiting Audio-language Pretraining for Learning General-purpose Audio Representation
by: Tseng, Wei-Cheng, et al.
Published: (2025)
Efficient Scaling for LLM-based ASR
by: Mu, Bingshen, et al.
Published: (2025)
TASU: Text-Only Alignment for Speech Understanding
by: Peng, Jing, et al.
Published: (2025)
Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis
by: Niu, Zhikang, et al.
Published: (2025)
SiamCTC: Learning Speech Representations through Monotonic Temporal Alignment
by: Eom, SooHwan, et al.
Published: (2026)
Cross-lingual Alzheimer's Disease detection based on paralinguistic and pre-trained features
by: Chen, Xuchu, et al.
Published: (2023)
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
by: Chen, Guoguo, et al.
Published: (2021)
Transcribe, Align and Segment: Creating speech datasets for low-resource languages
by: Sereda, Taras
Published: (2024)
Zero-shot Cross-lingual Voice Transfer for TTS
by: Biadsy, Fadi, et al.
Published: (2024)
TagSpeech: End-to-End Multi-Speaker ASR and Diarization with Fine-Grained Temporal Grounding
by: Huo, Mingyue, et al.
Published: (2026)
TTA-Bench: A Comprehensive Benchmark for Evaluating Text-to-Audio Models
by: Wang, Hui, et al.
Published: (2025)
LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition
by: Yoon, Eunseop, et al.
Published: (2024)
Adaptive Inner Speech-Text Alignment for LLM-based Speech Translation
by: Liu, Henglyu, et al.
Published: (2025)
MOSS Transcribe Diarize Technical Report
by: AI, MOSI., et al.
Published: (2026)
Learning Time-Graph Frequency Representation for Monaural Speech Enhancement
by: Wang, Tingting, et al.
Published: (2025)
DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models
by: Wang, Yuanyuan, et al.
Published: (2025)
Speech Recognition Transformers: Topological-lingualism Perspective
by: Singh, Shruti, et al.
Published: (2024)
SECodec: Structural Entropy-based Compressive Speech Representation Codec for Speech Language Models
by: Wang, Linqin, et al.
Published: (2024)
Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning
by: Sun, Siqi, et al.
Published: (2024)
Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens
by: Zhao, Jinzheng, et al.
Published: (2024)
Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
by: Shao, Yiwen, et al.
Published: (2024)
Unlocking Strong Supervision: A Data-Centric Study of General-Purpose Audio Pre-Training Methods
by: Zhou, Xuanru, et al.
Published: (2026)
MUSA: Multi-lingual Speaker Anonymization via Serial Disentanglement
by: Yao, Jixun, et al.
Published: (2024)
MSR-Codec: A Low-Bitrate Multi-Stream Residual Codec for High-Fidelity Speech Generation with Information Disentanglement
by: Li, Jingyu, et al.
Published: (2025)
Entropy-based Coarse and Compressed Semantic Speech Representation Learning
by: Zuo, Jialong, et al.
Published: (2025)
Exploring Cross-Utterance Speech Contexts for Conformer-Transducer Speech Recognition Systems
by: Cui, Mingyu, et al.
Published: (2025)
Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech
by: Kim, Youngjae, et al.
Published: (2024)