| Main Authors: | Liu, Yutong; Zhang, Ziyue; Huang, Cheng; Yu, Yongbin; Wang, Xiangxiang; Cai, Yuqing; Tashi, Nyima |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.15095 |
Similar Items
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
by: Liu, Yutong, et al.
Published: (2025)
A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data
by: Chou, Cheng-Kang, et al.
Published: (2025)
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
by: Yang, Guanrou, et al.
Published: (2024)
Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing
by: Wang, Mengqi, et al.
Published: (2025)
Zero-Shot Imagined Speech Decoding via Imagined-to-Listened MEG Mapping
by: Maghsoudi, Maryam, et al.
Published: (2026)
Can Speech LLMs Think while Listening?
by: Shih, Yi-Jen, et al.
Published: (2025)
AS-ASR: A Lightweight Framework for Aphasia-Specific Automatic Speech Recognition
by: Bao, Chen, et al.
Published: (2025)
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
by: Liu, Rui, et al.
Published: (2025)
Speech Recognition on TV Series with Video-guided Post-ASR Correction
by: Yang, Haoyuan, et al.
Published: (2025)
Progressive Residual Extraction based Pre-training for Speech Representation Learning
by: Wang, Tianrui, et al.
Published: (2024)
Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM
by: Zhang, Fengrun, et al.
Published: (2024)
Transferable Adversarial Attacks against ASR
by: Gao, Xiaoxue, et al.
Published: (2024)
Tracking Listener Attention: Gaze-Guided Audio-Visual Speech Enhancement Framework
by: Yang, Hsiang-Cheng, et al.
Published: (2026)
A Convolutional Framework for Mapping Imagined Auditory MEG into Listened Brain Responses
by: Maghsoudi, Maryam, et al.
Published: (2025)
VIBEVOICE-ASR Technical Report
by: Peng, Zhiliang, et al.
Published: (2026)
Advancing Multi-talker ASR Performance with Large Language Models
by: Shi, Mohan, et al.
Published: (2024)
Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches
by: Aboeitta, Ahmed, et al.
Published: (2025)
DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models
by: Li, Li, et al.
Published: (2026)
FlanEC: Exploring Flan-T5 for Post-ASR Error Correction
by: La Quatra, Moreno, et al.
Published: (2025)
Cross-modal Knowledge Transfer Learning as Graph Matching Based on Optimal Transport for ASR
by: Lu, Xugang, et al.
Published: (2025)
Retrieval-Augmented Self-Taught Reasoning Model with Adaptive Chain-of-Thought for ASR Named Entity Correction
by: An, Junjie, et al.
Published: (2026)
Better Pseudo-labeling with Multi-ASR Fusion and Error Correction by SpeechLLM
by: Prakash, Jeena, et al.
Published: (2025)
OCR-Enhanced Multimodal ASR Can Read While Listening
by: Chen, Junli, et al.
Published: (2026)
Training-Free Intelligibility-Guided Observation Addition for Noisy ASR
by: Li, Haoyang, et al.
Published: (2026)
Leveraging Multiple Speech Enhancers for Non-Intrusive Intelligibility Prediction for Hearing-Impaired Listeners
by: Cao, Boxuan, et al.
Published: (2025)
Enhanced Generative Machine Listener
by: Raj, Vishnu, et al.
Published: (2025)
Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR
by: Shao, Mingchen, et al.
Published: (2025)
Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition
by: Hu, Shujie, et al.
Published: (2024)
The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR$\rightarrow$LLM Pipelines?
by: Billa, Jayadev
Published: (2026)
Unifying Streaming and Non-streaming Zipformer-based ASR
by: Sharma, Bidisha, et al.
Published: (2025)
LTS-VoiceAgent: A Listen-Think-Speak Framework for Efficient Streaming Voice Interaction via Semantic Triggering and Incremental Reasoning
by: Zou, Wenhao, et al.
Published: (2026)
Leveraging ASR Pretrained Conformers for Speaker Verification through Transfer Learning and Knowledge Distillation
by: Cai, Danwei, et al.
Published: (2023)
CHSER: A Dataset and Case Study on Generative Speech Error Correction for Child ASR
by: Shankar, Natarajan Balaji, et al.
Published: (2025)
Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
by: Fang, Yangui, et al.
Published: (2025)
Explore the Reinforcement Learning for the LLM based ASR and TTS system
by: Gao, Changfeng, et al.
Published: (2025)
Investigation of Whisper ASR Hallucinations Induced by Non-Speech Audio
by: Barański, Mateusz, et al.
Published: (2025)
VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining
by: Zhuo, Jianheng, et al.
Published: (2025)
Granite-speech: open-source speech-aware LLMs with strong English ASR capabilities
by: Saon, George, et al.
Published: (2025)
Listen, Think, and Understand
by: Gong, Yuan, et al.
Published: (2023)
UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models
by: Guan, Wenhao, et al.
Published: (2025)