| Main Authors: | Kapu, Nirmal Joshua, Karan, Raghav |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.18636 |
Similar Items
Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
by: Zhang, Yuhao, et al.
Published: (2025)
Recent Advances in End-to-End Simultaneous Speech Translation
by: Liu, Xiaoqian, et al.
Published: (2024)
Luganda Speech Intent Recognition for IoT Applications
by: Katumba, Andrew, et al.
Published: (2024)
Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset
by: Liu, Rui, et al.
Published: (2025)
Towards Robust Speech Representation Learning for Thousands of Languages
by: Chen, William, et al.
Published: (2024)
Amplifying Emotional Signals: Data-Efficient Deep Learning for Robust Speech Emotion Recognition
by: Vu, Tai
Published: (2025)
Towards Signal Processing In Large Language Models
by: Verma, Prateek, et al.
Published: (2024)
WavLLM: Towards Robust and Adaptive Speech Large Language Model
by: Hu, Shujie, et al.
Published: (2024)
TSPC: A Two-Stage Phoneme-Centric Architecture for code-switching Vietnamese-English Speech Recognition
by: Anh, Tran Nguyen, et al.
Published: (2025)
Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing
by: Peng, An-Ci, et al.
Published: (2026)
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
by: Qiang, Chunyu, et al.
Published: (2024)
OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
by: Wang, Chen, et al.
Published: (2025)
Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets
by: Gedeon, Máté, et al.
Published: (2025)
Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing
by: Liu, Tianchi, et al.
Published: (2024)
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
by: Zhang, Shaolei, et al.
Published: (2024)
Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation
by: Srivastav, Vaibhav, et al.
Published: (2025)
Bimodal Connection Attention Fusion for Speech Emotion Recognition
by: Luo, Jiachen, et al.
Published: (2025)
SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought
by: Gong, Hongyu, et al.
Published: (2024)
Toward Responsible ASR for African American English Speakers: A Scoping Review of Bias and Equity in Speech Technology
by: Cunningham, Jay L., et al.
Published: (2025)
Seamless Dysfluent Speech Text Alignment for Disordered Speech Analysis
by: Ye, Zongli, et al.
Published: (2025)
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
by: Wu, Yihan, et al.
Published: (2024)
Adapting Foundation Speech Recognition Models to Impaired Speech: A Semantic Re-chaining Approach for Personalization of German Speech
by: Pokel, Niclas, et al.
Published: (2025)
SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval
by: Lin, Yueqian, et al.
Published: (2024)
A Unified Speech LLM for Diarization and Speech Recognition in Multilingual Conversations
by: Saengthong, Phurich, et al.
Published: (2025)
Self-supervised Speech Models for Word-Level Stuttered Speech Detection
by: Shih, Yi-Jen, et al.
Published: (2024)
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
by: Le, Chenyang, et al.
Published: (2024)
From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models
by: Ersoy, Asım, et al.
Published: (2025)
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
by: Ma, Ziyang, et al.
Published: (2023)
Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation
by: Wang, Peidong, et al.
Published: (2025)
Advancing Speech Understanding in Speech-Aware Language Models with GRPO
by: Elmakies, Avishai, et al.
Published: (2025)
A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation
by: Ma, Zhengrui, et al.
Published: (2024)
Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment
by: Azad, Asif, et al.
Published: (2026)
SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models
by: Yao, Wenhan, et al.
Published: (2025)
QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
by: Wang, Siyin, et al.
Published: (2025)
Exploring ASR-Based Wav2Vec2 for Automated Speech Disorder Assessment: Insights and Analysis
by: Nguyen, Tuan, et al.
Published: (2024)
Task-Lens: Cross-Task Utility Based Speech Dataset Profiling for Low-Resource Indian Languages
by: Sharma, Swati, et al.
Published: (2026)
Zero-Shot Speech LLMs for Multi-Aspect Evaluation of L2 Speech: Challenges and Opportunities
by: Parikh, Aditya Kamlesh, et al.
Published: (2026)
MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation
by: Chen, Szu-Chi, et al.
Published: (2026)
Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy
by: Sheikh, Shakeel, et al.
Published: (2026)
Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time
by: Seide, Frank, et al.
Published: (2024)