:: Library Catalog

Bibliographic Details
Main Authors:	Kapu, Nirmal Joshua; Karan, Raghav
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2411.18636

Similar Items

Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
by: Zhang, Yuhao, et al.
Published: (2025)

Recent Advances in End-to-End Simultaneous Speech Translation
by: Liu, Xiaoqian, et al.
Published: (2024)

Luganda Speech Intent Recognition for IoT Applications
by: Katumba, Andrew, et al.
Published: (2024)

Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset
by: Liu, Rui, et al.
Published: (2025)

Towards Robust Speech Representation Learning for Thousands of Languages
by: Chen, William, et al.
Published: (2024)

Amplifying Emotional Signals: Data-Efficient Deep Learning for Robust Speech Emotion Recognition
by: Vu, Tai
Published: (2025)

Towards Signal Processing In Large Language Models
by: Verma, Prateek, et al.
Published: (2024)

WavLLM: Towards Robust and Adaptive Speech Large Language Model
by: Hu, Shujie, et al.
Published: (2024)

TSPC: A Two-Stage Phoneme-Centric Architecture for code-switching Vietnamese-English Speech Recognition
by: Anh, Tran Nguyen, et al.
Published: (2025)

Efficient Dialect-Aware Modeling and Conditioning for Low-Resource Taiwanese Hakka Speech Processing
by: Peng, An-Ci, et al.
Published: (2026)

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
by: Qiang, Chunyu, et al.
Published: (2024)

OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
by: Wang, Chen, et al.
Published: (2025)

Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets
by: Gedeon, Máté, et al.
Published: (2025)

Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing
by: Liu, Tianchi, et al.
Published: (2024)

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
by: Zhang, Shaolei, et al.
Published: (2024)

Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation
by: Srivastav, Vaibhav, et al.
Published: (2025)

Bimodal Connection Attention Fusion for Speech Emotion Recognition
by: Luo, Jiachen, et al.
Published: (2025)

SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought
by: Gong, Hongyu, et al.
Published: (2024)

Toward Responsible ASR for African American English Speakers: A Scoping Review of Bias and Equity in Speech Technology
by: Cunningham, Jay L., et al.
Published: (2025)

Seamless Dysfluent Speech Text Alignment for Disordered Speech Analysis
by: Ye, Zongli, et al.
Published: (2025)

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
by: Wu, Yihan, et al.
Published: (2024)

Adapting Foundation Speech Recognition Models to Impaired Speech: A Semantic Re-chaining Approach for Personalization of German Speech
by: Pokel, Niclas, et al.
Published: (2025)

SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval
by: Lin, Yueqian, et al.
Published: (2024)

A Unified Speech LLM for Diarization and Speech Recognition in Multilingual Conversations
by: Saengthong, Phurich, et al.
Published: (2025)

Self-supervised Speech Models for Word-Level Stuttered Speech Detection
by: Shih, Yi-Jen, et al.
Published: (2024)

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
by: Le, Chenyang, et al.
Published: (2024)

From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models
by: Ersoy, Asım, et al.
Published: (2025)

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
by: Ma, Ziyang, et al.
Published: (2023)

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation
by: Wang, Peidong, et al.
Published: (2025)

Advancing Speech Understanding in Speech-Aware Language Models with GRPO
by: Elmakies, Avishai, et al.
Published: (2025)

A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation
by: Ma, Zhengrui, et al.
Published: (2024)

Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment
by: Azad, Asif, et al.
Published: (2026)

SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models
by: Yao, Wenhan, et al.
Published: (2025)

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions
by: Wang, Siyin, et al.
Published: (2025)

Exploring ASR-Based Wav2Vec2 for Automated Speech Disorder Assessment: Insights and Analysis
by: Nguyen, Tuan, et al.
Published: (2024)

Task-Lens: Cross-Task Utility Based Speech Dataset Profiling for Low-Resource Indian Languages
by: Sharma, Swati, et al.
Published: (2026)

Zero-Shot Speech LLMs for Multi-Aspect Evaluation of L2 Speech: Challenges and Opportunities
by: Parikh, Aditya Kamlesh, et al.
Published: (2026)

MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation
by: Chen, Szu-Chi, et al.
Published: (2026)

Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy
by: Sheikh, Shakeel, et al.
Published: (2026)

Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time
by: Seide, Frank, et al.
Published: (2024)