| Main Authors: | Räsänen, Okko, Kocharov, Daniil |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.07700 |
Similar Items
ChildGuard: A Specialized Dataset for Combatting Child-Targeted Hate Speech
by: Kashyap, Gautam Siddharth, et al.
Published: (2025)
A model of early word acquisition based on realistic-scale audiovisual naming events
by: Khorrami, Khazar, et al.
Published: (2024)
Direct Speech to Speech Translation: A Review
by: Sarim, Mohammad, et al.
Published: (2025)
Pisets: A Robust Speech Recognition System for Lectures and Interviews
by: Bondarenko, Ivan, et al.
Published: (2026)
Direct Speech-to-Speech Neural Machine Translation: A Survey
by: Gupta, Mahendra, et al.
Published: (2024)
Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications
by: Wills, Simone, et al.
Published: (2023)
SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
by: Hu, Ke, et al.
Published: (2025)
SpeechAlign: Aligning Speech Generation to Human Preferences
by: Zhang, Dong, et al.
Published: (2024)
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation
by: Zhang, Dong, et al.
Published: (2024)
Automatic Speech Recognition for African Low-Resource Languages: Challenges and Future Directions
by: Imam, Sukairaj Hafiz, et al.
Published: (2025)
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)
Generative Expressive Conversational Speech Synthesis
by: Liu, Rui, et al.
Published: (2024)
Unified Pathological Speech Analysis with Prompt Tuning
by: Yang, Fei, et al.
Published: (2024)
Dialectal Coverage And Generalization in Arabic Speech Recognition
by: Djanibekov, Amirbek, et al.
Published: (2024)
Cross-Utterance Conditioned VAE for Speech Generation
by: Li, Yang, et al.
Published: (2023)
SEAL: Speech Embedding Alignment Learning for Speech Large Language Model with Retrieval-Augmented Generation
by: Sun, Chunyu, et al.
Published: (2025)
Scaling Analysis of Interleaved Speech-Text Language Models
by: Maimon, Gallil, et al.
Published: (2025)
Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
by: Lou, Haowei, et al.
Published: (2025)
Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving
by: Xie, Jingran, et al.
Published: (2025)
Long-Form Speech Generation with Spoken Language Models
by: Park, Se Jin, et al.
Published: (2024)
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
by: Fucci, Dennis, et al.
Published: (2024)
Rethinking Discrete Speech Representation Tokens for Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2026)
Examining Test-Time Adaptation for Personalized Child Speech Recognition
by: Shi, Zhonghao, et al.
Published: (2024)
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
by: Kuan, Chun-Yi, et al.
Published: (2024)
Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders
by: Shi, Hao, et al.
Published: (2023)
Word Level Timestamp Generation for Automatic Speech Recognition and Translation
by: Hu, Ke, et al.
Published: (2025)
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
by: Zhang, Xin, et al.
Published: (2023)
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
by: Futami, Hayato, et al.
Published: (2025)
Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs
by: Zhang, Enshi, et al.
Published: (2024)
Analysis of Speech Temporal Dynamics in the Context of Speaker Verification and Voice Anonymization
by: Tomashenko, Natalia, et al.
Published: (2024)
Graph Modelling Analysis of Speech-Gesture Interaction for Aphasia Severity Estimation
by: Kollapally, Navya Martin, et al.
Published: (2026)
Continuous Speech Tokenizer in Text To Speech
by: Li, Yixing, et al.
Published: (2024)
Improving Child Speech Recognition and Reading Mistake Detection by Using Prompts
by: Gao, Lingyun, et al.
Published: (2025)
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
by: Xu, Chun, et al.
Published: (2024)
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
by: Zhu, Yongxin, et al.
Published: (2024)
AzeroS: Extending LLM to Speech with Self-Generated Instruction-Free Tuning
by: Shao, Yiwen, et al.
Published: (2025)
UniCoM: A Universal Code-Switching Speech Generator
by: Lee, Sangmin, et al.
Published: (2025)
SpeechTaxi: On Multilingual Semantic Speech Classification
by: Keller, Lennart, et al.
Published: (2024)
High-Fidelity Simultaneous Speech-To-Speech Translation
by: Labiausse, Tom, et al.
Published: (2025)
Continual Speech Learning with Fused Speech Features
by: Wang, Guitao, et al.
Published: (2025)