Library Catalog

Bibliographic Details
Main Authors:	Räsänen, Okko, Kocharov, Daniil
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2405.07700

Similar Items

ChildGuard: A Specialized Dataset for Combatting Child-Targeted Hate Speech
by: Kashyap, Gautam Siddharth, et al.
Published: (2025)

A model of early word acquisition based on realistic-scale audiovisual naming events
by: Khorrami, Khazar, et al.
Published: (2024)

Direct Speech to Speech Translation: A Review
by: Sarim, Mohammad, et al.
Published: (2025)

Pisets: A Robust Speech Recognition System for Lectures and Interviews
by: Bondarenko, Ivan, et al.
Published: (2026)

Direct Speech-to-Speech Neural Machine Translation: A Survey
by: Gupta, Mahendra, et al.
Published: (2024)

Automatic Speech Recognition of Non-Native Child Speech for Language Learning Applications
by: Wills, Simone, et al.
Published: (2023)

SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
by: Hu, Ke, et al.
Published: (2025)

SpeechAlign: Aligning Speech Generation to Human Preferences
by: Zhang, Dong, et al.
Published: (2024)

SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation
by: Zhang, Dong, et al.
Published: (2024)

Automatic Speech Recognition for African Low-Resource Languages: Challenges and Future Directions
by: Imam, Sukairaj Hafiz, et al.
Published: (2025)

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)

Generative Expressive Conversational Speech Synthesis
by: Liu, Rui, et al.
Published: (2024)

Unified Pathological Speech Analysis with Prompt Tuning
by: Yang, Fei, et al.
Published: (2024)

Dialectal Coverage And Generalization in Arabic Speech Recognition
by: Djanibekov, Amirbek, et al.
Published: (2024)

Cross-Utterance Conditioned VAE for Speech Generation
by: Li, Yang, et al.
Published: (2023)

SEAL: Speech Embedding Alignment Learning for Speech Large Language Model with Retrieval-Augmented Generation
by: Sun, Chunyu, et al.
Published: (2025)

Scaling Analysis of Interleaved Speech-Text Language Models
by: Maimon, Gallil, et al.
Published: (2025)

Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
by: Lou, Haowei, et al.
Published: (2025)

Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving
by: Xie, Jingran, et al.
Published: (2025)

Long-Form Speech Generation with Spoken Language Models
by: Park, Se Jin, et al.
Published: (2024)

SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
by: Fucci, Dennis, et al.
Published: (2024)

Rethinking Discrete Speech Representation Tokens for Accent Generation
by: Zhong, Jinzuomu, et al.
Published: (2026)

Examining Test-Time Adaptation for Personalized Child Speech Recognition
by: Shi, Zhonghao, et al.
Published: (2024)

Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
by: Kuan, Chun-Yi, et al.
Published: (2024)

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders
by: Shi, Hao, et al.
Published: (2023)

Word Level Timestamp Generation for Automatic Speech Recognition and Translation
by: Hu, Ke, et al.
Published: (2025)

SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
by: Zhang, Xin, et al.
Published: (2023)

Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
by: Futami, Hayato, et al.
Published: (2025)

Improving Speech-based Emotion Recognition with Contextual Utterance Analysis and LLMs
by: Zhang, Enshi, et al.
Published: (2024)

Analysis of Speech Temporal Dynamics in the Context of Speaker Verification and Voice Anonymization
by: Tomashenko, Natalia, et al.
Published: (2024)

Graph Modelling Analysis of Speech-Gesture Interaction for Aphasia Severity Estimation
by: Kollapally, Navya Martin, et al.
Published: (2026)

Continuous Speech Tokenizer in Text To Speech
by: Li, Yixing, et al.
Published: (2024)

Improving Child Speech Recognition and Reading Mistake Detection by Using Prompts
by: Gao, Lingyun, et al.
Published: (2025)

Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
by: Xu, Chun, et al.
Published: (2024)

Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
by: Zhu, Yongxin, et al.
Published: (2024)

AzeroS: Extending LLM to Speech with Self-Generated Instruction-Free Tuning
by: Shao, Yiwen, et al.
Published: (2025)

UniCoM: A Universal Code-Switching Speech Generator
by: Lee, Sangmin, et al.
Published: (2025)

SpeechTaxi: On Multilingual Semantic Speech Classification
by: Keller, Lennart, et al.
Published: (2024)

High-Fidelity Simultaneous Speech-To-Speech Translation
by: Labiausse, Tom, et al.
Published: (2025)

Continual Speech Learning with Fused Speech Features
by: Wang, Guitao, et al.
Published: (2025)