| Main Authors: | Sankar, Sanjana, Lenglet, Martin, Bailly, Gerard, Beautemps, Denis, Hueber, Thomas |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.04799 |
Similar Items
Scaling Speech-Text Pre-training with Synthetic Interleaved Data
by: Zeng, Aohan, et al.
Published: (2024)
Is Self-Supervised Learning Enough to Fill in the Gap? A Study on Speech Inpainting
by: Asaad, Ihab, et al.
Published: (2024)
Generative Pre-training for Speech with Flow Matching
by: Liu, Alexander H., et al.
Published: (2023)
Speech-FT: Merging Pre-trained And Fine-Tuned Speech Representation Models For Cross-Task Generalization
by: Lin, Tzu-Quan, et al.
Published: (2025)
Mark My Words: A Robust Multilingual Model for Punctuation in Text and Speech Transcripts
by: Pulipaka, Sidharth, et al.
Published: (2025)
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
by: Zhu, Yongxin, et al.
Published: (2024)
Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
by: Yuhang, Yang, et al.
Published: (2024)
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
by: Wu, Yihan, et al.
Published: (2024)
Listening or Reading? Evaluating Speech Awareness in Chain-of-Thought Speech-to-Text Translation
by: Romero-Díaz, Jacobo, et al.
Published: (2025)
Revisiting Direct Speech-to-Text Translation with Speech LLMs: Better Scaling than CoT Prompting?
by: Pareras, Oriol, et al.
Published: (2025)
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
by: Liu, Alexander H., et al.
Published: (2025)
A Simple Method to Enhance Pre-trained Language Models with Speech Tokens for Classification
by: Calbucura, Nicolas, et al.
Published: (2025)
Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?
by: Tsiamas, Ioannis, et al.
Published: (2024)
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
by: Ma, Ziyang, et al.
Published: (2023)
MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance
by: Zhao, Xingjian, et al.
Published: (2025)
RASMALAI: Resources for Adaptive Speech Modeling in Indian Languages with Accents and Intonations
by: Sankar, Ashwin, et al.
Published: (2025)
SpeechAlign: a Framework for Speech Translation Alignment Evaluation
by: Alastruey, Belen, et al.
Published: (2023)
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)
An Empirical Analysis of Discrete Unit Representations in Speech Language Modeling Pre-training
by: Labrak, Yanis, et al.
Published: (2025)
On Leveraging Encoder-only Pre-trained Language Models for Effective Keyphrase Generation
by: Wu, Di, et al.
Published: (2024)
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
by: Wang, Dingdong, et al.
Published: (2025)
STTATTS: Unified Speech-To-Text And Text-To-Speech Model
by: Toyin, Hawau Olamide, et al.
Published: (2024)
Leveraging Unit Language Guidance to Advance Speech Modeling in Textless Speech-to-Speech Translation
by: Zhang, Yuhao, et al.
Published: (2025)
Text-to-Code Generation with Modality-relative Pre-training
by: Christopoulou, Fenia, et al.
Published: (2024)
Continuous Speech Tokenizer in Text To Speech
by: Li, Yixing, et al.
Published: (2024)
LLM-to-Speech: A Synthetic Data Pipeline for Training Dialectal Text-to-Speech Models
by: Khamis, Ahmed Khaled, et al.
Published: (2026)
Speech Recognition Rescoring with Large Speech-Text Foundation Models
by: Shivakumar, Prashanth Gurunath, et al.
Published: (2024)
Attentive Merging of Hidden Embeddings from Pre-trained Speech Model for Anti-spoofing Detection
by: Pan, Zihan, et al.
Published: (2024)
Chunk Based Speech Pre-training with High Resolution Finite Scalar Quantization
by: Tang, Yun, et al.
Published: (2025)
BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge Distillation
by: Wang, Chen, et al.
Published: (2024)
MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
by: Yadav, Hemant, et al.
Published: (2024)
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
by: Kuan, Chun-Yi, et al.
Published: (2024)
A multilingual training strategy for low resource Text to Speech
by: Amalas, Asma, et al.
Published: (2024)
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
by: Rossenbach, Nick, et al.
Published: (2024)
LatentSpeech: Latent Diffusion for Text-To-Speech Generation
by: Lou, Haowei, et al.
Published: (2024)
Streaming Speech-to-Text Translation with a SpeechLLM
by: Parcollet, Titouan, et al.
Published: (2026)
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
by: Futami, Hayato, et al.
Published: (2025)
Streaming Speech-to-Confusion Network Speech Recognition
by: Filimonov, Denis, et al.
Published: (2023)
Revisiting Interpolation Augmentation for Speech-to-Text Generation
by: Xu, Chen, et al.
Published: (2024)
Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving
by: Xie, Jingran, et al.
Published: (2025)