| Main authors: | Pareras, Oriol; Gállego, Gerard I.; Costa, Federico; España-Bonet, Cristina; Hernando, Javier |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online access: | https://arxiv.org/abs/2510.03093 |
Similar items
Listening or Reading? Evaluating Speech Awareness in Chain-of-Thought Speech-to-Text Translation
by: Romero-Díaz, Jacobo, et al.
Published: (2025)
Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios
by: Gállego, Gerard I., et al.
Published: (2025)
Quantifying Cross-Lingual Transfer in Paralinguistic Speech Tasks
by: Buitrago, Pol, et al.
Published: (2026)
Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs
by: Papi, Sara, et al.
Published: (2025)
Double Multi-Head Attention Multimodal System for Odyssey 2024 Speech Emotion Recognition Challenge
by: Costa, Federico, et al.
Published: (2024)
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
by: Futami, Hayato, et al.
Published: (2025)
Direct Speech to Speech Translation: A Review
by: Sarim, Mohammad, et al.
Published: (2025)
Bootstrapping Audiovisual Speech Recognition in Zero-AV-Resource Scenarios with Synthetic Visual Data
by: Buitrago, Pol, et al.
Published: (2026)
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
by: Shankar, Bhavani, et al.
Published: (2024)
Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT Reasoning
by: Xue, Hongfei, et al.
Published: (2025)
POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
by: Li, Xuanchen, et al.
Published: (2025)
Direct Speech-to-Speech Neural Machine Translation: A Survey
by: Gupta, Mahendra, et al.
Published: (2024)
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
by: Deng, Keqi, et al.
Published: (2025)
Speech-Worthy Alignment for Japanese SpeechLLMs via Direct Preference Optimization
by: Zhao, Mengjie, et al.
Published: (2026)
Adaptive Inner Speech-Text Alignment for LLM-based Speech Translation
by: Liu, Henglyu, et al.
Published: (2025)
Scaling Rich Style-Prompted Text-to-Speech Datasets
by: Diwan, Anuj, et al.
Published: (2025)
SpeechT: Findings of the First Mentorship in Speech Translation
by: Moslem, Yasmin, et al.
Published: (2025)
Speaker Characterization by means of Attention Pooling
by: Costa, Federico, et al.
Published: (2024)
Preserving Speaker Information in Direct Speech-to-Speech Translation with Non-Autoregressive Generation and Pretraining
by: Zhou, Rui, et al.
Published: (2024)
Enhancing Crowdsourced Audio for Text-to-Speech Models
by: Giraldo, José, et al.
Published: (2024)
Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?
by: Tsiamas, Ioannis, et al.
Published: (2024)
LongSpeech: A Scalable Benchmark for Transcription, Translation and Understanding in Long Speech
by: Yang, Fei, et al.
Published: (2026)
End-to-End Speech-to-Text Translation: A Survey
by: Sethiya, Nivedita, et al.
Published: (2023)
On the Use of Audio to Improve Dialogue Policies
by: Roncel, Daniel, et al.
Published: (2024)
BENYO-S2ST-Corpus-1: A Bilingual English-to-Yoruba Direct Speech-to-Speech Translation Corpus
by: Adetiba, Emmanuel, et al.
Published: (2025)
DisCo-Speech: Controllable Zero-Shot Speech Generation with A Disentangled Speech Codec
by: Li, Tao, et al.
Published: (2025)
DMP-TTS: Disentangled multi-modal Prompting for Controllable Text-to-Speech with Chained Guidance
by: Yin, Kang, et al.
Published: (2025)
Controlling Emotion in Text-to-Speech with Natural Language Prompts
by: Bott, Thomas, et al.
Published: (2024)
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
by: Lei, Shun, et al.
Published: (2023)
Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation
by: Goncalves, Lucas, et al.
Published: (2024)
High-Fidelity Simultaneous Speech-To-Speech Translation
by: Labiausse, Tom, et al.
Published: (2025)
Refining Pseudo-Audio Prompts with Speech-Text Alignment for Text-Only Domain Adaptation in LLM-Based ASR
by: Magoshi, Ryo, et al.
Published: (2026)
UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice
by: Cheng, Sitong, et al.
Published: (2025)
Soundwave: Less is More for Speech-Text Alignment in LLMs
by: Zhang, Yuhao, et al.
Published: (2025)
StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection
by: Papi, Sara, et al.
Published: (2024)
Unveiling the Role of Pretraining in Direct Speech Translation
by: Alastruey, Belen, et al.
Published: (2024)
PART: Progressive Alignment Representation Training for Multilingual Speech-To-Text with LLMs
by: Zhang, Pei, et al.
Published: (2025)
FleSpeech: Flexibly Controllable Speech Generation with Various Prompts
by: Li, Hanzhao, et al.
Published: (2025)
Continuous Speech Tokenizer in Text To Speech
by: Li, Yixing, et al.
Published: (2024)
Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
by: Hu, Jiliang, et al.
Published: (2025)