Library Catalog

Bibliographic Details
Main authors:	Pareras, Oriol; Gállego, Gerard I.; Costa, Federico; España-Bonet, Cristina; Hernando, Javier
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Sound
Online access:	https://arxiv.org/abs/2510.03093

Similar Items

Listening or Reading? Evaluating Speech Awareness in Chain-of-Thought Speech-to-Text Translation
by: Romero-Díaz, Jacobo, et al.
Published: (2025)

Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios
by: Gállego, Gerard I., et al.
Published: (2025)

Quantifying Cross-Lingual Transfer in Paralinguistic Speech Tasks
by: Buitrago, Pol, et al.
Published: (2026)

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs
by: Papi, Sara, et al.
Published: (2025)

Double Multi-Head Attention Multimodal System for Odyssey 2024 Speech Emotion Recognition Challenge
by: Costa, Federico, et al.
Published: (2024)

Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
by: Futami, Hayato, et al.
Published: (2025)

Direct Speech to Speech Translation: A Review
by: Sarim, Mohammad, et al.
Published: (2025)

Bootstrapping Audiovisual Speech Recognition in Zero-AV-Resource Scenarios with Synthetic Visual Data
by: Buitrago, Pol, et al.
Published: (2026)

CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
by: Shankar, Bhavani, et al.
Published: (2024)

Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT Reasoning
by: Xue, Hongfei, et al.
Published: (2025)

POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
by: Li, Xuanchen, et al.
Published: (2025)

Direct Speech-to-Speech Neural Machine Translation: A Survey
by: Gupta, Mahendra, et al.
Published: (2024)

SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
by: Deng, Keqi, et al.
Published: (2025)

Speech-Worthy Alignment for Japanese SpeechLLMs via Direct Preference Optimization
by: Zhao, Mengjie, et al.
Published: (2026)

Adaptive Inner Speech-Text Alignment for LLM-based Speech Translation
by: Liu, Henglyu, et al.
Published: (2025)

Scaling Rich Style-Prompted Text-to-Speech Datasets
by: Diwan, Anuj, et al.
Published: (2025)

SpeechT: Findings of the First Mentorship in Speech Translation
by: Moslem, Yasmin, et al.
Published: (2025)

Speaker Characterization by means of Attention Pooling
by: Costa, Federico, et al.
Published: (2024)

Preserving Speaker Information in Direct Speech-to-Speech Translation with Non-Autoregressive Generation and Pretraining
by: Zhou, Rui, et al.
Published: (2024)

Enhancing Crowdsourced Audio for Text-to-Speech Models
by: Giraldo, José, et al.
Published: (2024)

Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?
by: Tsiamas, Ioannis, et al.
Published: (2024)

LongSpeech: A Scalable Benchmark for Transcription, Translation and Understanding in Long Speech
by: Yang, Fei, et al.
Published: (2026)

End-to-End Speech-to-Text Translation: A Survey
by: Sethiya, Nivedita, et al.
Published: (2023)

On the Use of Audio to Improve Dialogue Policies
by: Roncel, Daniel, et al.
Published: (2024)

BENYO-S2ST-Corpus-1: A Bilingual English-to-Yoruba Direct Speech-to-Speech Translation Corpus
by: Adetiba, Emmanuel, et al.
Published: (2025)

DisCo-Speech: Controllable Zero-Shot Speech Generation with A Disentangled Speech Codec
by: Li, Tao, et al.
Published: (2025)

DMP-TTS: Disentangled multi-modal Prompting for Controllable Text-to-Speech with Chained Guidance
by: Yin, Kang, et al.
Published: (2025)

Controlling Emotion in Text-to-Speech with Natural Language Prompts
by: Bott, Thomas, et al.
Published: (2024)

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
by: Lei, Shun, et al.
Published: (2023)

Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation
by: Goncalves, Lucas, et al.
Published: (2024)

High-Fidelity Simultaneous Speech-To-Speech Translation
by: Labiausse, Tom, et al.
Published: (2025)

Refining Pseudo-Audio Prompts with Speech-Text Alignment for Text-Only Domain Adaptation in LLM-Based ASR
by: Magoshi, Ryo, et al.
Published: (2026)

UniSS: Unified Expressive Speech-to-Speech Translation with Your Voice
by: Cheng, Sitong, et al.
Published: (2025)

Soundwave: Less is More for Speech-Text Alignment in LLMs
by: Zhang, Yuhao, et al.
Published: (2025)

StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection
by: Papi, Sara, et al.
Published: (2024)

Unveiling the Role of Pretraining in Direct Speech Translation
by: Alastruey, Belen, et al.
Published: (2024)

PART: Progressive Alignment Representation Training for Multilingual Speech-To-Text with LLMs
by: Zhang, Pei, et al.
Published: (2025)

FleSpeech: Flexibly Controllable Speech Generation with Various Prompts
by: Li, Hanzhao, et al.
Published: (2025)

Continuous Speech Tokenizer in Text To Speech
by: Li, Yixing, et al.
Published: (2024)

Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
by: Hu, Jiliang, et al.
Published: (2025)