| Main Authors: | Zhang, Pei, Chen, Andong, Chen, Xi, Yang, Baosong, Wong, Derek F., Huang, Fei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.19745 |
Similar Items
Direct Simultaneous Translation Activation for Large Audio-Language Models
by: Zhang, Pei, et al.
Published: (2025)
Adaptive Inner Speech-Text Alignment for LLM-based Speech Translation
by: Liu, Henglyu, et al.
Published: (2025)
Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment
by: Gao, Yan, et al.
Published: (2025)
Soundwave: Less is More for Speech-Text Alignment in LLMs
by: Zhang, Yuhao, et al.
Published: (2025)
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
by: Futami, Hayato, et al.
Published: (2025)
POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
by: Li, Xuanchen, et al.
Published: (2025)
SSHR: Leveraging Self-supervised Hierarchical Representations for Multilingual Automatic Speech Recognition
by: Xue, Hongfei, et al.
Published: (2023)
Speech-Worthy Alignment for Japanese SpeechLLMs via Direct Preference Optimization
by: Zhao, Mengjie, et al.
Published: (2026)
Joint Training And Decoding for Multilingual End-to-End Simultaneous Speech Translation
by: Huang, Wuwei, et al.
Published: (2025)
mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks
by: Beyene, Luel Hagos, et al.
Published: (2025)
Configurable Multilingual ASR with Speech Summary Representations
by: Zhu, Harrison, et al.
Published: (2024)
Multilingual Source Tracing of Speech Deepfakes: A First Benchmark
by: Xuan, Xi, et al.
Published: (2025)
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
by: Shi, Jiatong, et al.
Published: (2023)
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs
by: Mousavi, Pooneh, et al.
Published: (2025)
Revisiting Direct Speech-to-Text Translation with Speech LLMs: Better Scaling than CoT Prompting?
by: Pareras, Oriol, et al.
Published: (2025)
Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs
by: Wang, Dingdong, et al.
Published: (2025)
Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
by: Lou, Haowei, et al.
Published: (2025)
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
by: Casanova, Edresson, et al.
Published: (2024)
Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition
by: Yang, Zhengdong, et al.
Published: (2025)
Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training
by: Denisov, Pavel, et al.
Published: (2024)
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
by: Park, Chanho, et al.
Published: (2023)
Seamless Dysfluent Speech Text Alignment for Disordered Speech Analysis
by: Ye, Zongli, et al.
Published: (2025)
Zipper-LoRA: Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition
by: Mei, Yuxiang, et al.
Published: (2026)
TESU-LLM: Training Speech-LLMs Without Speech via Unified Encoder Alignment
by: Kim, Taesoo, et al.
Published: (2025)
Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context
by: Caubrière, Antoine, et al.
Published: (2024)
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
by: Deng, Keqi, et al.
Published: (2025)
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
by: Zhang, Yuhao, et al.
Published: (2025)
Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models
by: Wang, Haoyu, et al.
Published: (2022)
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
by: He, Haorui, et al.
Published: (2025)
Romanization Encoding For Multilingual ASR
by: Ding, Wen, et al.
Published: (2024)
Length-Aware Rotary Position Embedding for Text-Speech Alignment
by: Kim, Hyeongju, et al.
Published: (2025)
SpeechTaxi: On Multilingual Semantic Speech Classification
by: Keller, Lennart, et al.
Published: (2024)
Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks
by: Du, Yichao, et al.
Published: (2024)
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems
by: Huang, Kexin, et al.
Published: (2025)
Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
by: Zhang, Wenyu, et al.
Published: (2025)
Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
by: Yuhang, Yang, et al.
Published: (2024)
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
by: Song, Yuhan, et al.
Published: (2025)
Representation Purification for End-to-End Speech Translation
by: Zhang, Chengwei, et al.
Published: (2024)
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
by: Han, Bing, et al.
Published: (2024)
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
by: Chen, Chen, et al.
Published: (2024)