:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Mengjie; Liu, Lianbo; Fujita, Yusuke; Shi, Hao; Gao, Yuan; Koshkin, Roman; Sudo, Yui
Format:	Preprint
Published:	2026
Subjects:	Sound; Computation and Language
Online Access:	https://arxiv.org/abs/2603.12565

Similar Items

Distilling LLM Semantic Priors into Encoder-Only Multi-Talker ASR with Talker-Count Routing
by: Shi, Hao, et al.
Published: (2026)

Streaming Translation and Transcription Through Speech-to-Text Causal Alignment
by: Koshkin, Roman, et al.
Published: (2026)

OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2025)

Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition
by: Shi, Hao, et al.
Published: (2025)

AC/DC: LLM-based Audio Comprehension via Dialogue Continuation
by: Fujita, Yusuke, et al.
Published: (2025)

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
by: Song, Yuhan, et al.
Published: (2025)

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
by: Peng, Yifan, et al.
Published: (2024)

Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation
by: Shakeel, Muhammad, et al.
Published: (2024)

Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs
by: Wang, Dingdong, et al.
Published: (2025)

The Voice Behind the Words: Quantifying Intersectional Bias in SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2026)

DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization
by: Yang, Jianing, et al.
Published: (2026)

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss
by: Shakeel, Muhammad, et al.
Published: (2024)

Contextualized Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2024)

Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search
by: Sudo, Yui, et al.
Published: (2024)

Direct Preference Optimization for English-Mandarin Code-Switching Speech Recognition in Audio LLMs
by: Quang, Trung Nguyen, et al.
Published: (2026)

DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs
by: Papi, Sara, et al.
Published: (2026)

DYNAC: Dynamic Vocabulary based Non-Autoregressive Contextualization for Speech Recognition
by: Sudo, Yui, et al.
Published: (2025)

Rubric-Guided Fine-tuning of SpeechLLMs for Multi-Aspect, Multi-Rater L2 Reading-Speech Assessment
by: Parikh, Aditya Kamlesh, et al.
Published: (2026)

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning
by: Peng, Yifan, et al.
Published: (2025)

Emotion-Aligned Generation in Diffusion Text to Speech Models via Preference-Guided Optimization
by: Shi, Jiacheng, et al.
Published: (2025)

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
by: Fujita, Kenichi, et al.
Published: (2024)

Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2025)

Qwen vs. Gemma Integration with Whisper: A Comparative Study in Multilingual SpeechLLM Systems
by: Nguyen, Tuan, et al.
Published: (2025)

SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation
by: Wang, Hui, et al.
Published: (2025)

Revisiting Direct Speech-to-Text Translation with Speech LLMs: Better Scaling than CoT Prompting?
by: Pareras, Oriol, et al.
Published: (2025)

Universal Acoustic Adversarial Attacks for Flexible Control of Speech-LLMs
by: Ma, Rao, et al.
Published: (2025)

Lightweight Zero-shot Text-to-Speech with Mixture of Adapters
by: Fujita, Kenichi, et al.
Published: (2024)

PART: Progressive Alignment Representation Training for Multilingual Speech-To-Text with LLMs
by: Zhang, Pei, et al.
Published: (2025)

Soundwave: Less is More for Speech-Text Alignment in LLMs
by: Zhang, Yuhao, et al.
Published: (2025)

Keep Decoding Parallel with Effective Knowledge Distillation from Language Models to End-to-end Speech Recognisers
by: Hentschel, Michael, et al.
Published: (2024)

S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)

When Voice Matters: Evidence of Gender Disparity in Positional Bias of SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2025)

Adaptive Inner Speech-Text Alignment for LLM-based Speech Translation
by: Liu, Henglyu, et al.
Published: (2025)

POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
by: Li, Xuanchen, et al.
Published: (2025)

SpeechAlign: Aligning Speech Generation to Human Preferences
by: Zhang, Dong, et al.
Published: (2024)

TESU-LLM: Training Speech-LLMs Without Speech via Unified Encoder Alignment
by: Kim, Taesoo, et al.
Published: (2025)

Direct Speech to Speech Translation: A Review
by: Sarim, Mohammad, et al.
Published: (2025)

AlignCap: Aligning Speech Emotion Captioning to Human Preferences
by: Liang, Ziqi, et al.
Published: (2024)

Adapting Foundation Speech Recognition Models to Impaired Speech: A Semantic Re-chaining Approach for Personalization of German Speech
by: Pokel, Niclas, et al.
Published: (2025)

Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
by: Futami, Hayato, et al.
Published: (2025)