| Main Authors: | Shen, Gaofei, Bentum, Martijn, Lentz, Tom, Alishahi, Afra, Chrupała, Grzegorz |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.00607 |
Similar Items
Encoding of lexical tone in self-supervised models of spoken language
by: Shen, Gaofei, et al.
Published: (2024)
On the reliability of feature attribution methods for speech classification
by: Shen, Gaofei, et al.
Published: (2025)
Disentangling Textual and Acoustic Features of Neural Speech Representations
by: Mohebbi, Hosein, et al.
Published: (2024)
Tracking the emergence of linguistic structure in self-supervised models learning from speech
by: Kloots, Marianne de Heer, et al.
Published: (2026)
What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training
by: Kloots, Marianne de Heer, et al.
Published: (2025)
Segmental Attention Decoding With Long Form Acoustic Encodings
by: Swietojanski, Pawel, et al.
Published: (2025)
Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding
by: Zhao, Jiahui, et al.
Published: (2024)
Decoding Linguistic Representations of Human Brain
by: Wang, Yu, et al.
Published: (2024)
Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model
by: Wu, Haibin, et al.
Published: (2025)
Reducing Object Hallucination in Large Audio-Language Models via Audio-Aware Decoding
by: Hsu, Tzu-wen, et al.
Published: (2025)
Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations
by: He, Linyang, et al.
Published: (2025)
How Contrastive Decoding Enhances Large Audio Language Models?
by: Lin, Tzu-Quan, et al.
Published: (2026)
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
by: Liu, Jizhong, et al.
Published: (2024)
Probing Audio-Generation Capabilities of Text-Based Language Models
by: Anbazhagan, Arjun Prasaath, et al.
Published: (2025)
Investigating Decoder-only Large Language Models for Speech-to-text Translation
by: Huang, Chao-Wei, et al.
Published: (2024)
Training and Inference Efficiency of Encoder-Decoder Speech Models
by: Żelasko, Piotr, et al.
Published: (2025)
Multi-Level Embedding Conformer Framework for Bengali Automatic Speech Recognition
by: Sakib, Md. Nazmus, et al.
Published: (2025)
Full-text Error Correction for Chinese Speech Recognition with Large Language Model
by: Tang, Zhiyuan, et al.
Published: (2024)
LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation
by: Menta, Venkata Pushpak Teja
Published: (2026)
Chain of Correction for Full-text Speech Recognition with Large Language Models
by: Tang, Zhiyuan, et al.
Published: (2025)
Probing for Phonology in Self-Supervised Speech Representations: A Case Study on Accent Perception
by: Venkateswaran, Nitin, et al.
Published: (2025)
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
by: Shi, Jiatong, et al.
Published: (2024)
Keep Decoding Parallel with Effective Knowledge Distillation from Language Models to End-to-end Speech Recognisers
by: Hentschel, Michael, et al.
Published: (2024)
Romanization Encoding For Multilingual ASR
by: Ding, Wen, et al.
Published: (2024)
The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties
by: Chen, William, et al.
Published: (2025)
Scaling Open Discrete Audio Foundation Models with Interleaved Semantic, Acoustic, and Text Tokens
by: Manakul, Potsawee, et al.
Published: (2026)
TranSentence: Speech-to-speech Translation via Language-agnostic Sentence-level Speech Encoding without Language-parallel Data
by: Kim, Seung-Bin, et al.
Published: (2024)
What Do Language Models Hear? Probing for Auditory Representations in Language Models
by: Ngo, Jerry, et al.
Published: (2024)
Implicit Self-supervised Language Representation for Spoken Language Diarization
by: Mishra, Jagabandhu, et al.
Published: (2023)
Decoder-only Architecture for Streaming End-to-end Speech Recognition
by: Tsunoo, Emiru, et al.
Published: (2024)
Speech Codec Probing from Semantic and Phonetic Perspectives
by: Shi, Xuan, et al.
Published: (2026)
Benchmarking Prosody Encoding in Discrete Speech Tokens
by: Onda, Kentaro, et al.
Published: (2025)
Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models
by: Tang, Zhiyuan, et al.
Published: (2024)
CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition
by: Bartelds, Martijn, et al.
Published: (2025)
Emphasis Sensitivity in Speech Representations
by: Cassini, Shaun, et al.
Published: (2025)
Zero-shot Context Biasing with Trie-based Decoding using Synthetic Multi-Pronunciation
by: Liu, Changsong, et al.
Published: (2025)
Enhancing Multilingual ASR for Unseen Languages via Language Embedding Modeling
by: Huang, Shao-Syuan, et al.
Published: (2024)
Model-free Speculative Decoding for Transformer-based ASR with Token Map Drafting
by: Ho, Tuan Vu, et al.
Published: (2025)
Initial Decoding with Minimally Augmented Language Model for Improved Lattice Rescoring in Low Resource ASR
by: Murthy, Savitha, et al.
Published: (2024)
Towards Hierarchical Spoken Language Dysfluency Modeling
by: Lian, Jiachen, et al.
Published: (2024)