:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Liming; Ni, Junrui; Chang, Kai-Wei; Bhati, Saurabhchand; Harwath, David; Hasegawa-Johnson, Mark; Glass, James R.
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.03639

Similar Items

Towards Audio Token Compression in Large Audio Language Models
by: Bhati, Saurabhchand, et al.
Published: (2025)

SyllableLM: Learning Coarse Semantic Units for Speech Language Models
by: Baade, Alan, et al.
Published: (2024)

Towards Unsupervised Speech Recognition Without Pronunciation Models
by: Ni, Junrui, et al.
Published: (2024)

Recognizing Dementia from Neuropsychological Tests with State Space Models
by: Wang, Liming, et al.
Published: (2025)

USAD: Universal Speech and Audio Representation via Distillation
by: Chang, Heng-Jui, et al.
Published: (2025)

State-Space Large Audio Language Models
by: Bhati, Saurabhchand, et al.
Published: (2024)

findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding
by: Martínez, Héctor Javier Vázquez
Published: (2026)

TICL+: A Case Study On Speech In-Context Learning for Children's Speech Recognition
by: Zheng, Haolong, et al.
Published: (2025)

Self-supervised Speech Models for Word-Level Stuttered Speech Detection
by: Shih, Yi-Jen, et al.
Published: (2024)

Unifying Model and Layer Fusion for Speech Foundation Models
by: Shih, Yi-Jen, et al.
Published: (2025)

TICL: Text-Embedding KNN For Speech In-Context Learning Unlocks Speech Recognition Abilities of Large Multimodal Models
by: Zheng, Haolong, et al.
Published: (2025)

ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining
by: Diwan, Anuj, et al.
Published: (2026)

MetaSICL: Adapting Auditory LLM via Meta Speech In-Context Learning
by: Zheng, Haolong, et al.
Published: (2026)

Scaling Rich Style-Prompted Text-to-Speech Datasets
by: Diwan, Anuj, et al.
Published: (2025)

UITron-Speech: Towards Automated GUI Agents Based on Speech Instructions
by: Han, Wenkang, et al.
Published: (2025)

Song Form-aware Full-Song Text-to-Lyrics Generation with Multi-Level Granularity Syllable Count Control
by: Chae, Yunkee, et al.
Published: (2024)

Beyond Single-Shot: Multi-step Tool Retrieval via Query Planning
by: Fang, Wei, et al.
Published: (2026)

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
by: Peng, Puyuan, et al.
Published: (2024)

Towards Robust Speech Recognition for Jamaican Patois Music Transcription
by: Madden, Jordan, et al.
Published: (2025)

TiCo: Time-Controllable Spoken Dialogue Model
by: Chang, Kai-Wei, et al.
Published: (2026)

SpeechJudge: Towards Human-Level Judgment for Speech Naturalness
by: Zhang, Xueyao, et al.
Published: (2025)

Syllable-level lyrics generation from melody exploiting character-level language model
by: Zhang, Zhe, et al.
Published: (2023)

ConfPO: Exploiting Policy Model Confidence for Critical Token Selection in Preference Optimization
by: Yoon, Hee Suk, et al.
Published: (2025)

Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition
by: Wang, Peng, et al.
Published: (2026)

Incorporating Error Level Noise Embedding for Improving LLM-Assisted Robustness in Persian Speech Recognition
by: Rahmani, Zahra, et al.
Published: (2025)

Error-preserving Automatic Speech Recognition of Young English Learners' Language
by: Michot, Janick, et al.
Published: (2024)

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech
by: Bae, Jaesung, et al.
Published: (2026)

ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models
by: Qian, Kaizhi, et al.
Published: (2025)

Killkan: The Automatic Speech Recognition Dataset for Kichwa with Morphosyntactic Information
by: Taguchi, Chihiro, et al.
Published: (2024)

Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks
by: Hsu, Ming-Hao, et al.
Published: (2023)

Breeze Taigi: Benchmarks and Models for Taiwanese Hokkien Speech Recognition and Synthesis
by: Lan, Yu-Siang, et al.
Published: (2026)

PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play
by: Fang, Wei, et al.
Published: (2025)

Automatic Speech Recognition for Documenting Endangered Languages: Case Study of Ikema Miyakoan
by: Taguchi, Chihiro, et al.
Published: (2026)

Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn't
by: Taguchi, Chihiro, et al.
Published: (2024)

Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
by: Khurana, Sameer, et al.
Published: (2023)

Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
by: Jain, Yash, et al.
Published: (2024)

PACR: Progressively Ascending Confidence Reward for LLM Reasoning
by: Yoon, Eunseop, et al.
Published: (2025)

Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers
by: Serai, Prashant, et al.
Published: (2024)

ASKD-Whisper: Adaptive Self-knowledge Distillation for Efficient and Low-Latency Automatic Speech Recognition
by: Lee, Junseok, et al.
Published: (2026)

Codec2Vec: Self-Supervised Speech Representation Learning Using Neural Speech Codecs
by: Tseng, Wei-Cheng, et al.
Published: (2025)