:: Library Catalog

Bibliographic Details
Main Authors:	Oh, Ro-hoon; Seol, Jihwan; Kim, Bugeun
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2603.14803

Similar Items

VoiceBBQ: Investigating Effect of Content and Acoustics in Social Bias of Spoken Language Model
by: Choi, Junhyuk, et al.
Published: (2025)

Raon-Speech Technical Report
by: Kim, Beomsoo, et al.
Published: (2026)

On the Relationship between Accent Strength and Articulatory Features
by: Huang, Kevin, et al.
Published: (2025)

NeuroVoz: a Castillian Spanish corpus of parkinsonian speech
by: Mendes-Laureano, Janaína, et al.
Published: (2024)

Forensic deepfake audio detection using segmental speech features
by: Yang, Tianle, et al.
Published: (2025)

Improving endpoint detection in end-to-end streaming ASR for conversational speech
by: C, Anandh, et al.
Published: (2025)

A unified front-end framework for English text-to-speech synthesis
by: Ying, Zelin, et al.
Published: (2023)

InstructAudio: Unified speech and music generation with natural language instruction
by: Qiang, Chunyu, et al.
Published: (2025)

Data-efficient Targeted Token-level Preference Optimization for LLM-based Text-to-Speech
by: Kotoge, Rikuto, et al.
Published: (2025)

Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks
by: Beguš, Gašper, et al.
Published: (2023)

Acoustic and perceptual differences between standard and accented speech and their voice clones
by: Yang, Tianle, et al.
Published: (2026)

XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition
by: Wan, Xucheng, et al.
Published: (2024)

Low-resource speech recognition and dialect identification of Irish in a multi-task framework
by: Lonergan, Liam, et al.
Published: (2024)

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
by: Wang, Xiong, et al.
Published: (2024)

H-PRM: A Pluggable Hotword Pre-Retrieval Module for Various Speech Recognition Systems
by: Dai, Huangyu, et al.
Published: (2025)

Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition
by: Kim, Minsu, et al.
Published: (2023)

What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training
by: Kloots, Marianne de Heer, et al.
Published: (2025)

ASKD-Whisper: Adaptive Self-knowledge Distillation for Efficient and Low-Latency Automatic Speech Recognition
by: Lee, Junseok, et al.
Published: (2026)

SAGE-LD: Towards Scalable and Generalizable End-to-End Language Diarization via Simulated Data Augmentation
by: Lee, Sangmin, et al.
Published: (2025)

Still Between Us? Evaluating and Improving Voice Assistant Robustness to Third-Party Interruptions
by: Lee, Dongwook, et al.
Published: (2026)

Moshi: a speech-text foundation model for real-time dialogue
by: Défossez, Alexandre, et al.
Published: (2024)

Developing multilingual speech synthesis system for Ojibwe, Mi'kmaq, and Maliseet
by: Wang, Shenran, et al.
Published: (2025)

DART: An AIGT Detector using AMR of Rephrased Text
by: Park, Hyeonchu, et al.
Published: (2024)

ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks
by: Inoue, Nakamasa, et al.
Published: (2024)

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs
by: Papi, Sara, et al.
Published: (2025)

Fine-Grained and Thematic Evaluation of LLMs in Social Deduction Game
by: Kim, Byungjun, et al.
Published: (2024)

Leveraging Large Language Models for Active Merchant Non-player Characters
by: Kim, Byungjun, et al.
Published: (2024)

Abusive music and song transformation using GenAI and LLMs
by: Choi, Jiyang, et al.
Published: (2026)

WESR: Scaling and Evaluating Word-level Event-Speech Recognition
by: Yang, Chenchen, et al.
Published: (2026)

A novel LSTM music generator based on the fractional time-frequency feature extraction
by: Ya, Li, et al.
Published: (2026)

Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition
by: Wang, Peng, et al.
Published: (2026)

Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition
by: Ginjala, Srishti, et al.
Published: (2026)

MOSS-VoiceGenerator: Create Realistic Voices with Natural Language Descriptions
by: Huang, Kexin, et al.
Published: (2026)

MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio
by: Rajgarhia, Harshit, et al.
Published: (2026)

Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
by: Feng, Bo-Han, et al.
Published: (2026)

Prosody-Guided Harmonic Attention for Phase-Coherent Neural Vocoding in the Complex Spectrum
by: Al-Radhi, Mohammed Salah, et al.
Published: (2026)

Language Family Matters: Evaluating LLM-Based ASR Across Linguistic Boundaries
by: Zhang, Yuchen, et al.
Published: (2026)

Putting HUMANS first: Efficient LAM Evaluation with Human Preference Alignment
by: Gan, Woody Haosheng, et al.
Published: (2026)

Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models
by: Riera, Pablo, et al.
Published: (2026)

Contextual Earnings-22: A Speech Recognition Benchmark with Custom Vocabulary in the Wild
by: Durmus, Berkin, et al.
Published: (2026)