| Main Authors: | Oh, Ro-hoon, Seol, Jihwan, Kim, Bugeun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.14803 |
Similar Items
VoiceBBQ: Investigating Effect of Content and Acoustics in Social Bias of Spoken Language Model
by: Choi, Junhyuk, et al.
Published: (2025)
Raon-Speech Technical Report
by: Kim, Beomsoo, et al.
Published: (2026)
On the Relationship between Accent Strength and Articulatory Features
by: Huang, Kevin, et al.
Published: (2025)
NeuroVoz: a Castillian Spanish corpus of parkinsonian speech
by: Mendes-Laureano, Janaína, et al.
Published: (2024)
Forensic deepfake audio detection using segmental speech features
by: Yang, Tianle, et al.
Published: (2025)
Improving endpoint detection in end-to-end streaming ASR for conversational speech
by: C, Anandh, et al.
Published: (2025)
A unified front-end framework for English text-to-speech synthesis
by: Ying, Zelin, et al.
Published: (2023)
InstructAudio: Unified speech and music generation with natural language instruction
by: Qiang, Chunyu, et al.
Published: (2025)
Data-efficient Targeted Token-level Preference Optimization for LLM-based Text-to-Speech
by: Kotoge, Rikuto, et al.
Published: (2025)
Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks
by: Beguš, Gašper, et al.
Published: (2023)
Acoustic and perceptual differences between standard and accented speech and their voice clones
by: Yang, Tianle, et al.
Published: (2026)
XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition
by: Wan, Xucheng, et al.
Published: (2024)
Low-resource speech recognition and dialect identification of Irish in a multi-task framework
by: Lonergan, Liam, et al.
Published: (2024)
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
by: Wang, Xiong, et al.
Published: (2024)
H-PRM: A Pluggable Hotword Pre-Retrieval Module for Various Speech Recognition Systems
by: Dai, Huangyu, et al.
Published: (2025)
Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition
by: Kim, Minsu, et al.
Published: (2023)
What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training
by: Kloots, Marianne de Heer, et al.
Published: (2025)
ASKD-Whisper: Adaptive Self-knowledge Distillation for Efficient and Low-Latency Automatic Speech Recognition
by: Lee, Junseok, et al.
Published: (2026)
SAGE-LD: Towards Scalable and Generalizable End-to-End Language Diarization via Simulated Data Augmentation
by: Lee, Sangmin, et al.
Published: (2025)
Still Between Us? Evaluating and Improving Voice Assistant Robustness to Third-Party Interruptions
by: Lee, Dongwook, et al.
Published: (2026)
Moshi: a speech-text foundation model for real-time dialogue
by: Défossez, Alexandre, et al.
Published: (2024)
Developing multilingual speech synthesis system for Ojibwe, Mi'kmaq, and Maliseet
by: Wang, Shenran, et al.
Published: (2025)
DART: An AIGT Detector using AMR of Rephrased Text
by: Park, Hyeonchu, et al.
Published: (2024)
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks
by: Inoue, Nakamasa, et al.
Published: (2024)
Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs
by: Papi, Sara, et al.
Published: (2025)
Fine-Grained and Thematic Evaluation of LLMs in Social Deduction Game
by: Kim, Byungjun, et al.
Published: (2024)
Leveraging Large Language Models for Active Merchant Non-player Characters
by: Kim, Byungjun, et al.
Published: (2024)
Abusive music and song transformation using GenAI and LLMs
by: Choi, Jiyang, et al.
Published: (2026)
WESR: Scaling and Evaluating Word-level Event-Speech Recognition
by: Yang, Chenchen, et al.
Published: (2026)
A novel LSTM music generator based on the fractional time-frequency feature extraction
by: Ya, Li, et al.
Published: (2026)
Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition
by: Wang, Peng, et al.
Published: (2026)
Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition
by: Ginjala, Srishti, et al.
Published: (2026)
MOSS-VoiceGenerator: Create Realistic Voices with Natural Language Descriptions
by: Huang, Kexin, et al.
Published: (2026)
MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio
by: Rajgarhia, Harshit, et al.
Published: (2026)
Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
by: Feng, Bo-Han, et al.
Published: (2026)
Prosody-Guided Harmonic Attention for Phase-Coherent Neural Vocoding in the Complex Spectrum
by: Al-Radhi, Mohammed Salah, et al.
Published: (2026)
Language Family Matters: Evaluating LLM-Based ASR Across Linguistic Boundaries
by: Zhang, Yuchen, et al.
Published: (2026)
Putting HUMANS first: Efficient LAM Evaluation with Human Preference Alignment
by: Gan, Woody Haosheng, et al.
Published: (2026)
Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models
by: Riera, Pablo, et al.
Published: (2026)
Contextual Earnings-22: A Speech Recognition Benchmark with Custom Vocabulary in the Wild
by: Durmus, Berkin, et al.
Published: (2026)