:: Library Catalog

Bibliographic Details
Main authors:	Luo, Yuxin; Zhang, Ruoyi; Liu, Lu-Chuan; Li, Tianyu; Liu, Hangyu
Format:	Preprint
Published:	2025
Subjects:	Sound; Computation and Language
Online access:	https://arxiv.org/abs/2509.15140
Similar items

Deep Supervised Contrastive Learning of Pitch Contours for Robust Pitch Accent Classification in Seoul Korean
by: Joo, Hyunjung, et al.
Published: (2026)

MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling
by: Cheng, Yifan, et al.
Published: (2025)

Fish Audio S2 Technical Report
by: Liao, Shijia, et al.
Published: (2026)

Incremental FastPitch: Chunk-based High Quality Text to Speech
by: Du, Muyang, et al.
Published: (2024)

LightCAM: A Fast and Light Implementation of Context-Aware Masking based D-TDNN for Speaker Verification
by: Cao, Di, et al.
Published: (2024)

Pitch Accent Detection improves Pretrained Automatic Speech Recognition
by: Sasu, David, et al.
Published: (2025)

Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis
by: Wang, Xintong, et al.
Published: (2024)

HeadRouter: Dynamic Head-Weight Routing for Task-Adaptive Audio Token Pruning in Large Audio Language Models
by: He, Peize, et al.
Published: (2026)

Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
by: Park, Chanho, et al.
Published: (2023)

On the Cost and Benefits of Training Context with Utterance or Full Conversation Training: A Comparative Stud
by: Liu, Hyouin, et al.
Published: (2025)

Intra- and Inter-modal Context Interaction Modeling for Conversational Speech Synthesis
by: Jia, Zhenqi, et al.
Published: (2024)

Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling
by: Liu, Rui, et al.
Published: (2024)

Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT
by: Yamauchi, Kazuki, et al.
Published: (2024)

BRACE: A Benchmark for Robust Audio Caption Quality Evaluation
by: Guo, Tianyu, et al.
Published: (2025)

Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context
by: Ao, Junyi, et al.
Published: (2025)

Closing the Modality Reasoning Gap for Speech Large Language Models
by: Wang, Chaoren, et al.
Published: (2026)

Label-Context-Dependent Internal Language Model Estimation for CTC
by: Yang, Zijian, et al.
Published: (2025)

End-to-end Contrastive Language-Speech Pretraining Model For Long-form Spoken Question Answering
by: Hu, Jiliang, et al.
Published: (2025)

Foundation Model-based Evaluation of Neuropsychiatric Disorders: A Lifespan-Inclusive, Multi-Modal, and Multi-Lingual Study
by: Dong, Zhongren, et al.
Published: (2025)

PROFASR-BENCH: A Benchmark for Context-Conditioned ASR in High-Stakes Professional Speech
by: Piskala, Deepak Babu
Published: (2025)

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
by: Xue, Jinlong, et al.
Published: (2024)

Zipper-LoRA: Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition
by: Mei, Yuxiang, et al.
Published: (2026)

Audio-DeepThinker: Progressive Reasoning-Aware Reinforcement Learning for High-Quality Chain-of-Thought Emergence in Audio Language Models
by: He, Xiang, et al.
Published: (2026)

SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation
by: Lu, Haitian, et al.
Published: (2025)

Comparison of sEMG Encoding Accuracy Across Speech Modes Using Articulatory and Phoneme Features
by: Le, Chenqian, et al.
Published: (2026)

FluentEditor2: Text-based Speech Editing by Modeling Multi-Scale Acoustic and Prosody Consistency
by: Liu, Rui, et al.
Published: (2024)

DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding
by: Zhou, Jiaming, et al.
Published: (2026)

Effective Context in Neural Speech Models
by: Meng, Yen, et al.
Published: (2025)

Beyond Transcription: Unified Audio Schema for Perception-Aware AudioLLMs
by: Zhang, Linhao, et al.
Published: (2026)

Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter
by: Andrusenko, Andrei, et al.
Published: (2024)

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
by: Song, Yuhan, et al.
Published: (2025)

DARS: Dysarthria-Aware Rhythm-Style Synthesis for ASR Enhancement
by: Wu, Minghui, et al.
Published: (2026)

VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents
by: Hu, Jiliang, et al.
Published: (2025)

A Unified Spoken Language Model with Injected Emotional-Attribution Thinking for Human-like Interaction
by: Wang, Qing, et al.
Published: (2026)

Towards Expressive Video Dubbing with Multiscale Multimodal Context Interaction
by: Zhao, Yuan, et al.
Published: (2024)

Not in Sync: Unveiling Temporal Bias in Audio Chat Models
by: Yao, Jiayu, et al.
Published: (2025)

Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR
by: Lin, Zhennan, et al.
Published: (2026)

SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval
by: Lin, Yueqian, et al.
Published: (2024)

On the Cross-lingual Transferability of Pre-trained wav2vec2-based Models
by: Grosman, Jonatas, et al.
Published: (2025)

VocalNet-MDM: Accelerating Streaming Speech LLM via Self-Distilled Masked Diffusion Modeling
by: Cheng, Ziyang, et al.
Published: (2026)