Library Catalog

Bibliographic Details
Main Authors:	Udandarao, Vishaal; Lu, Zhiyun; Chang, Xuankai; Wang, Yongqiang; Yao, Violet Z.; Jose, Albin Madapally; Faghri, Fartash; Gardner, Josh; Chiu, Chung-Cheng
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2510.20860

Similar Items

Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
by: Arora, Siddhant, et al.
Published: (2025)

LV-CTC: Non-autoregressive ASR with CTC and latent variable models
by: Fujita, Yuya, et al.
Published: (2024)

Improving Automatic Speech Recognition with Decoder-Centric Regularisation in Encoder-Decoder Models
by: Polok, Alexander, et al.
Published: (2024)

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
by: Chang, Xuankai, et al.
Published: (2024)

Leveraging Self-Supervised Audio-Visual Pretrained Models to Improve Vocoded Speech Intelligibility in Cochlear Implant Simulation
by: Lai, Richard Lee, et al.
Published: (2023)

The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
by: Cornell, Samuele, et al.
Published: (2024)

SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)

Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
by: Wu, Shih-Lun, et al.
Published: (2023)

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
by: Shi, Jiatong, et al.
Published: (2023)

Benchmarking Large Pretrained Multilingual Models on Québec French Speech Recognition
by: Serrand, Coralie, et al.
Published: (2025)

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration
by: Ku, Pin-Jui, et al.
Published: (2024)

Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
by: Wu, Yihan, et al.
Published: (2024)

Code-switching Speech Recognition Under the Lens: Model- and Data-Centric Perspectives
by: Liu, Hexin, et al.
Published: (2025)

DeCRED: Decoder-Centric Regularization for Encoder-Decoder Based Speech Recognition
by: Polok, Alexander, et al.
Published: (2025)

SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
by: Lu, Yichen, et al.
Published: (2024)

Emotion-Coherent Speech Data Augmentation and Self-Supervised Contrastive Style Training for Enhancing Kids's Story Speech Synthesis
by: Chung, Raymond
Published: (2026)

Improving Pretrained YAMNet for Enhanced Speech Command Detection via Transfer Learning
by: Lachenani, Sidahmed, et al.
Published: (2025)

CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
by: Kim, Jaehyeon, et al.
Published: (2024)

Active Learning of Non-semantic Speech Tasks with Pretrained Models
by: Lee, Harlin, et al.
Published: (2022)

Improving Query-by-Vocal Imitation with Contrastive Learning and Audio Pretraining
by: Greif, Jonathan, et al.
Published: (2024)

Lessons Learnt: Revisit Key Training Strategies for Effective Speech Emotion Recognition in the Wild
by: Tzeng, Jing-Tong, et al.
Published: (2025)

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
by: Peng, Yifan, et al.
Published: (2024)

Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition
by: Yao, Wenhan, et al.
Published: (2024)

Egonoise Resilient Source Localization and Speech Enhancement for Drones Using a Hybrid Model and Learning-Based Approach
by: Wu, Yihsuan, et al.
Published: (2025)

Speech Quality Embeddings for Improved Detection and Classification of Degradations in Speech Signals
by: Kuhlmann, Michael, et al.
Published: (2026)

Towards Robust Speech Representation Learning for Thousands of Languages
by: Chen, William, et al.
Published: (2024)

Preserving Speaker Information in Direct Speech-to-Speech Translation with Non-Autoregressive Generation and Pretraining
by: Zhou, Rui, et al.
Published: (2024)

Reducing the Gap Between Pretrained Speech Enhancement and Recognition Models Using a Real Speech-Trained Bridging Module
by: Cui, Zhongjian, et al.
Published: (2025)

Speech Denoising with Auditory Models
by: Saddler, Mark R., et al.
Published: (2020)

Pretraining Large Brain Language Model for Active BCI: Silent Speech
by: Zhou, Jinzhao, et al.
Published: (2025)

Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
by: Maiti, Soumi, et al.
Published: (2023)

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets
by: Shi, Jiatong, et al.
Published: (2024)

Compact Speech Translation Models via Discrete Speech Units Pretraining
by: Lam, Tsz Kin, et al.
Published: (2024)

Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders
by: Lau, Hok-Shing, et al.
Published: (2024)

LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling
by: Kwak, Doyeop, et al.
Published: (2025)

Improving Controllability and Editability for Pretrained Text-to-Music Generation Models
by: Zhang, Yixiao
Published: (2024)

Recent Trends in Distant Conversational Speech Recognition: A Review of CHiME-7 and 8 DASR Challenges
by: Cornell, Samuele, et al.
Published: (2025)

Rethinking Continual Learning for Speech and Audio: A Representation-Centric Taxonomy and Open Problems
by: Xiao, Yang, et al.
Published: (2026)

Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech
by: Yao, Jixun, et al.
Published: (2025)

A Novel Numerical Method for Relaxing the Minimal Configurations of TOA-Based Joint Sensors and Sources Localization
by: Cao, Faxian, et al.
Published: (2024)