| Main Authors: | Lu, Ke-Han, Chen, Zhehuai, Fu, Szu-Wei, Yang, Chao-Han Huck, Balam, Jagadeesh, Ginsburg, Boris, Wang, Yu-Chiang Frank, Lee, Hung-yi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.20007 |
Similar Items
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
by: Lu, Ke-Han, et al.
Published: (2024)
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
by: Lu, Ke-Han, et al.
Published: (2025)
SALM-Duplex: Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model
by: Hu, Ke, et al.
Published: (2025)
Word Level Timestamp Generation for Automatic Speech Recognition and Translation
by: Hu, Ke, et al.
Published: (2025)
Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits
by: Huang, Sung-Feng, et al.
Published: (2025)
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models
by: Noroozi, Vahid, et al.
Published: (2024)
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
by: Wang, Weiqing, et al.
Published: (2024)
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
by: Xu, Hainan, et al.
Published: (2024)
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
by: Huang, He, et al.
Published: (2024)
Flexible Multichannel Speech Enhancement for Noise-Robust Frontend
by: Jukić, Ante, et al.
Published: (2024)
META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR
by: Wang, Jinhan, et al.
Published: (2024)
EMMeTT: Efficient Multimodal Machine Translation Training
by: Żelasko, Piotr, et al.
Published: (2024)
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
by: Burchi, Maxime, et al.
Published: (2024)
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data
by: Puvvada, Krishna C., et al.
Published: (2024)
An Investigation of Incorporating Mamba for Speech Enhancement
by: Chao, Rong, et al.
Published: (2024)
Universal Speech Enhancement with Regression and Generative Mamba
by: Chao, Rong, et al.
Published: (2025)
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
by: Huang, Chien-yu, et al.
Published: (2023)
Schrödinger Bridge for Generative Speech Enhancement
by: Jukić, Ante, et al.
Published: (2024)
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
by: Chen, Zhehuai, et al.
Published: (2024)
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
by: Chen, Chen, et al.
Published: (2025)
Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception
by: Wan, Zhen, et al.
Published: (2026)
Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech
by: Fu, Szu-Wei, et al.
Published: (2024)
Streaming Sortformer: Speaker Cache-Based Online Speaker Diarization with Arrival-Time Ordering
by: Medennikov, Ivan, et al.
Published: (2025)
Speechless: Speech Instruction Training Without Speech for Low Resource Languages
by: Dao, Alan, et al.
Published: (2025)
Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models
by: Lu, Ke-Han, et al.
Published: (2025)
Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)
How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation
by: Lu, Ke-Han, et al.
Published: (2026)
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
by: Yang, Chih-Kai, et al.
Published: (2025)
Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems
by: Park, Taejin, et al.
Published: (2024)
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
by: Peng, Yifan, et al.
Published: (2024)
S2S-Arena: Evaluating Paralinguistic Instruction Following in Speech-to-Speech Models
by: Jiang, Feng, et al.
Published: (2025)
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
by: Kuan, Chun-Yi, et al.
Published: (2024)
MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech
by: Chen, Huakang, et al.
Published: (2026)
SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
by: Hsu, Ming-Hao, et al.
Published: (2024)
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
by: Feng, Bo-Han, et al.
Published: (2025)
Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
by: Wang, Weiqing, et al.
Published: (2025)
Parallel Synthesis for Autoregressive Speech Generation
by: Hsu, Po-chun, et al.
Published: (2022)
MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation
by: Chen, Szu-Chi, et al.
Published: (2026)
Leveraging Mamba with Full-Face Vision for Audio-Visual Speech Enhancement
by: Chao, Rong, et al.
Published: (2025)
Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration
by: Ku, Pin-Jui, et al.
Published: (2024)