Saved in:
| Main Authors: | Sun, Chunyu, Liu, Bingyu, Cui, Zhichao, Shi, Junhan, Qi, Anbin, Zhang, Tian-hao, Zhou, Dinghao, Lu, Lewei |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.02603 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval
by: Sun, Haoqin, et al.
Published: (2025)
by: Sun, Haoqin, et al.
Published: (2025)
CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
by: Zhang, Tian-Hao, et al.
Published: (2023)
by: Zhang, Tian-Hao, et al.
Published: (2023)
Retrieval-Augmented Speech Recognition Approach for Domain Challenges
by: Shen, Peng, et al.
Published: (2025)
by: Shen, Peng, et al.
Published: (2025)
Retrieval Augmented Correction of Named Entity Speech Recognition Errors
by: Pusateri, Ernest, et al.
Published: (2024)
by: Pusateri, Ernest, et al.
Published: (2024)
Enhancing Target Speaker Extraction with Explicit Speaker Consistency Modeling
by: Wu, Shu, et al.
Published: (2025)
by: Wu, Shu, et al.
Published: (2025)
RepCodec: A Speech Representation Codec for Speech Tokenization
by: Huang, Zhichao, et al.
Published: (2023)
by: Huang, Zhichao, et al.
Published: (2023)
Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining
by: Xue, Jinlong, et al.
Published: (2024)
by: Xue, Jinlong, et al.
Published: (2024)
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
by: Fu, Ruibo, et al.
Published: (2024)
by: Fu, Ruibo, et al.
Published: (2024)
Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora
by: Nespoli, Francesco, et al.
Published: (2024)
by: Nespoli, Francesco, et al.
Published: (2024)
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
by: Liao, Shijia, et al.
Published: (2024)
by: Liao, Shijia, et al.
Published: (2024)
DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models
by: Wang, Yuanyuan, et al.
Published: (2025)
by: Wang, Yuanyuan, et al.
Published: (2025)
Reducing the Gap Between Pretrained Speech Enhancement and Recognition Models Using a Real Speech-Trained Bridging Module
by: Cui, Zhongjian, et al.
Published: (2025)
by: Cui, Zhongjian, et al.
Published: (2025)
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
by: Wang, Siyin, et al.
Published: (2024)
by: Wang, Siyin, et al.
Published: (2024)
Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis
by: Leung, Wing-Zin, et al.
Published: (2024)
by: Leung, Wing-Zin, et al.
Published: (2024)
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
by: Deng, Yimin, et al.
Published: (2024)
by: Deng, Yimin, et al.
Published: (2024)
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
by: Zhang, Xin, et al.
Published: (2023)
by: Zhang, Xin, et al.
Published: (2023)
Unified Architecture and Unsupervised Speech Disentanglement for Speaker Embedding-Free Enrollment in Personalized Speech Enhancement
by: Huang, Ziling, et al.
Published: (2025)
by: Huang, Ziling, et al.
Published: (2025)
Coarse-to-fine Alignment Makes Better Speech-image Retrieval
by: Zhou, Lifeng, et al.
Published: (2024)
by: Zhou, Lifeng, et al.
Published: (2024)
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model
by: Wang, Siyang, et al.
Published: (2024)
by: Wang, Siyang, et al.
Published: (2024)
EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model
by: Yang, Yiqing, et al.
Published: (2025)
by: Yang, Yiqing, et al.
Published: (2025)
Creating Personalized Synthetic Voices from Articulation Impaired Speech Using Augmented Reconstruction Loss
by: Tian, Yusheng, et al.
Published: (2024)
by: Tian, Yusheng, et al.
Published: (2024)
NAST: Noise Aware Speech Tokenization for Speech Language Models
by: Messica, Shoval, et al.
Published: (2024)
by: Messica, Shoval, et al.
Published: (2024)
Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition
by: Yang, Qingran, et al.
Published: (2026)
by: Yang, Qingran, et al.
Published: (2026)
Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders
by: Sun, Xingwei, et al.
Published: (2025)
by: Sun, Xingwei, et al.
Published: (2025)
Zero-Shot Recognition of Dysarthric Speech Using Commercial Automatic Speech Recognition and Multimodal Large Language Models
by: Alsayegh, Ali, et al.
Published: (2025)
by: Alsayegh, Ali, et al.
Published: (2025)
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
by: Wang, Shih-heng, et al.
Published: (2024)
by: Wang, Shih-heng, et al.
Published: (2024)
Zero-Shot Crate Digging: DJ Tool Retrieval Using Speech Activity, Music Structure And CLAP Embeddings
by: Orife, Iroro
Published: (2024)
by: Orife, Iroro
Published: (2024)
Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders
by: Shan, Weiqiao, et al.
Published: (2025)
by: Shan, Weiqiao, et al.
Published: (2025)
DENSE: Dynamic Embedding Causal Target Speech Extraction
by: Wang, Yiwen, et al.
Published: (2024)
by: Wang, Yiwen, et al.
Published: (2024)
S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)
by: Pan, Yu, et al.
Published: (2025)
SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation
by: Wang, Hui, et al.
Published: (2025)
by: Wang, Hui, et al.
Published: (2025)
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
by: Bai, Ye, et al.
Published: (2024)
by: Bai, Ye, et al.
Published: (2024)
ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts
by: Garg, Ashi, et al.
Published: (2025)
by: Garg, Ashi, et al.
Published: (2025)
Unsupervised Single-Channel Speech Separation with a Diffusion Prior under Speaker-Embedding Guidance
by: Shi, Runwu, et al.
Published: (2025)
by: Shi, Runwu, et al.
Published: (2025)
LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models
by: Zhao, Xiaohan, et al.
Published: (2025)
by: Zhao, Xiaohan, et al.
Published: (2025)
FLASepformer: Efficient Speech Separation with Gated Focused Linear Attention Transformer
by: Wang, Haoxu, et al.
Published: (2025)
by: Wang, Haoxu, et al.
Published: (2025)
Speech-DRAME: A Framework for Human-Aligned Benchmarks in Speech Role-Play
by: Shi, Jiatong, et al.
Published: (2025)
by: Shi, Jiatong, et al.
Published: (2025)
Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network
by: Chen, Yanan, et al.
Published: (2024)
by: Chen, Yanan, et al.
Published: (2024)
An Explainable Probabilistic Attribute Embedding Approach for Spoofed Speech Characterization
by: Chhibber, Manasi, et al.
Published: (2024)
by: Chhibber, Manasi, et al.
Published: (2024)
ESPnet-SpeechLM: An Open Speech Language Model Toolkit
by: Tian, Jinchuan, et al.
Published: (2025)
by: Tian, Jinchuan, et al.
Published: (2025)
Similar Items
-
RA-CLAP: Relation-Augmented Emotional Speaking Style Contrastive Language-Audio Pretraining For Speech Retrieval
by: Sun, Haoqin, et al.
Published: (2025) -
CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
by: Zhang, Tian-Hao, et al.
Published: (2023) -
Retrieval-Augmented Speech Recognition Approach for Domain Challenges
by: Shen, Peng, et al.
Published: (2025) -
Retrieval Augmented Correction of Named Entity Speech Recognition Errors
by: Pusateri, Ernest, et al.
Published: (2024) -
Enhancing Target Speaker Extraction with Explicit Speaker Consistency Modeling
by: Wu, Shu, et al.
Published: (2025)