| Main Authors: | Le, Chenyang, Xia, Yinfeng, Li, Huiyan, Wang, Manhong, Sun, Yutao, Ma, Xingyang, Qian, Yanmin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.11189 |
Similar Items
MFLA: Monotonic Finite Look-ahead Attention for Streaming Speech Recognition
by: Xia, Yinfeng, et al.
Published: (2025)
SimulMEGA: MoE Routers are Advanced Policy Makers for Simultaneous Speech Translation
by: Le, Chenyang, et al.
Published: (2025)
SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
by: Huang, Wen, et al.
Published: (2025)
Scale This, Not That: Investigating Key Dataset Attributes for Efficient Speech Enhancement Scaling
by: Zhang, Leying, et al.
Published: (2024)
DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition
by: Shao, Hang, et al.
Published: (2023)
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
by: Hirschkind, Nameer, et al.
Published: (2024)
Advanced Zero-Shot Text-to-Speech for Background Removal and Preservation with Controllable Masked Speech Prediction
by: Zhang, Leying, et al.
Published: (2025)
S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
by: Le, Chenyang, et al.
Published: (2024)
Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement
by: Li, Chenda, et al.
Published: (2024)
Improving Design of Input Condition Invariant Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)
From Sharpness to Better Generalization for Speech Deepfake Detection
by: Huang, Wen, et al.
Published: (2025)
BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM
by: Gong, Xun, et al.
Published: (2025)
Magnitude-Phase Dual-Path Speech Enhancement Network based on Self-Supervised Embedding and Perceptual Contrast Stretch Boosting
by: Mattursun, Alimjan, et al.
Published: (2025)
Semantic-Emotional Resonance Embedding: A Semi-Supervised Paradigm for Cross-Lingual Speech Emotion Recognition
by: Zhao, Ya, et al.
Published: (2026)
Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems
by: Chen, Zhengyang, et al.
Published: (2024)
PCQ: Emotion Recognition in Speech via Progressive Channel Querying
by: Wang, Xincheng, et al.
Published: (2024)
ECTSpeech: Enhancing Efficient Speech Synthesis via Easy Consistency Tuning
by: Zhu, Tao, et al.
Published: (2025)
Less is More: Data Curation Matters in Scaling Speech Enhancement
by: Li, Chenda, et al.
Published: (2025)
Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment
by: Wang, Wei, et al.
Published: (2025)
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)
Efficient Multilingual ASR Finetuning via LoRA Language Experts
by: Li, Jiahong, et al.
Published: (2025)
Advanced Long-Content Speech Recognition With Factorized Neural Transducer
by: Gong, Xun, et al.
Published: (2024)
Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
by: Liu, Bei, et al.
Published: (2024)
Toward Universal Speech Enhancement for Diverse Input Conditions
by: Zhang, Wangyou, et al.
Published: (2023)
Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization
by: Liu, Bei, et al.
Published: (2024)
Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
by: Xue, Hongfei, et al.
Published: (2024)
DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice
by: Zhang, Leying, et al.
Published: (2026)
LLM-ForcedAligner: A Non-Autoregressive and Accurate LLM-Based Forced Aligner for Multilingual and Long-Form Speech
by: Mu, Bingshen, et al.
Published: (2026)
DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module
by: Wang, Xinyu, et al.
Published: (2024)
ICASSP 2026 URGENT Speech Enhancement Challenge
by: Li, Chenda, et al.
Published: (2026)
MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech
by: Chen, Huakang, et al.
Published: (2026)
URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement
by: Zhang, Wangyou, et al.
Published: (2024)
MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention
by: Jiao, Xinxin, et al.
Published: (2024)
Joint Training And Decoding for Multilingual End-to-End Simultaneous Speech Translation
by: Huang, Wuwei, et al.
Published: (2025)
Layer-wise Analysis for Quality of Multilingual Synthesized Speech
by: Cooper, Erica, et al.
Published: (2025)
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
by: Liao, Shijia, et al.
Published: (2024)
AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis
by: Cao, Yubing, et al.
Published: (2025)
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
by: Fu, Yonggan, et al.
Published: (2022)
Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)