| Main Authors: | Wang, He; Guo, Pengcheng; Chen, Wei; Zhou, Pan; Xie, Lei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2401.06788 |
Similar Items
The TEA-ASLP System for Multilingual Conversational Speech Recognition and Speech Diarization in MLC-SLM 2025 Challenge
by: Xue, Hongfei, et al.
Published: (2025)
MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
by: Wang, He, et al.
Published: (2024)
CNVSRC 2024: The Second Chinese Continuous Visual Speech Recognition Challenge
by: Liu, Zehua, et al.
Published: (2025)
Mixture of LoRA Experts with Multi-Modal and Multi-Granularity LLM Generative Error Correction for Accented Speech Recognition
by: Mu, Bingshen, et al.
Published: (2025)
Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition
by: Xie, Xurong, et al.
Published: (2022)
The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge
by: Guo, Dake, et al.
Published: (2024)
AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition
by: Dai, Yuhang, et al.
Published: (2025)
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge
by: Wang, He, et al.
Published: (2024)
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
by: Chen, Peikun, et al.
Published: (2024)
Seeing the Context: Rich Visual Context-Aware Speech Recognition via Multimodal Reasoning
by: Tian, Wenjie, et al.
Published: (2026)
CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition
by: Wang, He, et al.
Published: (2024)
An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge
by: Han, Runduo, et al.
Published: (2024)
MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
by: Mu, Bingshen, et al.
Published: (2024)
Uncovering the Visual Contribution in Audio-Visual Speech Recognition
by: Lin, Zhaofeng, et al.
Published: (2024)
A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data
by: Tran, Minh, et al.
Published: (2025)
EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition
by: Li, Pengcheng, et al.
Published: (2025)
StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding
by: Guo, Dake, et al.
Published: (2025)
The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge
by: Tian, Jingguang, et al.
Published: (2024)
Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder
by: Guo, Haohan, et al.
Published: (2024)
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)
Selective Invocation for Multilingual ASR: A Cost-effective Approach Adapting to Speech Recognition Difficulty
by: Xue, Hongfei, et al.
Published: (2025)
ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
by: Wang, He, et al.
Published: (2025)
dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition
by: Tian, Wenjie, et al.
Published: (2026)
Human-Inspired Computing for Robust and Efficient Audio-Visual Speech Recognition
by: Liu, Qianhui, et al.
Published: (2024)
A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition
by: Li, Yangze, et al.
Published: (2024)
Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
by: Xue, Hongfei, et al.
Published: (2024)
From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition
by: Huang, Mengcheng, et al.
Published: (2026)
Summary on The Multilingual Conversational Speech Language Model Challenge: Datasets, Tasks, Baselines, and Methods
by: Mu, Bingshen, et al.
Published: (2025)
Efficient Speech Watermarking for Speech Synthesis via Progressive Knowledge Distillation
by: Cui, Yang, et al.
Published: (2025)
Advanced Long-Content Speech Recognition With Factorized Neural Transducer
by: Gong, Xun, et al.
Published: (2024)
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
by: Liu, Huadai, et al.
Published: (2023)
Contextualized Automatic Speech Recognition with Dynamic Vocabulary Prediction and Activation
by: Lin, Zhennan, et al.
Published: (2025)
Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech
by: Yao, Jixun, et al.
Published: (2025)
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
by: Bai, Ye, et al.
Published: (2024)
Noisy Disentanglement with Tri-stage Training for Noise-Robust Speech Recognition
by: Chen, Shuangyuan, et al.
Published: (2025)
Two-pass Endpoint Detection for Speech Recognition
by: Raju, Anirudh, et al.
Published: (2024)
Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition
by: Wang, Huimeng, et al.
Published: (2025)
Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis
by: Tian, Wenjie, et al.
Published: (2025)
FireRedTTS-1S: An Upgraded Streamable Foundation Text-to-Speech System
by: Guo, Hao-Han, et al.
Published: (2025)
Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech
by: Lin, Guan-Ting, et al.
Published: (2024)