| Main Authors: | Li, Haoyang, Liu, Changsong, Rao, Wei, Shi, Hao, Sakti, Sakriani, Chng, Eng Siong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.20967 |
Similar Items
Learning Marmoset Vocal Patterns with a Masked Autoencoder for Robust Call Segmentation, Classification, and Caller Identification
by: Wu, Bin, et al.
Published: (2024)
SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition
by: Hirano, Yuta, et al.
Published: (2025)
Speech Enhancement Using Continuous Embeddings of Neural Audio Codec
by: Li, Haoyang, et al.
Published: (2025)
Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems
by: Kwok, Chin Yuen, et al.
Published: (2024)
Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
by: Yuhang, Yang, et al.
Published: (2024)
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
by: Hu, Yuchen, et al.
Published: (2023)
Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech
by: Li, Yuxin, et al.
Published: (2025)
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
by: Luong, Hieu-Thi, et al.
Published: (2024)
EASY: Emotion-aware Speaker Anonymization via Factorized Distillation
by: Yao, Jixun, et al.
Published: (2025)
Analysis of Speaker Verification Performance Trade-offs with Neural Audio Codec Transmission
by: Thakur, Nirmalya Mallick, et al.
Published: (2025)
Indonesian-English Code-Switching Speech Synthesizer Utilizing Multilingual STEN-TTS and Bert LID
by: Handoyo, Ahmad Alfani, et al.
Published: (2024)
MULTI-Bench: A Multi-Turn Interactive Benchmark for Assessing Emotional Intelligence ability of Spoken Dialogue Models
by: Deng, Yayue, et al.
Published: (2025)
On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition
by: Rossenbach, Nick, et al.
Published: (2024)
UniArray: Unified Spectral-Spatial Modeling for Array-Geometry-Agnostic Speech Separation
by: Chen, Weiguang, et al.
Published: (2025)
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge
by: Wang, He, et al.
Published: (2024)
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
by: Yao, Jixun, et al.
Published: (2025)
Room Impulse Responses help attackers to evade Deep Fake Detection
by: Luong, Hieu-Thi, et al.
Published: (2024)
Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection
by: Truong, Duc-Tuan, et al.
Published: (2024)
Noise-Aware Speech Separation with Contrastive Learning
by: Zhang, Zizheng, et al.
Published: (2023)
Robust Localization of Partially Fake Speech: Metrics and Out-of-Domain Evaluation
by: Luong, Hieu-Thi, et al.
Published: (2025)
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
by: Ng, Dianwen, et al.
Published: (2025)
Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification
by: Truong, Duc-Tuan, et al.
Published: (2023)
A correlation-permutation approach for speech-music encoders model merging
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)
Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models with Diverse Speech Variabilities
by: Adila, Aulia, et al.
Published: (2024)
Zero-shot Context Biasing with Trie-based Decoding using Synthetic Multi-Pronunciation
by: Liu, Changsong, et al.
Published: (2025)
Noise-aware Speech Enhancement using Diffusion Probabilistic Model
by: Hu, Yuchen, et al.
Published: (2023)
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
by: Hu, Yuchen, et al.
Published: (2024)
Towards Audio Codec-based Speech Separation
by: Yip, Jia Qi, et al.
Published: (2024)
Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning
by: He, Haorui, et al.
Published: (2024)
Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model
by: Ma, Ziyang, et al.
Published: (2025)
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
by: Chen, Chen, et al.
Published: (2024)
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
by: Hu, Yuchen, et al.
Published: (2024)
Distilling a speech and music encoder with task arithmetic
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)
Overlap-Adaptive Hybrid Speaker Diarization and ASR-Aware Observation Addition for MISP 2025 Challenge
by: Huang, Shangkun, et al.
Published: (2025)
Dataset-Distillation Generative Model for Speech Emotion Recognition
by: Ritter-Gutierrez, Fabian, et al.
Published: (2024)
Summary on The Multilingual Conversational Speech Language Model Challenge: Datasets, Tasks, Baselines, and Methods
by: Mu, Bingshen, et al.
Published: (2025)
SPGM: Prioritizing Local Features for enhanced speech separation performance
by: Yip, Jia Qi, et al.
Published: (2023)
Bi-directional Context-Enhanced Speech Large Language Models for Multilingual Conversational ASR
by: Peng, Yizhou, et al.
Published: (2025)
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators
by: Hu, Yuchen, et al.
Published: (2024)
VIBEVOICE-ASR Technical Report
by: Peng, Zhiliang, et al.
Published: (2026)