:: Library Catalog

Bibliographic Details
Main Authors:	Phuong, Tuan Dat; Truong, Duc-Tuan; Hoang, Long-Vu; Thu, Trang Nguyen Thi
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2602.04702

Similar Items

Pushing the Performance of Synthetic Speech Detection with Kolmogorov-Arnold Networks and Self-Supervised Learning Models
by: Phuong, Tuan Dat, et al.
Published: (2025)

Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection
by: Truong, Duc-Tuan, et al.
Published: (2024)

VoxVietnam: a Large-Scale Multi-Genre Dataset for Vietnamese Speaker Recognition
by: Vu, Hoang Long, et al.
Published: (2024)

Qwen vs. Gemma Integration with Whisper: A Comparative Study in Multilingual SpeechLLM Systems
by: Nguyen, Tuan, et al.
Published: (2025)

XLSR-Kanformer: A KAN-Intergrated model for Synthetic Speech Detection
by: Dat, Phuong Tuan, et al.
Published: (2025)

Acoustic scattering AI for non-invasive object classifications: A case study on hair assessment
by: Hoang, Long-Vu, et al.
Published: (2025)

QAMO: Quality-aware Multi-centroid One-class Learning For Speech Deepfake Detection
by: Truong, Duc-Tuan, et al.
Published: (2025)

Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection
by: Truong, Duc-Tuan, et al.
Published: (2025)

Continuous Learning of Transformer-based Audio Deepfake Detection
by: Le, Tuan Duy Nguyen, et al.
Published: (2024)

AsyncSwitch: Asynchronous Text-Speech Adaptation for Code-Switched ASR
by: Nguyen, Tuan, et al.
Published: (2025)

AdaCS: Adaptive Normalization for Enhanced Code-Switching ASR
by: Chu, The Chuong, et al.
Published: (2025)

MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model
by: Pham, The Hieu, et al.
Published: (2025)

Zero-Shot Text-to-Speech for Vietnamese
by: Vu, Thi, et al.
Published: (2025)

Room Impulse Responses help attackers to evade Deep Fake Detection
by: Luong, Hieu-Thi, et al.
Published: (2024)

A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
by: Pham, Lam, et al.
Published: (2024)

Environmental Sound Deepfake Detection Using Deep-Learning Framework
by: Pham, Lam, et al.
Published: (2026)

Mispronunciation Detection and Diagnosis Without Model Training: A Retrieval-Based Approach
by: Tu, Huu Tuong, et al.
Published: (2025)

ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription
by: Le, Khanh, et al.
Published: (2025)

Can we train ASR systems on Code-switch without real code-switch data? Case study for Singapore's languages
by: Nguyen, Tuan, et al.
Published: (2025)

SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
by: Le, Khanh, et al.
Published: (2025)

A General Model for Deepfake Speech Detection: Diverse Bonafide Resources or Diverse AI-Based Generators
by: Pham, Lam, et al.
Published: (2026)

Nes2Net: A Lightweight Nested Architecture for Foundation Model Driven Speech Anti-spoofing
by: Liu, Tianchi, et al.
Published: (2025)

MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation
by: Le-Duc, Khai, et al.
Published: (2025)

Toward Fine-Grained Speech Inpainting Forensics: A Dataset, Method, and Metric for Multi-Region Tampering Localization
by: Vu, Tung, et al.
Published: (2026)

Hierarchical Decoding for Discrete Speech Synthesis with Multi-Resolution Spoof Detection
by: Zhao, Junchuan, et al.
Published: (2026)

Attention-based Mixture of Experts for Robust Speech Deepfake Detection
by: Negroni, Viola, et al.
Published: (2025)

O_O-VC: Synthetic Data-Driven One-to-One Alignment for Any-to-Any Voice Conversion
by: Tu, Huu Tuong, et al.
Published: (2025)

Stream-based Active Learning for Anomalous Sound Detection in Machine Condition Monitoring
by: Ho, Tuan Vu, et al.
Published: (2024)

Assessing the Impact of Speaker Identity in Speech Spoofing Detection
by: Dao, Anh-Tuan, et al.
Published: (2026)

Speechless: Speech Instruction Training Without Speech for Low Resource Languages
by: Dao, Alan, et al.
Published: (2025)

Deepfake Audio Detection Using Spectrogram-based Feature and Ensemble of Deep Learning Models
by: Pham, Lam, et al.
Published: (2024)

Frame-level Temporal Difference Learning for Partial Deepfake Speech Detection
by: Li, Menglu, et al.
Published: (2025)

Multi-Task Transformer for Explainable Speech Deepfake Detection via Formant Modeling
by: Negroni, Viola, et al.
Published: (2026)

Xi+: Uncertainty Supervision for Robust Speaker Embedding
by: Li, Junjie, et al.
Published: (2025)

Real-time Speech Summarization for Medical Conversations
by: Le-Duc, Khai, et al.
Published: (2024)

Fake Speech Wild: Detecting Deepfake Speech on Social Media Platform
by: Xie, Yuankun, et al.
Published: (2025)

MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder
by: Le-Duc, Khai, et al.
Published: (2024)

Towards Scalable AASIST: Refining Graph Attention for Speech Deepfake Detection
by: Viakhirev, Ivan, et al.
Published: (2025)

Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification
by: Truong, Duc-Tuan, et al.
Published: (2023)

SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection
by: Zhu, Yi, et al.
Published: (2024)