Saved in:
| Main Authors: | Zhao, Junchuan, Vu, Minh Duc, Wang, Ye |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.05373 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning
by: Zhao, Junchuan, et al.
Published: (2025)
by: Zhao, Junchuan, et al.
Published: (2025)
Disentangling Score Content and Performance Style for Joint Piano Rendering and Transcription
by: Zeng, Wei, et al.
Published: (2025)
by: Zeng, Wei, et al.
Published: (2025)
Multi-Step Prediction and Control of Hierarchical Emotion Distribution in Text-to-Speech Synthesis
by: Inoue, Sho, et al.
Published: (2025)
by: Inoue, Sho, et al.
Published: (2025)
ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription
by: Le, Khanh, et al.
Published: (2025)
by: Le, Khanh, et al.
Published: (2025)
Hierarchical Control of Emotion Rendering in Speech Synthesis
by: Inoue, Sho, et al.
Published: (2024)
by: Inoue, Sho, et al.
Published: (2024)
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
by: Chen, Peikun, et al.
Published: (2024)
by: Chen, Peikun, et al.
Published: (2024)
SegAug: CTC-Aligned Segmented Augmentation For Robust RNN-Transducer Based Speech Recognition
by: Le, Khanh, et al.
Published: (2025)
by: Le, Khanh, et al.
Published: (2025)
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
by: Jung, Jee-weon, et al.
Published: (2024)
by: Jung, Jee-weon, et al.
Published: (2024)
Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis
by: Inoue, Sho, et al.
Published: (2024)
by: Inoue, Sho, et al.
Published: (2024)
An Explainable Probabilistic Attribute Embedding Approach for Spoofed Speech Characterization
by: Chhibber, Manasi, et al.
Published: (2024)
by: Chhibber, Manasi, et al.
Published: (2024)
MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder
by: Le-Duc, Khai, et al.
Published: (2024)
by: Le-Duc, Khai, et al.
Published: (2024)
MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis
by: Yang, Qian, et al.
Published: (2024)
by: Yang, Qian, et al.
Published: (2024)
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model
by: Wang, Siyang, et al.
Published: (2024)
by: Wang, Siyang, et al.
Published: (2024)
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
by: Tao, Dehua, et al.
Published: (2024)
by: Tao, Dehua, et al.
Published: (2024)
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
by: Wu, Haibin, et al.
Published: (2024)
by: Wu, Haibin, et al.
Published: (2024)
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
by: Gong, Cheng, et al.
Published: (2023)
by: Gong, Cheng, et al.
Published: (2023)
Decoding Order Matters in Autoregressive Speech Synthesis
by: Zhao, Minghui, et al.
Published: (2026)
by: Zhao, Minghui, et al.
Published: (2026)
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
by: Jiang, Yuepeng, et al.
Published: (2024)
by: Jiang, Yuepeng, et al.
Published: (2024)
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
by: Luong, Hieu-Thi, et al.
Published: (2024)
by: Luong, Hieu-Thi, et al.
Published: (2024)
MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
by: Müller, Nicolas M., et al.
Published: (2024)
by: Müller, Nicolas M., et al.
Published: (2024)
Exploring WavLM Back-ends for Speech Spoofing and Deepfake Detection
by: Stourbe, Theophile, et al.
Published: (2024)
by: Stourbe, Theophile, et al.
Published: (2024)
ALDAS: Audio-Linguistic Data Augmentation for Spoofed Audio Detection
by: Khanjani, Zahra, et al.
Published: (2024)
by: Khanjani, Zahra, et al.
Published: (2024)
Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits
by: Huang, Sung-Feng, et al.
Published: (2025)
by: Huang, Sung-Feng, et al.
Published: (2025)
From Continuous to Discrete: Cross-Domain Collaborative General Speech Enhancement via Hierarchical Language Models
by: Mu, Zhaoxi, et al.
Published: (2025)
by: Mu, Zhaoxi, et al.
Published: (2025)
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
by: Ji, Shengpeng, et al.
Published: (2024)
by: Ji, Shengpeng, et al.
Published: (2024)
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
by: Du, Chenpeng, et al.
Published: (2024)
by: Du, Chenpeng, et al.
Published: (2024)
Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation
by: Bai, Ye, et al.
Published: (2024)
by: Bai, Ye, et al.
Published: (2024)
S2ST-Omni: Hierarchical Language-Aware SpeechLLM Adaptation for Multilingual Speech-to-Speech Translation
by: Pan, Yu, et al.
Published: (2025)
by: Pan, Yu, et al.
Published: (2025)
Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization
by: Wan, Genshun, et al.
Published: (2026)
by: Wan, Genshun, et al.
Published: (2026)
Can Quantized Audio Language Models Perform Zero-Shot Spoofing Detection?
by: Dutta, Bikash, et al.
Published: (2025)
by: Dutta, Bikash, et al.
Published: (2025)
Two-Path GMM-ResNet and GMM-SENet for ASV Spoofing Detection
by: Lei, Zhenchun, et al.
Published: (2024)
by: Lei, Zhenchun, et al.
Published: (2024)
Absorbing Discrete Diffusion for Speech Enhancement
by: Gonzalez, Philippe
Published: (2026)
by: Gonzalez, Philippe
Published: (2026)
Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning
by: Zhu, Xinfa, et al.
Published: (2023)
by: Zhu, Xinfa, et al.
Published: (2023)
Spoof Diarization: "What Spoofed When" in Partially Spoofed Audio
by: Zhang, Lin, et al.
Published: (2024)
by: Zhang, Lin, et al.
Published: (2024)
Improving Streaming Speech Recognition With Time-Shifted Contextual Attention And Dynamic Right Context Masking
by: Le, Khanh, et al.
Published: (2025)
by: Le, Khanh, et al.
Published: (2025)
How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?
by: Liu, Tianchi, et al.
Published: (2024)
by: Liu, Tianchi, et al.
Published: (2024)
When Spoof Detectors Travel: Evaluation Across 66 Languages in the Low-Resource Language Spoofing Corpus
by: Borodin, Kirill, et al.
Published: (2026)
by: Borodin, Kirill, et al.
Published: (2026)
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
by: Pham, Lam, et al.
Published: (2024)
by: Pham, Lam, et al.
Published: (2024)
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
by: Wang, Hankun, et al.
Published: (2024)
by: Wang, Hankun, et al.
Published: (2024)
Spoofing-Robust Speaker Verification Using Parallel Embedding Fusion: BTU Speech Group's Approach for ASVspoof5 Challenge
by: Kurnaz, Oğuzhan, et al.
Published: (2024)
by: Kurnaz, Oğuzhan, et al.
Published: (2024)
Similar Items
-
Prosody-Adaptable Audio Codecs for Zero-Shot Voice Conversion via In-Context Learning
by: Zhao, Junchuan, et al.
Published: (2025) -
Disentangling Score Content and Performance Style for Joint Piano Rendering and Transcription
by: Zeng, Wei, et al.
Published: (2025) -
Multi-Step Prediction and Control of Hierarchical Emotion Distribution in Text-to-Speech Synthesis
by: Inoue, Sho, et al.
Published: (2025) -
ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription
by: Le, Khanh, et al.
Published: (2025) -
Hierarchical Control of Emotion Rendering in Speech Synthesis
by: Inoue, Sho, et al.
Published: (2024)