| Main Authors: | Liang, Ziqi, Jia, Zhijun, Liu, Chang, Yang, Minghui, Lu, Zhihong, Wang, Jian |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.12701 |
Similar Items
DisCo-Speech: Controllable Zero-Shot Speech Generation with A Disentangled Speech Codec
by: Li, Tao, et al.
Published: (2025)
EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning
by: Liang, Ziqi, et al.
Published: (2024)
MaskSR: Masked Language Model for Full-band Speech Restoration
by: Li, Xu, et al.
Published: (2024)
Learning Disentangled Speech Representations
by: Brima, Yusuf, et al.
Published: (2023)
Prototype-Based Disentanglement for Controllable Dysarthric Speech Synthesis
by: Wang, Haoshen, et al.
Published: (2026)
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
by: Deng, Yimin, et al.
Published: (2024)
Universal Discrete-Domain Speech Enhancement
by: Liu, Fei, et al.
Published: (2025)
CogSR: Semantic-Aware Speech Super-Resolution via Chain-of-Thought Guided Flow Matching
by: Yuan, Jiajun, et al.
Published: (2025)
Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation
by: Xin, Yifei, et al.
Published: (2024)
Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval
by: Deng, Yimin, et al.
Published: (2024)
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech
by: Liang, Ziqi, et al.
Published: (2024)
FLOWER: Flow-Based Estimated Gaussian Guidance for General Speech Restoration
by: Yang, Da-Hee, et al.
Published: (2025)
Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction
by: Mu, Zhaoxi, et al.
Published: (2023)
Geometric Analysis of Speech Representation Spaces: Topological Disentanglement and Confound Detection
by: Kashyap, Bipasha, et al.
Published: (2026)
Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
by: Melechovsky, Jan, et al.
Published: (2024)
Disentangling Textual and Acoustic Features of Neural Speech Representations
by: Mohebbi, Hosein, et al.
Published: (2024)
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
by: Ji, Shengpeng, et al.
Published: (2024)
Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement
by: Yang, Jianing, et al.
Published: (2025)
Magnetoencephalography (MEG) Based Non-Invasive Chinese Speech Decoding
by: Jia, Zhihong, et al.
Published: (2025)
DMP-TTS: Disentangled multi-modal Prompting for Controllable Text-to-Speech with Chained Guidance
by: Yin, Kang, et al.
Published: (2025)
Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration
by: Ku, Pin-Jui, et al.
Published: (2024)
Quantifying Dimensional Independence in Speech: An Information-Theoretic Framework for Disentangled Representation Learning
by: Kashyap, Bipasha, et al.
Published: (2026)
Speech Watermarking with Discrete Intermediate Representations
by: Ji, Shengpeng, et al.
Published: (2024)
AlignCap: Aligning Speech Emotion Captioning to Human Preferences
by: Liang, Ziqi, et al.
Published: (2024)
POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
by: Li, Xuanchen, et al.
Published: (2025)
Listen through the Sound: Generative Speech Restoration Leveraging Acoustic Context Representation
by: Chung, Soo-Whan, et al.
Published: (2025)
Semantic Codebooks as Effective Priors for Neural Speech Compression
by: Bai, Liuyang, et al.
Published: (2025)
Speaker-Disentangled Remote Speech Detection of Asthma and COPD Exacerbations
by: Yan, Yuyang, et al.
Published: (2026)
Stage-Wise and Prior-Aware Neural Speech Phase Prediction
by: Liu, Fei, et al.
Published: (2024)
DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis
by: Lu, Ye-Xin, et al.
Published: (2025)
Automatic Restoration of Diacritics for Speech Data Sets
by: Shatnawi, Sara, et al.
Published: (2023)
Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis
by: Liu, Qingyu, et al.
Published: (2025)
BridgeCode: A Dual Speech Representation Paradigm for Autoregressive Zero-Shot Text-to-Speech Synthesis
by: Xing, Jingyuan, et al.
Published: (2025)
MF-Speech: Achieving Fine-Grained and Compositional Control in Speech Generation via Factor Disentanglement
by: Yu, Xinyue, et al.
Published: (2025)
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance
by: Ochiai, Tsubasa, et al.
Published: (2024)
Decoding Order Matters in Autoregressive Speech Synthesis
by: Zhao, Minghui, et al.
Published: (2026)
SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms
by: Li, Sirui, et al.
Published: (2025)
Koopman Regularized Deep Speech Disentanglement for Speaker Verification
by: Chazaridis, Nikos, et al.
Published: (2026)
SECodec: Structural Entropy-based Compressive Speech Representation Codec for Speech Language Models
by: Wang, Linqin, et al.
Published: (2024)