Library Catalog

Bibliographic Details
Main Authors:	Liang, Ziqi; Jia, Zhijun; Liu, Chang; Yang, Minghui; Lu, Zhihong; Wang, Jian
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2602.12701

Similar Items

DisCo-Speech: Controllable Zero-Shot Speech Generation with A Disentangled Speech Codec
by: Li, Tao, et al.
Published: (2025)

EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning
by: Liang, Ziqi, et al.
Published: (2024)

MaskSR: Masked Language Model for Full-band Speech Restoration
by: Li, Xu, et al.
Published: (2024)

Learning Disentangled Speech Representations
by: Brima, Yusuf, et al.
Published: (2023)

Prototype-Based Disentanglement for Controllable Dysarthric Speech Synthesis
by: Wang, Haoshen, et al.
Published: (2026)

Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
by: Deng, Yimin, et al.
Published: (2024)

Universal Discrete-Domain Speech Enhancement
by: Liu, Fei, et al.
Published: (2025)

CogSR: Semantic-Aware Speech Super-Resolution via Chain-of-Thought Guided Flow Matching
by: Yuan, Jiajun, et al.
Published: (2025)

Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation
by: Xin, Yifei, et al.
Published: (2024)

Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval
by: Deng, Yimin, et al.
Published: (2024)

EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech
by: Liang, Ziqi, et al.
Published: (2024)

FLOWER: Flow-Based Estimated Gaussian Guidance for General Speech Restoration
by: Yang, Da-Hee, et al.
Published: (2025)

Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction
by: Mu, Zhaoxi, et al.
Published: (2023)

Geometric Analysis of Speech Representation Spaces: Topological Disentanglement and Confound Detection
by: Kashyap, Bipasha, et al.
Published: (2026)

Universal Robust Speech Adaptation for Cross-Domain Speech Recognition and Enhancement
by: Wang, Chien-Chun, et al.
Published: (2026)

DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
by: Melechovsky, Jan, et al.
Published: (2024)

Disentangling Textual and Acoustic Features of Neural Speech Representations
by: Mohebbi, Hosein, et al.
Published: (2024)

Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
by: Ji, Shengpeng, et al.
Published: (2024)

Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement
by: Yang, Jianing, et al.
Published: (2025)

Magnetoencephalography (MEG) Based Non-Invasive Chinese Speech Decoding
by: Jia, Zhihong, et al.
Published: (2025)

DMP-TTS: Disentangled multi-modal Prompting for Controllable Text-to-Speech with Chained Guidance
by: Yin, Kang, et al.
Published: (2025)

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration
by: Ku, Pin-Jui, et al.
Published: (2024)

Quantifying Dimensional Independence in Speech: An Information-Theoretic Framework for Disentangled Representation Learning
by: Kashyap, Bipasha, et al.
Published: (2026)

Speech Watermarking with Discrete Intermediate Representations
by: Ji, Shengpeng, et al.
Published: (2024)

AlignCap: Aligning Speech Emotion Captioning to Human Preferences
by: Liang, Ziqi, et al.
Published: (2024)

POTSA: A Cross-Lingual Speech Alignment Framework for Speech-to-Text Translation
by: Li, Xuanchen, et al.
Published: (2025)

Listen through the Sound: Generative Speech Restoration Leveraging Acoustic Context Representation
by: Chung, Soo-Whan, et al.
Published: (2025)

Semantic Codebooks as Effective Priors for Neural Speech Compression
by: Bai, Liuyang, et al.
Published: (2025)

Speaker-Disentangled Remote Speech Detection of Asthma and COPD Exacerbations
by: Yan, Yuyang, et al.
Published: (2026)

Stage-Wise and Prior-Aware Neural Speech Phase Prediction
by: Liu, Fei, et al.
Published: (2024)

DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis
by: Lu, Ye-Xin, et al.
Published: (2025)

Automatic Restoration of Diacritics for Speech Data Sets
by: Shatnawi, Sara, et al.
Published: (2023)

Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis
by: Liu, Qingyu, et al.
Published: (2025)

BridgeCode: A Dual Speech Representation Paradigm for Autoregressive Zero-Shot Text-to-Speech Synthesis
by: Xing, Jingyuan, et al.
Published: (2025)

MF-Speech: Achieving Fine-Grained and Compositional Control in Speech Generation via Factor Disentanglement
by: Yu, Xinyue, et al.
Published: (2025)

Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance
by: Ochiai, Tsubasa, et al.
Published: (2024)

Decoding Order Matters in Autoregressive Speech Synthesis
by: Zhao, Minghui, et al.
Published: (2026)

SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms
by: Li, Sirui, et al.
Published: (2025)

Koopman Regularized Deep Speech Disentanglement for Speaker Verification
by: Chazaridis, Nikos, et al.
Published: (2026)

SECodec: Structural Entropy-based Compressive Speech Representation Codec for Speech Language Models
by: Wang, Linqin, et al.
Published: (2024)