| Main Authors: | Li, Zongyi, Hu, Shujie, Liu, Shujie, Zhou, Long, Choi, Jeongsoo, Meng, Lingwei, Guo, Xun, Li, Jinyu, Ling, Hefei, Wei, Furu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2410.20502 |
Similar Items
Autoregressive Speech Synthesis without Vector Quantization
by: Meng, Lingwei, et al.
Published: (2024)
Boosting Large Language Model for Speech Synthesis: An Empirical Study
by: Hao, Hongkun, et al.
Published: (2023)
V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow
by: Choi, Jeongsoo, et al.
Published: (2024)
WavLLM: Towards Robust and Adaptive Speech Large Language Model
by: Hu, Shujie, et al.
Published: (2024)
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
by: Han, Bing, et al.
Published: (2024)
StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling
by: Wang, Hui, et al.
Published: (2025)
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
by: Wang, Hui, et al.
Published: (2025)
Advanced Long-Content Speech Recognition With Factorized Neural Transducer
by: Gong, Xun, et al.
Published: (2024)
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
by: Chen, Sanyuan, et al.
Published: (2024)
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
by: Yang, Yifan, et al.
Published: (2025)
WavMark: Watermarking for Audio Generation
by: Chen, Guangyu, et al.
Published: (2023)
Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling
by: Sun, Haiyang, et al.
Published: (2025)
EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
by: Wang, Dingdong, et al.
Published: (2026)
A proof for the conjecture on superlinear problems with Ambrosetti-Rabinowitz condition
by: Li, Chong, et al.
Published: (2026)
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
by: Meng, Lingwei, et al.
Published: (2024)
Risk Factors of Thrombocytopenia After Cardiac Surgery with Cardiopulmonary Bypass
by: Shujie Yan
Published: (2023)
AlignFormer: Modality Matching Can Achieve Better Zero-shot Instruction-Following Speech-LLM
by: Fan, Ruchao, et al.
Published: (2024)
Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
by: Ren, Shuhuai, et al.
Published: (2025)
DBDH: A Dual-Branch Dual-Head Neural Network for Invisible Embedded Regions Localization
by: Zhao, Chengxin, et al.
Published: (2024)
Representation Alignment Contrastive Regularization for Multi-Object Tracking
by: Liu, Zhonglin, et al.
Published: (2024)
Long-range hopping in a quasiperiodic potential weakens the non-Hermitian skin effect
by: Peng, Dechi, et al.
Published: (2024)
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
by: Huang, Xun, et al.
Published: (2025)
Interleaved Speech-Text Language Models for Simple Streaming Text-to-Speech Synthesis
by: Yang, Yifan, et al.
Published: (2024)
Towards One-bit ASR: Extremely Low-bit Conformer Quantization Using Co-training and Stochastic Precision
by: Li, Zhaoqing, et al.
Published: (2025)
Privacy-Preserving Community Detection for Locally Distributed Multiple Networks
by: Guo, Xiao, et al.
Published: (2023)
Where Do Flow Semantics Reside? A Protocol-Native Tabular Pretraining Paradigm for Encrypted Traffic Classification
by: Huang, Sizhe, et al.
Published: (2026)
Generalization and Risk Bounds for Recurrent Neural Networks
by: Cheng, Xuewei, et al.
Published: (2024)
A joint modeling approach to treatment effects estimation with unmeasured confounders
by: Lee, Namhwa, et al.
Published: (2024)
Transfer Learning for High Dimensional Robust Regression
by: Yuan, Xiaohui, et al.
Published: (2024)
Learning to Identify Conflicts in RPKI
by: Schulmann, Haya, et al.
Published: (2025)
TASER: Task-Aware Spectral Energy Refine for Backdoor Suppression in UAV Swarms Decentralized Federated Learning
by: Huang, Sizhe, et al.
Published: (2026)
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
by: Choi, Jeongsoo, et al.
Published: (2025)
LongCat-AudioDiT: High-Fidelity Diffusion Text-to-Speech in the Waveform Latent Space
by: Xin, Detai, et al.
Published: (2026)
Thermodynamic modes of a quasiperiodic mobility-edge system in a quantum Otto cycle
by: Zhou, Ao, et al.
Published: (2026)
DEER: Draft with Diffusion, Verify with Autoregressive Models
by: Cheng, Zicong, et al.
Published: (2025)
LaVieID: Local Autoregressive Diffusion Transformers for Identity-Preserving Video Creation
by: Song, Wenhui, et al.
Published: (2025)
Wigner distribution, Wigner entropy, and Anomalous Transport of a Generalized Aubry-André model
by: Lu, Feng, et al.
Published: (2025)
Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis
by: Niu, Zhikang, et al.
Published: (2025)
DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation
by: Ye, Bo, et al.
Published: (2026)
A$^2$RD: Agentic Autoregressive Diffusion for Long Video Consistency
by: Long, Do Xuan, et al.
Published: (2026)