Saved in:
| Main Authors: | Zhang, Bingyuan, Zhang, Xulong, Cheng, Ning, Yu, Jun, Xiao, Jing, Wang, Jianzong |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.08049 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis
by: Tang, Haobin, et al.
Published: (2024)
by: Tang, Haobin, et al.
Published: (2024)
RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis
by: Shi, Haoxiang, et al.
Published: (2024)
by: Shi, Haoxiang, et al.
Published: (2024)
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
by: Wang, Jianzong, et al.
Published: (2024)
by: Wang, Jianzong, et al.
Published: (2024)
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
by: Deng, Yimin, et al.
Published: (2024)
by: Deng, Yimin, et al.
Published: (2024)
CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition
by: Wang, Jianzong, et al.
Published: (2024)
by: Wang, Jianzong, et al.
Published: (2024)
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation
by: Wang, Jianzong, et al.
Published: (2023)
by: Wang, Jianzong, et al.
Published: (2023)
EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning
by: Liang, Ziqi, et al.
Published: (2024)
by: Liang, Ziqi, et al.
Published: (2024)
Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval
by: Deng, Yimin, et al.
Published: (2024)
by: Deng, Yimin, et al.
Published: (2024)
ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations
by: Zhang, Xulong, et al.
Published: (2024)
by: Zhang, Xulong, et al.
Published: (2024)
Attention-weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied to Speech Emotion Recognition
by: Yang, Qingran, et al.
Published: (2026)
by: Yang, Qingran, et al.
Published: (2026)
MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
by: Li, Pengcheng, et al.
Published: (2024)
by: Li, Pengcheng, et al.
Published: (2024)
CycleFlow: Leveraging Cycle Consistency in Flow Matching for Speaker Style Adaptation
by: Liang, Ziqi, et al.
Published: (2025)
by: Liang, Ziqi, et al.
Published: (2025)
Improving Controllability and Editability for Pretrained Text-to-Music Generation Models
by: Zhang, Yixiao
Published: (2024)
by: Zhang, Yixiao
Published: (2024)
EmoFake: An Initial Dataset for Emotion Fake Audio Detection
by: Zhao, Yan, et al.
Published: (2022)
by: Zhao, Yan, et al.
Published: (2022)
EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition
by: Li, Pengcheng, et al.
Published: (2025)
by: Li, Pengcheng, et al.
Published: (2025)
EmoQ: Speech Emotion Recognition via Speech-Aware Q-Former and Large Language Model
by: Yang, Yiqing, et al.
Published: (2025)
by: Yang, Yiqing, et al.
Published: (2025)
SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
by: Fan, Zhiyun, et al.
Published: (2024)
by: Fan, Zhiyun, et al.
Published: (2024)
Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization
by: Gao, Xiaoxue, et al.
Published: (2024)
by: Gao, Xiaoxue, et al.
Published: (2024)
EmoOmni: Bridging Emotional Understanding and Expression in Omni-Modal LLMs
by: Tian, Wenjie, et al.
Published: (2026)
by: Tian, Wenjie, et al.
Published: (2026)
IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding
by: Li, Pengcheng, et al.
Published: (2024)
by: Li, Pengcheng, et al.
Published: (2024)
FastTalker: Jointly Generating Speech and Conversational Gestures from Text
by: Guo, Zixin, et al.
Published: (2024)
by: Guo, Zixin, et al.
Published: (2024)
Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy
by: Zhao, Botao, et al.
Published: (2025)
by: Zhao, Botao, et al.
Published: (2025)
A Unified Neural Codec Language Model for Selective Editable Text to Speech Generation
by: Pei, Hanchen, et al.
Published: (2026)
by: Pei, Hanchen, et al.
Published: (2026)
Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling
by: Ye, Zhen, et al.
Published: (2026)
by: Ye, Zhen, et al.
Published: (2026)
Semi-Supervised Self-Learning Enhanced Music Emotion Recognition
by: Sun, Yifu, et al.
Published: (2024)
by: Sun, Yifu, et al.
Published: (2024)
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2024)
by: Cho, Deok-Hyeon, et al.
Published: (2024)
Comprehend and Talk: Text to Speech Synthesis via Dual Language Modeling
by: Cao, Junjie, et al.
Published: (2025)
by: Cao, Junjie, et al.
Published: (2025)
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
by: Cong, Gaoxiang, et al.
Published: (2024)
by: Cong, Gaoxiang, et al.
Published: (2024)
EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations
by: Bian, Weizhen, et al.
Published: (2024)
by: Bian, Weizhen, et al.
Published: (2024)
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
by: Cho, Deok-Hyeon, et al.
Published: (2024)
by: Cho, Deok-Hyeon, et al.
Published: (2024)
EmoFormer: A Text-Independent Speech Emotion Recognition using a Hybrid Transformer-CNN model
by: Hasan, Rashedul, et al.
Published: (2025)
by: Hasan, Rashedul, et al.
Published: (2025)
Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams
by: He, Xiluo, et al.
Published: (2025)
by: He, Xiluo, et al.
Published: (2025)
EmoSteer-TTS: Fine-Grained and Training-Free Emotion-Controllable Text-to-Speech via Activation Steering
by: Xie, Tianxin, et al.
Published: (2025)
by: Xie, Tianxin, et al.
Published: (2025)
Warm Chat: Diffuse Emotion-aware Interactive Talking Head Avatar with Tree-Structured Guidance
by: Yang, Haijie, et al.
Published: (2025)
by: Yang, Haijie, et al.
Published: (2025)
Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models
by: Jing, Ruihao, et al.
Published: (2025)
by: Jing, Ruihao, et al.
Published: (2025)
Chain-Talker: Chain Understanding and Rendering for Empathetic Conversational Speech Synthesis
by: Hu, Yifan, et al.
Published: (2025)
by: Hu, Yifan, et al.
Published: (2025)
EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion
by: Gudmalwar, Ashishkumar, et al.
Published: (2024)
by: Gudmalwar, Ashishkumar, et al.
Published: (2024)
VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs
by: Zhang, Hezhao, et al.
Published: (2026)
by: Zhang, Hezhao, et al.
Published: (2026)
EmoTech: A Multi-modal Speech Emotion Recognition Using Multi-source Low-level Information with Hybrid Recurrent Network
by: Avro, Shamin Bin Habib, et al.
Published: (2025)
by: Avro, Shamin Bin Habib, et al.
Published: (2025)
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2025)
by: Cho, Deok-Hyeon, et al.
Published: (2025)
Similar Items
-
ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis
by: Tang, Haobin, et al.
Published: (2024) -
RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis
by: Shi, Haoxiang, et al.
Published: (2024) -
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
by: Wang, Jianzong, et al.
Published: (2024) -
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
by: Deng, Yimin, et al.
Published: (2024) -
CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition
by: Wang, Jianzong, et al.
Published: (2024)