:: Library Catalog

Bibliographic Details
Main Authors:	Hallmen, Tobias; Deuser, Fabian; Oswald, Norbert; André, Elisabeth
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2403.11879

Similar Items

EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2024)

Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation
by: Xu, Jingyi, et al.
Published: (2024)

Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech
by: Chu, Yunji, et al.
Published: (2024)

MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention
by: Jiao, Xinxin, et al.
Published: (2024)

Audio-Guided Fusion Techniques for Multimodal Emotion Analysis
by: Shi, Pujin, et al.
Published: (2024)

Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation
by: Yu, Jun, et al.
Published: (2024)

Wearable Music2Emotion : Assessing Emotions Induced by AI-Generated Music through Portable EEG-fNIRS Fusion
by: Zhao, Sha, et al.
Published: (2025)

Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition
by: Li, Dongyuan, et al.
Published: (2024)

EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion
by: Gudmalwar, Ashishkumar, et al.
Published: (2024)

Disentangling Reasoning in Large Audio-Language Models for Ambiguous Emotion Prediction
by: Yu, Xiaofeng, et al.
Published: (2026)

Expressive Prompting: Improving Emotion Intensity and Speaker Consistency in Zero-Shot TTS
by: Wang, Haoyu, et al.
Published: (2024)

Explaining Deep Learning Embeddings for Speech Emotion Recognition by Predicting Interpretable Acoustic Features
by: Dixit, Satvik, et al.
Published: (2024)

ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations
by: Wang, Kexue, et al.
Published: (2026)

Color-based Emotion Representation for Speech Emotion Recognition
by: Nagase, Ryotaro, et al.
Published: (2026)

Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges
by: Kang, Jaeyong, et al.
Published: (2024)

MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
by: Bak, Taejun, et al.
Published: (2024)

MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt
by: Wu, Zhichao, et al.
Published: (2025)

Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention
by: Wang, Cong, et al.
Published: (2025)

Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement
by: Yang, Jianing, et al.
Published: (2025)

Sync-TVA: A Graph-Attention Framework for Multimodal Emotion Recognition with Cross-Modal Fusion
by: Deng, Zeyu, et al.
Published: (2025)

GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning for Speech Emotion Recognition
by: Pan, Yu, et al.
Published: (2024)

MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition
by: Wang, He, et al.
Published: (2024)

EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
by: Cho, Deok-Hyeon, et al.
Published: (2024)

EffiFusion-GAN: Efficient Fusion Generative Adversarial Network for Speech Enhancement
by: Wen, Bin, et al.
Published: (2025)

Beyond Discrete Categories: Multi-Task Valence-Arousal Modeling for Pet Vocalization Analysis
by: Huang, Junyao, et al.
Published: (2025)

Joint Learning of Emotions in Music and Generalized Sounds
by: Simonetta, Federico, et al.
Published: (2024)

DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech
by: Cho, Deok-Hyeon, et al.
Published: (2025)

SongGLM: Lyric-to-Melody Generation with 2D Alignment Encoding and Multi-Task Pre-Training
by: Yu, Jiaxing, et al.
Published: (2024)

FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion
by: Chen, Shunian, et al.
Published: (2025)

Persian Speech Emotion Recognition by Fine-Tuning Transformers
by: Shayaninasab, Minoo, et al.
Published: (2024)

Leveraging Label Potential for Enhanced Multimodal Emotion Recognition
by: Shao, Xuechun, et al.
Published: (2025)

MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection
by: Mu, Da, et al.
Published: (2024)

Semi-Supervised Self-Learning Enhanced Music Emotion Recognition
by: Sun, Yifu, et al.
Published: (2024)

Efficient Finetuning for Dimensional Speech Emotion Recognition in the Age of Transformers
by: Sampath, Aneesha, et al.
Published: (2025)

Disentangled Dual-Branch Graph Learning for Conversational Emotion Recognition
by: Guo, Chengling, et al.
Published: (2026)

Abstract Sound Fusion with Unconditional Inversion Models
by: Liu, Jing, et al.
Published: (2025)

Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
by: Nguyen, Tan Dat, et al.
Published: (2024)

Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
by: Chang, Yi, et al.
Published: (2024)

Emotion-Driven Melody Harmonization via Melodic Variation and Functional Representation
by: Huang, Jingyue, et al.
Published: (2024)

Learning Physiology-Informed Vocal Spectrotemporal Representations for Speech Emotion Recognition
by: Zhang, Xu, et al.
Published: (2026)