| Main Authors: | Griol, D., Sanchis, A., Molina, J. M., Callejas, Z. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.16341 |
Similar Items
Stepback: Enhanced Disentanglement for Voice Conversion via Multi-Task Learning
by: Yang, Qian, et al.
Published: (2025)
Distribution-based Emotion Recognition in Conversation
by: Wu, Wen, et al.
Published: (2022)
Generative Expressive Conversational Speech Synthesis
by: Liu, Rui, et al.
Published: (2024)
Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification
by: Abdullah, Badr M., et al.
Published: (2025)
Proactive Hearing Assistants that Isolate Egocentric Conversations
by: Hu, Guilin, et al.
Published: (2025)
Speaker-Aware Simulation Improves Conversational Speech Recognition
by: Gedeon, Máté, et al.
Published: (2026)
AdaptVC: High Quality Voice Conversion with Adaptive Learning
by: Kim, Jaehun, et al.
Published: (2025)
Streaming Non-Autoregressive Model for Accent Conversion and Pronunciation Improvement
by: Nguyen, Tuan-Nam, et al.
Published: (2025)
ASR Benchmarking: Need for a More Representative Conversational Dataset
by: Maheshwari, Gaurav, et al.
Published: (2024)
Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation
by: Yu, Fangxu, et al.
Published: (2024)
Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech
by: Kolani, Yakov, et al.
Published: (2025)
LibriConvo: Simulating Conversations from Read Literature for ASR and Diarization
by: Gedeon, Máté, et al.
Published: (2025)
Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in German Speech Recognition
by: Pritzen, Julia, et al.
Published: (2021)
Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation
by: Kim, Heeseung, et al.
Published: (2024)
Intra- and Inter-modal Context Interaction Modeling for Conversational Speech Synthesis
by: Jia, Zhenqi, et al.
Published: (2024)
GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations
by: Li, Yupei, et al.
Published: (2025)
Voice Conversion for Lombard Speaking Style with Implicit and Explicit Acoustic Feature Conditioning
by: Woszczyk, Dominika, et al.
Published: (2025)
Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion
by: Zhang, Yu, et al.
Published: (2025)
JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis
by: Cha, Jun-Hyeok, et al.
Published: (2025)
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
by: Wei, Kun, et al.
Published: (2023)
Advancing Speech Translation: A Corpus of Mandarin-English Conversational Telephone Speech
by: Wotherspoon, Shannon, et al.
Published: (2024)
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
by: Chen, Yang, et al.
Published: (2025)
On the Cost and Benefits of Training Context with Utterance or Full Conversation Training: A Comparative Stud
by: Liu, Hyouin, et al.
Published: (2025)
Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling
by: Liu, Rui, et al.
Published: (2024)
Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning
by: He, Haorui, et al.
Published: (2024)
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
by: Li, Junjie, et al.
Published: (2023)
Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages
by: Li, Chin-Jou, et al.
Published: (2025)
GSDNet: Revisiting Incomplete Multimodal-Diffusion from Graph Spectrum Perspective for Conversation Emotion Recognition
by: Shou, Yuntao, et al.
Published: (2025)
Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
by: Kamble, Anand, et al.
Published: (2023)
ChipChat: Low-Latency Cascaded Conversational Agent in MLX
by: Likhomanenko, Tatiana, et al.
Published: (2025)
Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India
by: Bhogale, Kaushal, et al.
Published: (2026)
Recent Trends in Distant Conversational Speech Recognition: A Review of CHiME-7 and 8 DASR Challenges
by: Cornell, Samuele, et al.
Published: (2025)
HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
by: Yoon, Sunjae, et al.
Published: (2023)
Samsung Research China-Beijing at SemEval-2024 Task 3: A multi-stage framework for Emotion-Cause Pair Extraction in Conversations
by: Zhang, Shen, et al.
Published: (2024)
A Pilot Study of Applying Sequence-to-Sequence Voice Conversion to Evaluate the Intelligibility of L2 Speech Using a Native Speaker's Shadowings
by: Geng, Haopeng, et al.
Published: (2024)
VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech
by: Lin, Yi-Cheng, et al.
Published: (2026)
Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy
by: Sheikh, Shakeel, et al.
Published: (2026)
Data-Centric Improvements for Enhancing Multi-Modal Understanding in Spoken Conversation Modeling
by: Chen, Maximillian, et al.
Published: (2024)
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
by: Zhou, Kun, et al.
Published: (2024)