:: Library Catalog

Bibliographic Details
Main Authors:	Griol, D., Sanchis, A., Molina, J. M., Callejas, Z.
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Computation and Language; Sound
Online Access:	https://arxiv.org/abs/2501.16341

Similar Items

Stepback: Enhanced Disentanglement for Voice Conversion via Multi-Task Learning
by: Yang, Qian, et al.
Published: (2025)

Distribution-based Emotion Recognition in Conversation
by: Wu, Wen, et al.
Published: (2022)

Generative Expressive Conversational Speech Synthesis
by: Liu, Rui, et al.
Published: (2024)

Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification
by: Abdullah, Badr M., et al.
Published: (2025)

Proactive Hearing Assistants that Isolate Egocentric Conversations
by: Hu, Guilin, et al.
Published: (2025)

Speaker-Aware Simulation Improves Conversational Speech Recognition
by: Gedeon, Máté, et al.
Published: (2026)

AdaptVC: High Quality Voice Conversion with Adaptive Learning
by: Kim, Jaehun, et al.
Published: (2025)

Streaming Non-Autoregressive Model for Accent Conversion and Pronunciation Improvement
by: Nguyen, Tuan-Nam, et al.
Published: (2025)

ASR Benchmarking: Need for a More Representative Conversational Dataset
by: Maheshwari, Gaurav, et al.
Published: (2024)

Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation
by: Yu, Fangxu, et al.
Published: (2024)

Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech
by: Kolani, Yakov, et al.
Published: (2025)

LibriConvo: Simulating Conversations from Read Literature for ASR and Diarization
by: Gedeon, Máté, et al.
Published: (2025)

Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in German Speech Recognition
by: Pritzen, Julia, et al.
Published: (2021)

Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation
by: Kim, Heeseung, et al.
Published: (2024)

Intra- and Inter-modal Context Interaction Modeling for Conversational Speech Synthesis
by: Jia, Zhenqi, et al.
Published: (2024)

GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations
by: Li, Yupei, et al.
Published: (2025)

Voice Conversion for Lombard Speaking Style with Implicit and Explicit Acoustic Feature Conditioning
by: Woszczyk, Dominika, et al.
Published: (2025)

Conan: A Chunkwise Online Network for Zero-Shot Adaptive Voice Conversion
by: Zhang, Yu, et al.
Published: (2025)

JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis
by: Cha, Jun-Hyeok, et al.
Published: (2025)

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
by: Wei, Kun, et al.
Published: (2023)

Advancing Speech Translation: A Corpus of Mandarin-English Conversational Telephone Speech
by: Wotherspoon, Shannon, et al.
Published: (2024)

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
by: Cornell, Samuele, et al.
Published: (2024)

SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
by: Chen, Yang, et al.
Published: (2025)

On the Cost and Benefits of Training Context with Utterance or Full Conversation Training: A Comparative Study
by: Liu, Hyouin, et al.
Published: (2025)

Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling
by: Liu, Rui, et al.
Published: (2024)

Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning
by: He, Haorui, et al.
Published: (2024)

SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
by: Li, Junjie, et al.
Published: (2023)

Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages
by: Li, Chin-Jou, et al.
Published: (2025)

GSDNet: Revisiting Incomplete Multimodal-Diffusion from Graph Spectrum Perspective for Conversation Emotion Recognition
by: Shou, Yuntao, et al.
Published: (2025)

Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion
by: Kamble, Anand, et al.
Published: (2023)

ChipChat: Low-Latency Cascaded Conversational Agent in MLX
by: Likhomanenko, Tatiana, et al.
Published: (2025)

Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India
by: Bhogale, Kaushal, et al.
Published: (2026)

Recent Trends in Distant Conversational Speech Recognition: A Review of CHiME-7 and 8 DASR Challenges
by: Cornell, Samuele, et al.
Published: (2025)

HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
by: Yoon, Sunjae, et al.
Published: (2023)

Samsung Research China-Beijing at SemEval-2024 Task 3: A multi-stage framework for Emotion-Cause Pair Extraction in Conversations
by: Zhang, Shen, et al.
Published: (2024)

A Pilot Study of Applying Sequence-to-Sequence Voice Conversion to Evaluate the Intelligibility of L2 Speech Using a Native Speaker's Shadowings
by: Geng, Haopeng, et al.
Published: (2024)

VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech
by: Lin, Yi-Cheng, et al.
Published: (2026)

Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy
by: Sheikh, Shakeel, et al.
Published: (2026)

Data-Centric Improvements for Enhancing Multi-Modal Understanding in Spoken Conversation Modeling
by: Chen, Maximillian, et al.
Published: (2024)

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
by: Zhou, Kun, et al.
Published: (2024)