Bibliographic Details
Main Authors: Chavda, Anshul; Jagadeesh, M; Kullayappa, Chintalapalli Raja; Jayaprakash, B; Sruthi, Medchalimi; Bhattacharyya, Pushpak
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2507.19867
_version_ 1866916865033371648
author Chavda, Anshul
Jagadeesh, M
Kullayappa, Chintalapalli Raja
Jayaprakash, B
Sruthi, Medchalimi
Bhattacharyya, Pushpak
contents In-car conversational AI is becoming increasingly critical as autonomous vehicles and smart assistants gain widespread adoption. Yet existing datasets fail to capture the spontaneous disfluencies (hesitations, false starts, repetitions, and self-corrections) that characterize real driver-AI dialogs. To address this gap, we introduce DiscoDrive, a synthetic corpus of 3,500 multi-turn dialogs across seven automotive domains, generated using a two-stage, prompt-driven pipeline that dynamically integrates disfluencies during synthesis. We show that DiscoDrive is effective both as a training resource, enabling DialoGPT-Medium and T5-Base to match or exceed KVRET-trained models on the relevant MultiWOZ 2.2 and Schema-Guided Dialogue (SGD) test sets (BLEU-4 improvements of 0.26 to 0.61; METEOR +2.10; ROUGE-L +3.48; BERTScore F1 improvements of 1.35 to 3.48), and as a data augmentation resource in low-resource scenarios, delivering additional gains of up to BLEU-4 +0.38, METEOR +1.95, ROUGE-L +2.87, and BERTScore F1 +4.00 when combined with 10 percent of KVRET. Human evaluations further confirm that dialogs sampled from DiscoDrive are rated higher than KVRET's human-collected dialogs in naturalness (3.8 vs 3.6) and coherence (4.1 vs 4.0), and are perceived as more context-appropriate than leading post-hoc methods (such as LARD), without compromising clarity. DiscoDrive fills a critical gap in existing resources and serves as a versatile corpus for both training and augmenting conversational AI, enabling robust handling of real-world, disfluent in-car interactions.
format Preprint
id arxiv_https___arxiv_org_abs_2507_19867
institution arXiv
publishDate 2025
record_format arxiv
title DRIVE: Disfluency-Rich Synthetic Dialog Data Generation Framework for Intelligent Vehicle Environments
topic Computation and Language
url https://arxiv.org/abs/2507.19867