Bibliographic Details
Main Authors: Lin, Xiaoyu; Yu, Xinkai; Aich, Ankit; Giorgi, Salvatore; Ungar, Lyle
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2409.00262
_version_ 1866916377060704256
author Lin, Xiaoyu
Yu, Xinkai
Aich, Ankit
Giorgi, Salvatore
Ungar, Lyle
contents Large Language Models (LLMs) that simulate human users are frequently employed to evaluate chatbots in applications such as tutoring and customer service. Effective evaluation requires a high degree of human-like diversity within these simulations. In this paper, we demonstrate that conversations in which GPT-4o mini plays the simulated human participant systematically differ from those between actual humans across multiple linguistic features. These features include topic variation, lexical attributes, and both the average behavior and diversity (variance) of the language used. To address these discrepancies, we propose an approach that automatically generates prompts for user simulations by incorporating features derived from real human interactions, such as age, gender, emotional tone, and the topics discussed. We assess our approach using differential language analysis combined with deep linguistic inquiry. Our method of prompt optimization, tailored to target specific linguistic features, shows significant improvements. Specifically, it enhances the human-likeness of LLM chatbot conversations, increasing their linguistic diversity. On average, we observe a 54 percent reduction in the error of average features between human and LLM-generated conversations. This method of constructing chatbot sets with human-like diversity holds great potential for enhancing the evaluation process of user-facing bots.
format Preprint
id arxiv_https___arxiv_org_abs_2409_00262
institution arXiv
publishDate 2024
record_format arxiv
title DiverseDialogue: A Methodology for Designing Chatbots with Human-Like Diversity
topic Computation and Language
url https://arxiv.org/abs/2409.00262