Staff View: DiverseDialogue: A Methodology for Designing Chatbots with Human-Like Diversity :: Library Catalog

Bibliographic Details
Main Authors:	Lin, Xiaoyu; Yu, Xinkai; Aich, Ankit; Giorgi, Salvatore; Ungar, Lyle
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2409.00262

_version_	1866916377060704256
author	Lin, Xiaoyu; Yu, Xinkai; Aich, Ankit; Giorgi, Salvatore; Ungar, Lyle
author_facet	Lin, Xiaoyu; Yu, Xinkai; Aich, Ankit; Giorgi, Salvatore; Ungar, Lyle
contents	Large Language Models (LLMs), which simulate human users, are frequently employed to evaluate chatbots in applications such as tutoring and customer service. Effective evaluation necessitates a high degree of human-like diversity within these simulations. In this paper, we demonstrate that conversations generated by GPT-4o mini, when used as simulated human participants, systematically differ from those between actual humans across multiple linguistic features. These features include topic variation, lexical attributes, and both the average behavior and diversity (variance) of the language used. To address these discrepancies, we propose an approach that automatically generates prompts for user simulations by incorporating features derived from real human interactions, such as age, gender, emotional tone, and the topics discussed. We assess our approach using differential language analysis combined with deep linguistic inquiry. Our method of prompt optimization, tailored to target specific linguistic features, shows significant improvements. Specifically, it enhances the human-likeness of LLM chatbot conversations, increasing their linguistic diversity. On average, we observe a 54 percent reduction in the error of average features between human and LLM-generated conversations. This method of constructing chatbot sets with human-like diversity holds great potential for enhancing the evaluation process of user-facing bots.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_00262
institution	arXiv
publishDate	2024
record_format	arxiv
title	DiverseDialogue: A Methodology for Designing Chatbots with Human-Like Diversity
topic	Computation and Language
url	https://arxiv.org/abs/2409.00262

Similar Items