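Staff View: Self-seeding and Multi-intent Self-instructing LLMs for Generating Intent-aware Information-Seeking dialogs :: Library Catalog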

Bibliographic Details
Main Authors:	Askari, Arian; Petcu, Roxana; Meng, Chuan; Aliannejadi, Mohammad; Abolghasemi, Amin; Kanoulas, Evangelos; Verberne, Suzan
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2402.11633

_version_	1866917915693940736
author	Askari, Arian; Petcu, Roxana; Meng, Chuan; Aliannejadi, Mohammad; Abolghasemi, Amin; Kanoulas, Evangelos; Verberne, Suzan
author_facet	Askari, Arian; Petcu, Roxana; Meng, Chuan; Aliannejadi, Mohammad; Abolghasemi, Amin; Kanoulas, Evangelos; Verberne, Suzan
contents	Identifying user intents in information-seeking dialogs is crucial for a system to meet users' information needs. Intent prediction (IP) is challenging and demands sufficient dialogs with human-labeled intents for training. However, manually annotating intents is resource-intensive. While large language models (LLMs) have been shown to be effective in generating synthetic data, there is no study on using LLMs to generate intent-aware information-seeking dialogs. In this paper, we focus on leveraging LLMs for zero-shot generation of large-scale, open-domain, and intent-aware information-seeking dialogs. We propose SOLID, which has novel self-seeding and multi-intent self-instructing schemes. The former improves the generation quality by using the LLM's own knowledge scope to initiate dialog generation; the latter prompts the LLM to generate utterances sequentially, and mitigates the need for manual prompt design by asking the LLM to autonomously adapt its prompt instruction when generating complex multi-intent utterances. Furthermore, we propose SOLID-RL, which is further trained to generate a dialog in one step on the data generated by SOLID. We propose a length-based quality estimation mechanism to assign varying weights to SOLID-generated dialogs based on their quality during the training process of SOLID-RL. We use SOLID and SOLID-RL to generate more than 300k intent-aware dialogs, surpassing the size of existing datasets. Experiments show that IP methods trained on dialogs generated by SOLID and SOLID-RL achieve better IP quality than ones trained on human-generated dialogs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_11633
institution	arXiv
publishDate	2024
record_format	arxiv
title	Self-seeding and Multi-intent Self-instructing LLMs for Generating Intent-aware Information-Seeking dialogs
topic	Computation and Language
url	https://arxiv.org/abs/2402.11633

Similar Items