Bibliographic Details
Main Authors: Kamei, Ryohei, Shiono, Daiki, Akama, Reina, Suzuki, Jun
Format: Preprint
Published: 2024
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2406.09702
_version_ 1866910487153737728
author Kamei, Ryohei
Shiono, Daiki
Akama, Reina
Suzuki, Jun
contents With the remarkable development of large language models (LLMs), ensuring the factuality of their output has become a challenge. However, in dialogue it is not necessarily desirable for every part of a response to be grounded in given knowledge or facts. This study aims to achieve both attractiveness and factuality in dialogue responses. To that end, we define a task of predicting sentences that do not require a factual correctness judgment, such as expressions of agreement or personal opinions and feelings. We created a dataset for this task via crowdsourcing, the dialogue dataset annotated with fact-check-needed label (DDFC), and ran classification experiments with several models on it. The best-performing model achieved a classification accuracy of about 88%.
format Preprint
id arxiv_https___arxiv_org_abs_2406_09702
institution arXiv
publishDate 2024
record_format arxiv
title Detecting Response Generation Not Requiring Factual Judgment
topic Computation and Language
url https://arxiv.org/abs/2406.09702