Vista Equipo: :: Library Catalog

Guardado en:

Detalles Bibliográficos
Autores principales:	Cao, Bowen, Cai, Deng, Cui, Leyang, Cheng, Xuxin, Bi, Wei, Zou, Yuexian, Shi, Shuming
Formato:	Preprint
Publicado:	2024
Materias:	Computation and Language
Acceso en línea:	https://arxiv.org/abs/2402.17532
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

_version_	1866909138222579712
author	Cao, Bowen Cai, Deng Cui, Leyang Cheng, Xuxin Bi, Wei Zou, Yuexian Shi, Shuming
author_facet	Cao, Bowen Cai, Deng Cui, Leyang Cheng, Xuxin Bi, Wei Zou, Yuexian Shi, Shuming
contents	Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most significant challenges for this paradigm shift is determining the training oracles, because a string of text can be segmented in various ways and each segment can be retrieved from numerous possible documents. To address this, we propose to initialize the training oracles using linguistic heuristics and, more importantly, bootstrap the oracles through iterative self-reinforcement. Extensive experiments show that our model not only outperforms standard language models on a variety of knowledge-intensive tasks but also demonstrates improved generation quality in open-ended text generation. For instance, compared to the standard language model counterpart, our model raises the accuracy from 23.47% to 36.27% on OpenbookQA, and improves the MAUVE score from 42.61% to 81.58% in open-ended text generation. Remarkably, our model also achieves the best performance and the lowest latency among several retrieval-augmented baselines. In conclusion, we assert that retrieval is more accurate generation and hope that our work will encourage further research on this new paradigm shift.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_17532
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Retrieval is Accurate Generation Cao, Bowen Cai, Deng Cui, Leyang Cheng, Xuxin Bi, Wei Zou, Yuexian Shi, Shuming Computation and Language Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most significant challenges for this paradigm shift is determining the training oracles, because a string of text can be segmented in various ways and each segment can be retrieved from numerous possible documents. To address this, we propose to initialize the training oracles using linguistic heuristics and, more importantly, bootstrap the oracles through iterative self-reinforcement. Extensive experiments show that our model not only outperforms standard language models on a variety of knowledge-intensive tasks but also demonstrates improved generation quality in open-ended text generation. For instance, compared to the standard language model counterpart, our model raises the accuracy from 23.47% to 36.27% on OpenbookQA, and improves the MAUVE score from 42.61% to 81.58% in open-ended text generation. Remarkably, our model also achieves the best performance and the lowest latency among several retrieval-augmented baselines. In conclusion, we assert that retrieval is more accurate generation and hope that our work will encourage further research on this new paradigm shift.
title	Retrieval is Accurate Generation
topic	Computation and Language
url	https://arxiv.org/abs/2402.17532

Ejemplares similares