Staff View: The Impact of Ideological Discourses in RAG: A Case Study with COVID-19 Treatments :: Library Catalog

Bibliographic Details
Main Authors:	Salari, Elmira; Delfino, Maria Claudia Nunes; Amamou, Hazem; de Souza, José Victor; Kshirsagar, Shruti; Davoust, Alan; Avila, Anderson
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2603.14838

_version_	1866915865355616256
author	Salari, Elmira; Delfino, Maria Claudia Nunes; Amamou, Hazem; de Souza, José Victor; Kshirsagar, Shruti; Davoust, Alan; Avila, Anderson
author_facet	Salari, Elmira; Delfino, Maria Claudia Nunes; Amamou, Hazem; de Souza, José Victor; Kshirsagar, Shruti; Davoust, Alan; Avila, Anderson
contents	This paper studies the impact of retrieved ideological texts on the outputs of large language models (LLMs). While interest in understanding ideology in LLMs has recently increased, little attention has been given to this issue in the context of Retrieval-Augmented Generation (RAG). To fill this gap, we design an external knowledge source based on ideologically loaded texts about COVID-19 treatments. Our corpus is based on 1,117 academic articles representing discourses about controversial and endorsed treatments for the disease. We propose a corpus linguistics framework, based on Lexical Multidimensional Analysis (LMDA), to identify the ideologies within the corpus. LLMs are tasked with answering questions derived from three identified ideological dimensions, and two types of contextual prompts are adopted: the first comprises the user question and ideological texts, and the second contains the question, ideological texts, and LMDA descriptions. Ideological alignment between reference ideological texts and LLMs' responses is assessed using cosine similarity over lexical and semantic representations. Results demonstrate that LLMs' responses based on retrieved ideological texts are more closely aligned with the ideology encountered in the external knowledge, with the enhanced prompt further influencing LLMs' outputs. Our findings highlight the importance of identifying ideological discourses within the RAG framework in order to mitigate not just unintended ideological bias but also the risk of malicious manipulation of such models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_14838
institution	arXiv
publishDate	2026
record_format	arxiv
title	The Impact of Ideological Discourses in RAG: A Case Study with COVID-19 Treatments
topic	Computation and Language
url	https://arxiv.org/abs/2603.14838
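
The two contextual prompt types described in the contents field can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' template: the function name `build_prompt`, the section labels, and placing the LMDA description before the retrieved texts are all hypothetical choices.

```python
from typing import List, Optional


def build_prompt(
    question: str,
    retrieved_texts: List[str],
    lmda_description: Optional[str] = None,
) -> str:
    """Assemble a RAG prompt (hypothetical template wording).

    Prompt type 1: question + retrieved ideological texts.
    Prompt type 2: the same, plus an LMDA description of the ideological dimension.
    """
    # Number each retrieved passage so the model can tell them apart.
    context = "\n\n".join(
        f"[Document {i + 1}]\n{text}" for i, text in enumerate(retrieved_texts)
    )
    parts = []
    if lmda_description is not None:
        # Only the enhanced (second) prompt type carries the LMDA description.
        parts.append(f"Ideological dimension (LMDA description):\n{lmda_description}")
    parts.append(f"Context:\n{context}")
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)
```

Calling `build_prompt(question, texts)` yields the first prompt type, and passing a description string as the third argument yields the enhanced one; everything else in the prompt stays identical, so any difference in the model's output can be attributed to the added description.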

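The alignment measure named in the contents field can be illustrated for its lexical side. This minimal Python sketch assumes a bag-of-words representation (raw counts of lowercased word tokens); the paper's actual lexical features and preprocessing are not described in this record, and the semantic variant would instead compare dense sentence embeddings, which are not shown. The helper names and example strings are hypothetical and are not drawn from the corpus.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Map lowercased word tokens to raw counts (assumed lexical representation)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    # Counter returns 0 for missing keys, so non-shared terms add nothing to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lexical_alignment(reference: str, response: str) -> float:
    """Hypothetical helper: lexical alignment of a model response to a reference text."""
    return cosine_similarity(term_frequencies(reference), term_frequencies(response))


reference = "the treatment is safe and effective"
print(round(lexical_alignment(reference, "the treatment is effective"), 3))  # 0.816
print(lexical_alignment(reference, "more trials are needed"))  # 0.0
```

Cosine similarity depends only on the direction of the count vectors, so scaling every count in a response by the same factor leaves its score unchanged.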