:: Library Catalog

Bibliographic Details
Main Authors:	Dong, Wenchao; Zhunis, Assem; Jeong, Dongyoung; Chin, Hyojin; Han, Jiyoung; Cha, Meeyoung
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2409.03843

Similar Items

I Am Not Them: Fluid Identities and Persistent Out-group Bias in Large Language Models
by: Dong, Wenchao, et al.
Published: (2024)

Exploring Persona-dependent LLM Alignment for the Moral Machine Experiment
by: Kim, Jiseon, et al.
Published: (2025)

Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions
by: Kim, Jiseon, et al.
Published: (2026)

Adversarial Style Augmentation via Large Language Model for Robust Fake News Detection
by: Park, Sungwon, et al.
Published: (2024)

EconCausal: A Context-Aware Economic Reasoning Benchmark for Large Language Models
by: Lee, Donggyu, et al.
Published: (2025)

Evaluating Large Language Model Biases in Persona-Steered Generation
by: Liu, Andy, et al.
Published: (2024)

Characterizing AI Manipulation Risks in Brazilian YouTube Climate Discourse
by: Dong, Wenchao, et al.
Published: (2025)

How You Ask Matters! Adaptive RAG Robustness to Query Variations
by: Jang, Yunah, et al.
Published: (2026)

How Training Data Shapes the Use of Parametric and In-Context Knowledge in Language Models
by: Kim, Minsung, et al.
Published: (2025)

Generative Language Models Exhibit Social Identity Biases
by: Hu, Tiancheng, et al.
Published: (2023)

Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment
by: Kwon, Jea, et al.
Published: (2025)

Invisible Influences: Investigating Implicit Intersectional Biases through Persona Engineering in Large Language Models
by: Arimanda, Nandini, et al.
Published: (2026)

JBBQ: Japanese Bias Benchmark for Analyzing Social Biases in Large Language Models
by: Yanaka, Hitomi, et al.
Published: (2024)

Enhancing Contextual Understanding in Large Language Models through Contrastive Decoding
by: Zhao, Zheng, et al.
Published: (2024)

KLAAD: Refining Attention Mechanisms to Reduce Societal Bias in Generative Language Models
by: Kim, Seorin, et al.
Published: (2025)

Persona Jailbreaking in Large Language Models
by: Sandhan, Jivnesh, et al.
Published: (2026)

BiasCause: Evaluate Socially Biased Causal Reasoning of Large Language Models
by: Xie, Tian, et al.
Published: (2025)

Large Language Models for Stemming: Promises, Pitfalls and Failures
by: Wang, Shuai, et al.
Published: (2024)

Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
by: Gupta, Shashank, et al.
Published: (2023)

Entangled in Representations: Mechanistic Investigation of Cultural Biases in Large Language Models
by: Yu, Haeun, et al.
Published: (2025)

Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models
by: Hsueh, Cheng-Hsun, et al.
Published: (2024)

Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models
by: Anantaprayoon, Panatchakorn, et al.
Published: (2025)

Whose Journey Matters? Investigating Identity Biases in Large Language Models (LLMs) for Travel Planning Assistance
by: Ren, Ruiping, et al.
Published: (2024)

Pitfalls of Scale: Investigating the Inverse Task of Redefinition in Large Language Models
by: Stringli, Elena, et al.
Published: (2025)

ABCD: All Biases Come Disguised
by: Nowak, Mateusz, et al.
Published: (2026)

BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla
by: Kamruzzaman, Mahammed, et al.
Published: (2024)

Stereotype or Personalization? User Identity Biases Chatbot Recommendations
by: Kantharuban, Anjali, et al.
Published: (2024)

Guard Vector: Beyond English LLM Guardrails with Task-Vector Composition and Streaming-Aware Prefix SFT
by: Lee, Wonhyuk, et al.
Published: (2025)

Large Language Models Develop Novel Social Biases Through Adaptive Exploration
by: Wu, Addison J., et al.
Published: (2025)

Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions
by: de Araujo, Pedro Henrique Luz, et al.
Published: (2025)

Large Language Models are Biased Because They Are Large Language Models
by: Resnik, Philip
Published: (2024)

Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning
by: Kim, Jonathan, et al.
Published: (2025)

Mitigating Social Biases in Language Models through Unlearning
by: Dige, Omkar, et al.
Published: (2024)

Unveiling the Pitfalls of Knowledge Editing for Large Language Models
by: Li, Zhoubo, et al.
Published: (2023)

One Bias After Another: Mechanistic Reward Shaping and Persistent Biases in Language Reward Models
by: Fein, Daniel, et al.
Published: (2026)

K/DA: Automated Data Generation Pipeline for Detoxifying Implicitly Offensive Language in Korean
by: Jeon, Minkyeong, et al.
Published: (2025)

Pitfalls of Evaluating Language Models with Open Benchmarks
by: Hasan, Md. Najib, et al.
Published: (2025)

Measuring Stereotype and Deviation Biases in Large Language Models
by: Wang, Daniel, et al.
Published: (2025)

Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization
by: Shin, Sungbin, et al.
Published: (2024)

Not All Personas Are Worth It: Culture-Reflective Persona Data Augmentation
by: Han, Ji-Eun, et al.
Published: (2025)