| Main Authors: | Rao, Abhinav, Yerukola, Akhila, Shah, Vishwa, Reinecke, Katharina, Sap, Maarten |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2404.12464 |
Similar Items
Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures
by: Yerukola, Akhila, et al.
Published: (2025)
Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs
by: Yerukola, Akhila, et al.
Published: (2024)
Out of Style: RAG's Fragility to Linguistic Variation
by: Cao, Tianyu, et al.
Published: (2025)
PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
by: Kumar, Priyanshu, et al.
Published: (2025)
Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication
by: Shen, Jocelyn, et al.
Published: (2025)
Social World Models
by: Zhou, Xuhui, et al.
Published: (2025)
Ambig-SWE: Interactive Agents to Overcome Underspecificity in Software Engineering
by: Vijayvargiya, Sanidhya, et al.
Published: (2025)
Data Defenses Against Large Language Models
by: Agnew, William, et al.
Published: (2024)
Framing an AI with Values Reduces AI Reliance in AI-supported Writing Tasks
by: Gao, Alice, et al.
Published: (2026)
Pre-Calc: Learning to Use the Calculator Improves Numeracy in Language Models
by: Veerendranath, Vishruth, et al.
Published: (2024)
From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models
by: Mendelsohn, Julia, et al.
Published: (2023)
PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models
by: Jain, Devansh, et al.
Published: (2024)
Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty
by: Zhou, Kaitlyn, et al.
Published: (2024)
Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas
by: Kwok, Louis, et al.
Published: (2024)
Measuring Social Norms of Large Language Models
by: Yuan, Ye, et al.
Published: (2024)
EVALUESTEER: Measuring Reward Model Steerability Towards Values and Preferences
by: Ghate, Kshitish, et al.
Published: (2025)
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
by: Baheti, Ashutosh, et al.
Published: (2023)
SocialGaze: Improving the Integration of Human Social Norms in Large Language Models
by: Vijjini, Anvesh Rao, et al.
Published: (2024)
Adaptable Logical Control for Large Language Models
by: Zhang, Honghua, et al.
Published: (2024)
Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance
by: Zhou, Kaitlyn, et al.
Published: (2024)
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
by: Jiang, Liwei, et al.
Published: (2024)
Not Like Us, Hunty: Measuring Perceptions and Behavioral Effects of Minoritized Anthropomorphic Cues in LLMs
by: Basoah, Jeffrey, et al.
Published: (2025)
Belief Revision: The Adaptability of Large Language Models Reasoning
by: Wilie, Bryan, et al.
Published: (2024)
Mitigating Bias in RAG: Controlling the Embedder
by: Kim, Taeyoun, et al.
Published: (2025)
Graph-Assisted Culturally Adaptable Idiomatic Translation for Indic Languages
by: Singh, Pratik Rakesh, et al.
Published: (2025)
Adaptable and Reliable Text Classification using Large Language Models
by: Wang, Zhiqiang, et al.
Published: (2024)
Stereotype or Personalization? User Identity Biases Chatbot Recommendations
by: Kantharuban, Anjali, et al.
Published: (2024)
Where Do People Tell Stories Online? Story Detection Across Online Communities
by: Antoniak, Maria, et al.
Published: (2023)
CDEval: A Benchmark for Measuring the Cultural Dimensions of Large Language Models
by: Wang, Yuhang, et al.
Published: (2023)
Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning
by: He, Zhonghao, et al.
Published: (2025)
Rejected Dialects: Biases Against African American Language in Reward Models
by: Mire, Joel, et al.
Published: (2025)
Building and Measuring Trust between Large Language Models
by: Buyl, Maarten, et al.
Published: (2025)
SA-MDKIF: A Scalable and Adaptable Medical Domain Knowledge Injection Framework for Large Language Models
by: Xu, Tianhan, et al.
Published: (2024)
VideoNorms: Benchmarking Cultural Awareness of Video Language Models
by: Varimalla, Nikhil Reddy, et al.
Published: (2025)
HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs
by: Shen, Jocelyn, et al.
Published: (2024)
Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
by: Jiang, Liwei, et al.
Published: (2025)
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory
by: Mireshghallah, Niloofar, et al.
Published: (2023)
SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents
by: Wang, Ruiyi, et al.
Published: (2024)
Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs
by: Zhou, Xuhui, et al.
Published: (2024)
Breaking mBad! Supervised Fine-tuning for Cross-Lingual Detoxification
by: Beniwal, Himanshu, et al.
Published: (2025)