:: Library Catalog

Bibliographic Details
Main Authors:	An, Haozhe, Baumler, Connor, Sancheti, Abhilasha, Rudinger, Rachel
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2503.06792

Similar Items

On the Influence of Gender and Race in Romantic Relationship Prediction from Large Language Models
by: Sancheti, Abhilasha, et al.
Published: (2024)

How much reliable is ChatGPT's prediction on Information Extraction under Input Perturbations?
by: Mondal, Ishani, et al.
Published: (2024)

Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
by: Balepur, Nishant, et al.
Published: (2024)

Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S.
by: Acquaye, Christabel, et al.
Published: (2024)

Do Large Language Models Discriminate in Hiring Decisions on the Basis of Race, Ethnicity, and Gender?
by: An, Haozhe, et al.
Published: (2024)

Post-Hoc Answer Attribution for Grounded and Trustworthy Long Document Comprehension: Task, Insights, and Challenges
by: Sancheti, Abhilasha, et al.
Published: (2024)

When Stereotypes GTG: The Impact of Predictive Text Suggestions on Gender Bias in Human-AI Co-Writing
by: Baumler, Connor, et al.
Published: (2024)

Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer?
by: Balepur, Nishant, et al.
Published: (2024)

NLI under the Microscope: What Atomic Hypothesis Decomposition Reveals
by: Srikanth, Neha, et al.
Published: (2025)

Is Your Large Language Model Knowledgeable or a Choices-Only Cheater?
by: Balepur, Nishant, et al.
Published: (2024)

Multiple LLM Agents Debate for Equitable Cultural Alignment
by: Ki, Dayeon, et al.
Published: (2025)

Everything is Plausible: Investigating the Impact of LLM Rationales on Human Notions of Plausibility
by: Palta, Shramay, et al.
Published: (2025)

Take Out Your Calculators: Estimating the Real Difficulty of Question Items with LLM Student Simulations
by: Acquaye, Christabel, et al.
Published: (2026)

Test-Time Reasoners Are Strategic Multiple-Choice Test-Takers
by: Balepur, Nishant, et al.
Published: (2025)

It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning
by: Balepur, Nishant, et al.
Published: (2023)

How often are errors in natural language reasoning due to paraphrastic variability?
by: Srikanth, Neha, et al.
Published: (2024)

Can You Make It Sound Like You? Post-Editing LLM-Generated Text for Personal Style
by: Baumler, Connor, et al.
Published: (2026)

DiscoTrace: Representing and Comparing Answering Strategies of Humans and LLMs in Information-Seeking Question Answering
by: Srikanth, Neha, et al.
Published: (2026)

Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above
by: Balepur, Nishant, et al.
Published: (2025)

Language Models Predict Empathy Gaps Between Social In-groups and Out-groups
by: Hou, Yu, et al.
Published: (2025)

Speaking the Right Language: The Impact of Expertise Alignment in User-AI Interactions
by: Palta, Shramay, et al.
Published: (2025)

Exploring Gender Bias Beyond Occupational Titles
by: Sabir, Ahmed, et al.
Published: (2025)

Assumed Identities: Quantifying Gender Bias in Machine Translation of Gender-Ambiguous Occupational Terms
by: Mastromichalakis, Orfeas Menis, et al.
Published: (2025)

Understanding Common Ground Misalignment in Goal-Oriented Dialog: A Case-Study with Ubuntu Chat Logs
by: Sarkar, Rupak, et al.
Published: (2025)

HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
by: Ravichander, Abhilasha, et al.
Published: (2025)

LABOR-LLM: Language-Based Occupational Representations with Large Language Models
by: Athey, Susan, et al.
Published: (2024)

'Rich Dad, Poor Lad': How do Large Language Models Contextualize Socioeconomic Factors in College Admission?
by: Nghiem, Huy, et al.
Published: (2025)

FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response
by: Shichman, Mollie, et al.
Published: (2025)

Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning
by: Palta, Shramay, et al.
Published: (2024)

Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas
by: Balepur, Nishant, et al.
Published: (2025)

Natural Language Inference Improves Compositionality in Vision-Language Models
by: Cascante-Bonilla, Paola, et al.
Published: (2024)

Reheat Nachos for Dinner? Evaluating AI Support for Cross-Cultural Communication of Neologisms
by: Ki, Dayeon, et al.
Published: (2026)

Multilingual large language models leak human stereotypes across language boundaries
by: Cao, Yang Trista, et al.
Published: (2023)

Are Female Carpenters like Blue Bananas? A Corpus Investigation of Occupation Gender Typicality
by: Ju, Da, et al.
Published: (2024)

Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias
by: Chen, Yuen, et al.
Published: (2022)

Colombian Waitresses y Jueces canadienses: Gender and Country Biases in Occupation Recommendations from LLMs
by: Rodríguez, Elisa Forcada, et al.
Published: (2025)

SALAD: Source-free Active Label-Agnostic Domain Adaptation for Classification, Segmentation and Detection
by: Kothandaraman, Divya, et al.
Published: (2022)

Learning Mutually Informed Representations for Characters and Subwords
by: Wang, Yilin, et al.
Published: (2023)

What Has Been Lost with Synthetic Evaluation?
by: Gill, Alexander, et al.
Published: (2025)

Pregnant Questions: The Importance of Pragmatic Awareness in Maternal Health Question Answering
by: Srikanth, Neha, et al.
Published: (2023)