| Main Authors: | An, Haozhe, Baumler, Connor, Sancheti, Abhilasha, Rudinger, Rachel |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.06792 |
Similar Items
On the Influence of Gender and Race in Romantic Relationship Prediction from Large Language Models
by: Sancheti, Abhilasha, et al.
Published: (2024)
How much reliable is ChatGPT's prediction on Information Extraction under Input Perturbations?
by: Mondal, Ishani, et al.
Published: (2024)
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
by: Balepur, Nishant, et al.
Published: (2024)
Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S
by: Acquaye, Christabel, et al.
Published: (2024)
Do Large Language Models Discriminate in Hiring Decisions on the Basis of Race, Ethnicity, and Gender?
by: An, Haozhe, et al.
Published: (2024)
Post-Hoc Answer Attribution for Grounded and Trustworthy Long Document Comprehension: Task, Insights, and Challenges
by: Sancheti, Abhilasha, et al.
Published: (2024)
When Stereotypes GTG: The Impact of Predictive Text Suggestions on Gender Bias in Human-AI Co-Writing
by: Baumler, Connor, et al.
Published: (2024)
Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer?
by: Balepur, Nishant, et al.
Published: (2024)
NLI under the Microscope: What Atomic Hypothesis Decomposition Reveals
by: Srikanth, Neha, et al.
Published: (2025)
Is Your Large Language Model Knowledgeable or a Choices-Only Cheater?
by: Balepur, Nishant, et al.
Published: (2024)
Multiple LLM Agents Debate for Equitable Cultural Alignment
by: Ki, Dayeon, et al.
Published: (2025)
Everything is Plausible: Investigating the Impact of LLM Rationales on Human Notions of Plausibility
by: Palta, Shramay, et al.
Published: (2025)
Take Out Your Calculators: Estimating the Real Difficulty of Question Items with LLM Student Simulations
by: Acquaye, Christabel, et al.
Published: (2026)
Test-Time Reasoners Are Strategic Multiple-Choice Test-Takers
by: Balepur, Nishant, et al.
Published: (2025)
It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning
by: Balepur, Nishant, et al.
Published: (2023)
How often are errors in natural language reasoning due to paraphrastic variability?
by: Srikanth, Neha, et al.
Published: (2024)
Can You Make It Sound Like You? Post-Editing LLM-Generated Text for Personal Style
by: Baumler, Connor, et al.
Published: (2026)
DiscoTrace: Representing and Comparing Answering Strategies of Humans and LLMs in Information-Seeking Question Answering
by: Srikanth, Neha, et al.
Published: (2026)
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above
by: Balepur, Nishant, et al.
Published: (2025)
Language Models Predict Empathy Gaps Between Social In-groups and Out-groups
by: Hou, Yu, et al.
Published: (2025)
Speaking the Right Language: The Impact of Expertise Alignment in User-AI Interactions
by: Palta, Shramay, et al.
Published: (2025)
Exploring Gender Bias Beyond Occupational Titles
by: Sabir, Ahmed, et al.
Published: (2025)
Assumed Identities: Quantifying Gender Bias in Machine Translation of Gender-Ambiguous Occupational Terms
by: Mastromichalakis, Orfeas Menis, et al.
Published: (2025)
Understanding Common Ground Misalignment in Goal-Oriented Dialog: A Case-Study with Ubuntu Chat Logs
by: Sarkar, Rupak, et al.
Published: (2025)
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them
by: Ravichander, Abhilasha, et al.
Published: (2025)
LABOR-LLM: Language-Based Occupational Representations with Large Language Models
by: Athey, Susan, et al.
Published: (2024)
'Rich Dad, Poor Lad': How do Large Language Models Contextualize Socioeconomic Factors in College Admission ?
by: Nghiem, Huy, et al.
Published: (2025)
FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response
by: Shichman, Mollie, et al.
Published: (2025)
Plausibly Problematic Questions in Multiple-Choice Benchmarks for Commonsense Reasoning
by: Palta, Shramay, et al.
Published: (2024)
Whose Boat Does it Float? Improving Personalization in Preference Tuning via Inferred User Personas
by: Balepur, Nishant, et al.
Published: (2025)
Natural Language Inference Improves Compositionality in Vision-Language Models
by: Cascante-Bonilla, Paola, et al.
Published: (2024)
Reheat Nachos for Dinner? Evaluating AI Support for Cross-Cultural Communication of Neologisms
by: Ki, Dayeon, et al.
Published: (2026)
Multilingual large language models leak human stereotypes across language boundaries
by: Cao, Yang Trista, et al.
Published: (2023)
Are Female Carpenters like Blue Bananas? A Corpus Investigation of Occupation Gender Typicality
by: Ju, Da, et al.
Published: (2024)
Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias
by: Chen, Yuen, et al.
Published: (2022)
Colombian Waitresses y Jueces canadienses: Gender and Country Biases in Occupation Recommendations from LLMs
by: Rodríguez, Elisa Forcada, et al.
Published: (2025)
SALAD: Source-free Active Label-Agnostic Domain Adaptation for Classification, Segmentation and Detection
by: Kothandaraman, Divya, et al.
Published: (2022)
Learning Mutually Informed Representations for Characters and Subwords
by: Wang, Yilin, et al.
Published: (2023)
What Has Been Lost with Synthetic Evaluation?
by: Gill, Alexander, et al.
Published: (2025)
Pregnant Questions: The Importance of Pragmatic Awareness in Maternal Health Question Answering
by: Srikanth, Neha, et al.
Published: (2023)