:: Library Catalog

Bibliographic Details
Main Authors:	Sheffield, William; Misra, Kanishka; Pyatkin, Valentina; Deo, Ashwini; Mahowald, Kyle; Li, Junyi Jessy
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2506.04534

Similar Items

Which course? Discourse! Teaching Discourse and Generation in the Era of LLMs
by: Li, Junyi Jessy, et al.
Published: (2026)

Experimental Contexts Can Facilitate Robust Semantic Property Inference in Language Models, but Inconsistently
by: Misra, Kanishka, et al.
Published: (2024)

WUGNECTIVES: Novel Entity Inferences of Language Models from Discourse Connectives
by: Brubaker, Daniel, et al.
Published: (2025)

semantic-features: A User-Friendly Tool for Studying Contextual Word Embeddings in Interpretable Semantic Spaces
by: Ranganathan, Jwalanthi, et al.
Published: (2025)

Language Models Learn Rare Phenomena from Less Rare Phenomena: The Case of the Missing AANNs
by: Misra, Kanishka, et al.
Published: (2024)

LLMs Lean on Priors, Not Programming Language Semantics
by: Thimmaiah, Aditya, et al.
Published: (2025)

Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP
by: Rozner, Josh, et al.
Published: (2021)

Language Models Learn Constructional Semantics, Not To Mention Syntax: Investigating LM Understanding of Paired-Focus Constructions
by: Scivetti, Wesley, et al.
Published: (2026)

When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't
by: Nemitz, Jonathan, et al.
Published: (2026)

Discourse Diversity in Multi-Turn Empathic Dialogue
by: Zhan, Hongli, et al.
Published: (2026)

A systematic framework for generating novel experimental hypotheses from language models
by: Misra, Kanishka, et al.
Published: (2024)

Emergent Introspection in AI is Content-Agnostic
by: Lederman, Harvey, et al.
Published: (2026)

The Counterexample Game: Iterated Conceptual Analysis and Repair in Language Models
by: Drucker, Daniel, et al.
Published: (2026)

On Language Models' Sensitivity to Suspicious Coincidences
by: Padmanabhan, Sriram, et al.
Published: (2025)

Both Direct and Indirect Evidence Contribute to Dative Alternation Preferences in Language Models
by: Yao, Qing, et al.
Published: (2025)

Hey, wait a minute: on at-issue sensitivity in Language Models
by: Kim, Sanghee J., et al.
Published: (2025)

Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs
by: Hu, Zichao, et al.
Published: (2024)

Language Models Fail to Introspect About Their Knowledge of Language
by: Song, Siyuan, et al.
Published: (2025)

Causal Interventions Reveal Shared Structure Across English Filler-Gap Constructions
by: Boguraev, Sasha, et al.
Published: (2025)

Counterfactual Probing for the Influence of Affect and Specificity on Intergroup Bias
by: Govindarajan, Venkata S, et al.
Published: (2023)

Privileged Self-Access Matters for Introspection in AI
by: Song, Siyuan, et al.
Published: (2025)

Models Can and Should Embrace the Communicative Nature of Human-Generated Math
by: Boguraev, Sasha, et al.
Published: (2024)

Do they mean 'us'? Interpreting Referring Expressions in Intergroup Bias
by: Govindarajan, Venkata S, et al.
Published: (2024)

TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities
by: Graf, Victoria, et al.
Published: (2026)

Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?
by: Yun, Hye Sun, et al.
Published: (2025)

Language models align with human judgments on key grammatical constructions
by: Hu, Jennifer, et al.
Published: (2024)

SPRI: Aligning Large Language Models with Context-Situated Principles
by: Zhan, Hongli, et al.
Published: (2025)

Cross-Modal Taxonomic Generalization in (Vision-) Language Models
by: Xu, Tianyang, et al.
Published: (2026)

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
by: Lin, Bill Yuchen, et al.
Published: (2024)

SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning
by: Limozin, Alexis, et al.
Published: (2026)

What Can String Probability Tell Us About Grammaticality?
by: Hu, Jennifer, et al.
Published: (2025)

Large Language Models Produce Responses Perceived to be Empathic
by: Lee, Yoon Kyung, et al.
Published: (2024)

Mission: Impossible Language Models
by: Kallini, Julie, et al.
Published: (2024)

Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It
by: Qin, Yulu, et al.
Published: (2025)

PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm
by: Li, Jing-Jing, et al.
Published: (2026)

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
by: Röttger, Paul, et al.
Published: (2024)

Dissociating language and thought in large language models
by: Mahowald, Kyle, et al.
Published: (2023)

This Treatment Works, Right? Evaluating LLM Sensitivity to Patient Question Framing in Medical QA
by: Yun, Hye Sun, et al.
Published: (2026)

Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information
by: Gao, Qiang, et al.
Published: (2024)

Semantic Mastery: Enhancing LLMs with Advanced Natural Language Understanding
by: Hariharan, Mohanakrishnan
Published: (2025)