Library Catalog

Bibliographic Details
Main Authors:	Kankowski, Florian; Solstad, Torgrim; Zarriess, Sina; Bott, Oliver
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2501.12980

Similar Items

How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models
by: Sieker, Judith, et al.
Published: (2026)

LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High
by: Sieker, Judith, et al.
Published: (2025)

SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts
by: Brinner, Marc, et al.
Published: (2025)

Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs
by: Lachenmaier, Clara, et al.
Published: (2026)

Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions
by: Lachenmaier, Clara, et al.
Published: (2025)

Subword models struggle with word learning, but surprisal hides it
by: Bunzeck, Bastian, et al.
Published: (2025)

SceneGram: Conceptualizing and Describing Tangrams in Scene Context
by: Junker, Simeon, et al.
Published: (2025)

Child-directed speech facilitates production, not comprehension, in BabyLMs
by: Bunzeck, Bastian, et al.
Published: (2026)

The Frequency Confound in Language-Model Surprisal and Metaphor Novelty
by: Momen, Omar, et al.
Published: (2026)

Resilience through Scene Context in Visual Referring Expression Generation
by: Junker, Simeon, et al.
Published: (2024)

Rationalizing Transformer Predictions via End-To-End Differentiable Self-Training
by: Brinner, Marc, et al.
Published: (2025)

Model Interpretability and Rationale Extraction by Input Mask Optimization
by: Brinner, Marc, et al.
Published: (2025)

Efficient Scientific Full Text Classification: The Case of EICAT Impact Assessments
by: Brinner, Marc Felix, et al.
Published: (2025)

Enhancing Domain-Specific Encoder Models with LLM-Generated Data: How to Leverage Ontologies, and How to Do Without Them
by: Brinner, Marc, et al.
Published: (2025)

SemCSE-Multi: Multifaceted and Decodable Embeddings for Aspect-Specific and Interpretable Scientific Domain Mapping
by: Brinner, Marc, et al.
Published: (2025)

Do Construction Distributions Shape Formal Language Learning In German BabyLMs?
by: Bunzeck, Bastian, et al.
Published: (2025)

Reference Games as a Testbed for the Alignment of Model Uncertainty and Clarification Requests
by: Ali, Manar, et al.
Published: (2026)

Are BabyLMs Deaf to Gricean Maxims? A Pragmatic Evaluation of Sample-efficient Language Models
by: Askari, Raha, et al.
Published: (2025)

Evaluating Diversity in Automatic Poetry Generation
by: Chen, Yanran, et al.
Published: (2024)

Small Language Models Also Work With Small Vocabularies: Probing the Linguistic Abilities of Grapheme- and Phoneme-Based Baby Llamas
by: Bunzeck, Bastian, et al.
Published: (2024)

The InviTE Corpus: Annotating Invectives in Tudor English Texts for Computational Modeling
by: Spliethoff, Sophie, et al.
Published: (2025)

Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks?
by: Junker, Simeon, et al.
Published: (2025)

Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic Datasets
by: Momen, Omar, et al.
Published: (2026)

AIDBench: A benchmark for evaluating the authorship identification capability of large language models
by: Wen, Zichen, et al.
Published: (2024)

LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains
by: Hernandes, Raphael, et al.
Published: (2024)

Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning)
by: Padovani, Francesca, et al.
Published: (2025)

CausalGraph2LLM: Evaluating LLMs for Causal Queries
by: Sheth, Ivaxi, et al.
Published: (2024)

The Illusion of Competence: Evaluating the Effect of Explanations on Users' Mental Models of Visual Question Answering Systems
by: Sieker, Judith, et al.
Published: (2024)

Do LLMs exhibit human-like response biases? A case study in survey design
by: Tjuatja, Lindia, et al.
Published: (2023)

A word association network methodology for evaluating implicit biases in LLMs compared to humans
by: Abramski, Katherine, et al.
Published: (2025)

Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain
by: Hu, Xiaoyu, et al.
Published: (2026)

Towards Trustworthy Lexical Simplification: Exploring Safety and Efficiency with Small LLMs
by: Hayakawa, Akio, et al.
Published: (2025)

Assessing LLM Reasoning Through Implicit Causal Chain Discovery in Climate Discourse
by: Allein, Liesbeth, et al.
Published: (2025)

An Expert-grounded benchmark of General Purpose LLMs in LCA
by: Donaldson, Artur, et al.
Published: (2025)

Do LLMs exhibit the same commonsense capabilities across languages?
by: Martínez-Murillo, Ivan, et al.
Published: (2025)

GerPS-Compare: Comparing NER methods for legal norm analysis
by: Bachinger, Sarah T., et al.
Published: (2024)

Explainable Detection of Implicit Influential Patterns in Conversations via Data Augmentation
by: Abdidizaji, Sina, et al.
Published: (2025)

Mining for Species, Locations, Habitats, and Ecosystems from Scientific Papers in Invasion Biology: A Large-Scale Exploratory Study with Large Language Models
by: D'Souza, Jennifer, et al.
Published: (2025)

The role of System 1 and System 2 semantic memory structure in human and LLM biases
by: Abramski, Katherine, et al.
Published: (2026)

Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs
by: Haq, Saiful, et al.
Published: (2025)