| Main Authors: | Kankowski, Florian, Solstad, Torgrim, Zarriess, Sina, Bott, Oliver |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.12980 |
Similar Items
How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models
by: Sieker, Judith, et al.
Published: (2026)
LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High
by: Sieker, Judith, et al.
Published: (2025)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts
by: Brinner, Marc, et al.
Published: (2025)
Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair reveals unreliable Multi-Turn Behavior in LLMs
by: Lachenmaier, Clara, et al.
Published: (2026)
Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions
by: Lachenmaier, Clara, et al.
Published: (2025)
Subword models struggle with word learning, but surprisal hides it
by: Bunzeck, Bastian, et al.
Published: (2025)
SceneGram: Conceptualizing and Describing Tangrams in Scene Context
by: Junker, Simeon, et al.
Published: (2025)
Child-directed speech facilitates production, not comprehension, in BabyLMs
by: Bunzeck, Bastian, et al.
Published: (2026)
The Frequency Confound in Language-Model Surprisal and Metaphor Novelty
by: Momen, Omar, et al.
Published: (2026)
Resilience through Scene Context in Visual Referring Expression Generation
by: Junker, Simeon, et al.
Published: (2024)
Rationalizing Transformer Predictions via End-To-End Differentiable Self-Training
by: Brinner, Marc, et al.
Published: (2025)
Model Interpretability and Rationale Extraction by Input Mask Optimization
by: Brinner, Marc, et al.
Published: (2025)
Efficient Scientific Full Text Classification: The Case of EICAT Impact Assessments
by: Brinner, Marc Felix, et al.
Published: (2025)
Enhancing Domain-Specific Encoder Models with LLM-Generated Data: How to Leverage Ontologies, and How to Do Without Them
by: Brinner, Marc, et al.
Published: (2025)
SemCSE-Multi: Multifaceted and Decodable Embeddings for Aspect-Specific and Interpretable Scientific Domain Mapping
by: Brinner, Marc, et al.
Published: (2025)
Do Construction Distributions Shape Formal Language Learning In German BabyLMs?
by: Bunzeck, Bastian, et al.
Published: (2025)
Reference Games as a Testbed for the Alignment of Model Uncertainty and Clarification Requests
by: Ali, Manar, et al.
Published: (2026)
Are BabyLMs Deaf to Gricean Maxims? A Pragmatic Evaluation of Sample-efficient Language Models
by: Askari, Raha, et al.
Published: (2025)
Evaluating Diversity in Automatic Poetry Generation
by: Chen, Yanran, et al.
Published: (2024)
Small Language Models Also Work With Small Vocabularies: Probing the Linguistic Abilities of Grapheme- and Phoneme-Based Baby Llamas
by: Bunzeck, Bastian, et al.
Published: (2024)
The InviTE Corpus: Annotating Invectives in Tudor English Texts for Computational Modeling
by: Spliethoff, Sophie, et al.
Published: (2025)
Are Multimodal Large Language Models Pragmatically Competent Listeners in Simple Reference Resolution Tasks?
by: Junker, Simeon, et al.
Published: (2025)
Surprisal and Metaphor Novelty Judgments: Moderate Correlations and Divergent Scaling Effects Revealed by Corpus-Based and Synthetic Datasets
by: Momen, Omar, et al.
Published: (2026)
AIDBench: A benchmark for evaluating the authorship identification capability of large language models
by: Wen, Zichen, et al.
Published: (2024)
LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains
by: Hernandes, Raphael, et al.
Published: (2024)
Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning)
by: Padovani, Francesca, et al.
Published: (2025)
CausalGraph2LLM: Evaluating LLMs for Causal Queries
by: Sheth, Ivaxi, et al.
Published: (2024)
The Illusion of Competence: Evaluating the Effect of Explanations on Users' Mental Models of Visual Question Answering Systems
by: Sieker, Judith, et al.
Published: (2024)
Do LLMs exhibit human-like response biases? A case study in survey design
by: Tjuatja, Lindia, et al.
Published: (2023)
A word association network methodology for evaluating implicit biases in LLMs compared to humans
by: Abramski, Katherine, et al.
Published: (2025)
Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain
by: Hu, Xiaoyu, et al.
Published: (2026)
Towards Trustworthy Lexical Simplification: Exploring Safety and Efficiency with Small LLMs
by: Hayakawa, Akio, et al.
Published: (2025)
Assessing LLM Reasoning Through Implicit Causal Chain Discovery in Climate Discourse
by: Allein, Liesbeth, et al.
Published: (2025)
An Expert-grounded benchmark of General Purpose LLMs in LCA
by: Donaldson, Artur, et al.
Published: (2025)
Do LLMs exhibit the same commonsense capabilities across languages?
by: Martínez-Murillo, Ivan, et al.
Published: (2025)
GerPS-Compare: Comparing NER methods for legal norm analysis
by: Bachinger, Sarah T., et al.
Published: (2024)
Explainable Detection of Implicit Influential Patterns in Conversations via Data Augmentation
by: Abdidizaji, Sina, et al.
Published: (2025)
Mining for Species, Locations, Habitats, and Ecosystems from Scientific Papers in Invasion Biology: A Large-Scale Exploratory Study with Large Language Models
by: D'Souza, Jennifer, et al.
Published: (2025)
The role of System 1 and System 2 semantic memory structure in human and LLM biases
by: Abramski, Katherine, et al.
Published: (2026)
Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs
by: Haq, Saiful, et al.
Published: (2025)