Bibliographic Details
Main author: Tavella, Danilo
Format: Digital resource
Language: English
Published: Zenodo 2026
Subjects:
Online access: https://doi.org/10.5281/zenodo.19944708
_version_ 1866901308259172352
author Tavella, Danilo
author_facet Tavella, Danilo
contents This technical note presents a minimal stress test for referential binding in small instruction-tuned language models. The experiment uses a controlled synthetic dataset of 60 queries divided into three balanced families: entity binding, relational binding, and compositional binding. In all cases, the information required to answer correctly is explicitly present in the prompt. The goal is therefore not to test knowledge retrieval, but to test whether a model can select the correct referential target from a valid support set. Three metrics are used: Atomic Support Score (ASS), Binding Legitimacy Score (BLS), and Illegitimate Binding Rate (IBR). The results show that models can remain inside the valid support set while selecting the wrong relational or compositional target. The note reports results for Qwen/Qwen2.5-1.5B-Instruct and HuggingFaceTB/SmolLM2-1.7B-Instruct, with a deterministic repeat run for Qwen. The accompanying package includes result CSV files, summary tables, requirements, README, and a reproducibility script. The contribution is intended as a narrow diagnostic stress test. It does not propose a general theory of grounding or a general benchmark for language models.
format Digital resource
id zenodo_https___doi_org_10_5281_zenodo_19944708
institution Zenodo
language eng
publishDate 2026
publisher Zenodo
record_format zenodo
title Referential Binding Stress Test: Supported Fragments Do Not Guarantee Referential Legitimacy in Small Instruction-Tuned Language Models
topic Referential Binding
Language Model Evaluation
Instruction-Tuned Language Models
Grounding Evaluation
Synthetic Stress Test
Binding Failure
Relational Binding
Compositional Binding
Valid Support Set
Hallucination Analysis
AI Evaluation
LLM Evaluation
url https://doi.org/10.5281/zenodo.19944708