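| Main author: | Tavella, Danilo |
|---|---|
| Format: | Digital resource |
| Language: | English |
| Published: | Zenodo, 2026 |
| Subjects: | Referential Binding; Language Model Evaluation; Instruction-Tuned Language Models; Grounding Evaluation; Synthetic Stress Test; Binding Failure; Relational Binding; Compositional Binding; Valid Support Set; Hallucination Analysis; AI Evaluation; LLM Evaluation |
| Online access: | https://doi.org/10.5281/zenodo.19944708 |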
| _version_ | 1866901308259172352 |
|---|---|
| author | Tavella, Danilo |
| author_facet | Tavella, Danilo |
| contents | <div><p>This technical note presents a minimal stress test for referential binding in small instruction-tuned language models.</p></div> <div> </div> <div><p>The experiment uses a controlled synthetic dataset of 60 queries divided into three balanced families: entity binding, relational binding, and compositional binding. In all cases, the information required to answer correctly is explicitly present in the prompt. The goal is therefore not to test knowledge retrieval, but to test whether a model can select the correct referential target from a valid support set.</p></div> <div> </div> <div><p>Three metrics are used: Atomic Support Score (ASS), Binding Legitimacy Score (BLS), and Illegitimate Binding Rate (IBR). The results show that models can remain inside the valid support set while selecting the wrong relational or compositional target.</p></div> <div> </div> <div><p>The note reports results for Qwen/Qwen2.5-1.5B-Instruct and HuggingFaceTB/SmolLM2-1.7B-Instruct, with a deterministic repeat run for Qwen. The accompanying package includes result CSV files, summary tables, requirements, a README, and a reproducibility script.</p></div> <div> </div> <div><p>The contribution is intended as a narrow diagnostic stress test. It does not propose a general theory of grounding or a general benchmark for language models.</p></div> |
| format | Digital resource |
| id | zenodo_https___doi_org_10_5281_zenodo_19944708 |
| institution | Zenodo |
| language | eng |
| publishDate | 2026 |
| publisher | Zenodo |
| record_format | zenodo |
| title | Referential Binding Stress Test: Supported Fragments Do Not Guarantee Referential Legitimacy in Small Instruction-Tuned Language Models |
| topic | Referential Binding; Language Model Evaluation; Instruction-Tuned Language Models; Grounding Evaluation; Synthetic Stress Test; Binding Failure; Relational Binding; Compositional Binding; Valid Support Set; Hallucination Analysis; AI Evaluation; LLM Evaluation |
| url | https://doi.org/10.5281/zenodo.19944708 |
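The abstract names three metrics (ASS, BLS, IBR) but this record does not define them. The sketch below is a minimal illustration of the distinction the note draws between staying inside the valid support set and selecting the correct target. It uses **hypothetical definitions** that are not taken from the note. ASS is taken as the share of answers that are members of the prompt's support set. BLS is the share that match the gold target. IBR is the share that are in the support set but wrong. The `Item` class, the `binding_scores` function and the toy items are illustrative assumptions, not the author's code or data.

```python
from dataclasses import dataclass


@dataclass
class Item:
    family: str             # hypothetical label: "entity", "relational" or "compositional"
    support_set: set[str]   # candidate targets explicitly present in the prompt
    gold: str               # the correct referential target (assumed to be in support_set)
    prediction: str         # the model's normalized answer


def binding_scores(items: list[Item]) -> dict[str, float]:
    """Hypothetical ASS/BLS/IBR definitions; NOT the note's published formulas."""
    n = len(items)
    if n == 0:
        raise ValueError("need at least one item")
    # Answer is one of the candidates present in the prompt.
    in_support = sum(1 for it in items if it.prediction in it.support_set)
    # Answer is the correct target.
    legitimate = sum(1 for it in items if it.prediction == it.gold)
    # Answer is a supported candidate, but the wrong one.
    illegitimate = sum(
        1 for it in items
        if it.prediction in it.support_set and it.prediction != it.gold
    )
    return {"ASS": in_support / n, "BLS": legitimate / n, "IBR": illegitimate / n}


# Toy items for illustration only; these are not results from the note.
toy_items = [
    Item("entity", {"Alice", "Bob"}, gold="Alice", prediction="Alice"),
    Item("relational", {"Alice", "Bob", "Carol"}, gold="Carol", prediction="Bob"),
    Item("compositional", {"Paris", "Rome"}, gold="Rome", prediction="Paris"),
    Item("relational", {"Alice", "Bob"}, gold="Bob", prediction="Dave"),
]

if __name__ == "__main__":
    print(binding_scores(toy_items))
```

On the toy items, three of four answers are supported but only one is correct, so ASS is 0.75 while BLS is 0.25. Under these assumed definitions ASS always equals BLS + IBR, which makes explicit why a high support score alone does not establish referential legitimacy.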