MARC21 :: Library Catalog

Bibliographic Details
Main Author:	Tavella, Danilo
Format:	Digital resource
Language:	English
Published:	Zenodo 2026
Subjects:	Referential Binding; Language Model Evaluation; Instruction-Tuned Language Models; Grounding Evaluation; Synthetic Stress Test; Binding Failure; Relational Binding; Compositional Binding; Valid Support Set; Hallucination Analysis; AI Evaluation; LLM Evaluation
Online Access:	https://doi.org/10.5281/zenodo.19944708

_version_	1866901308259172352
author	Tavella, Danilo
author_facet	Tavella, Danilo
contents	This technical note presents a minimal stress test for referential binding in small instruction-tuned language models. The experiment uses a controlled synthetic dataset of 60 queries divided into three balanced families: entity binding, relational binding, and compositional binding. In all cases, the information required to answer correctly is explicitly present in the prompt. The goal is therefore not to test knowledge retrieval, but to test whether a model can select the correct referential target from a valid support set. Three metrics are used: Atomic Support Score (ASS), Binding Legitimacy Score (BLS), and Illegitimate Binding Rate (IBR). The results show that models can remain inside the valid support set while selecting the wrong relational or compositional target. The note reports results for Qwen/Qwen2.5-1.5B-Instruct and HuggingFaceTB/SmolLM2-1.7B-Instruct, with a deterministic repeat run for Qwen. The accompanying package includes result CSV files, summary tables, requirements, README, and a reproducibility script. The contribution is intended as a narrow diagnostic stress test. It does not propose a general theory of grounding or a general benchmark for language models.
format	Digital resource
id	zenodo_https___doi_org_10_5281_zenodo_19944708
institution	Zenodo
language	eng
publishDate	2026
publisher	Zenodo
record_format	zenodo
title	Referential Binding Stress Test: Supported Fragments Do Not Guarantee Referential Legitimacy in Small Instruction-Tuned Language Models
topic	Referential Binding; Language Model Evaluation; Instruction-Tuned Language Models; Grounding Evaluation; Synthetic Stress Test; Binding Failure; Relational Binding; Compositional Binding; Valid Support Set; Hallucination Analysis; AI Evaluation; LLM Evaluation
url	https://doi.org/10.5281/zenodo.19944708
