Staff View: Vygotsky Distance: Measure for Benchmark Task Similarity :: Library Catalog

Bibliographic Details
Main Authors:	Surkov, Maxim K., Yamshchikov, Ivan P.
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning; 68T01, 97P80, 97C30, 68Q32; H.1.1; I.2.4; I.2.6; F.2.0
Online Access:	https://arxiv.org/abs/2402.14890

_version_	1866916137913024512
author	Surkov, Maxim K. Yamshchikov, Ivan P.
author_facet	Surkov, Maxim K. Yamshchikov, Ivan P.
contents	Evaluation plays a significant role in modern natural language processing. Most modern NLP benchmarks consist of arbitrary sets of tasks that neither guarantee any generalization potential for the model once applied outside the test set nor try to minimize the resource consumption needed for model evaluation. This paper presents a theoretical instrument and a practical algorithm to calculate similarity between benchmark tasks; we call this similarity measure "Vygotsky distance". The core idea of this similarity measure is that it is based on the relative performance of the "students" on a given task, rather than on the properties of the task itself. If two tasks are close to each other in terms of Vygotsky distance, the models tend to have similar relative performance on them. Thus, knowing the Vygotsky distance between tasks, one can significantly reduce the number of evaluation tasks while maintaining a high validation quality. Experiments on various benchmarks, including GLUE, SuperGLUE, CLUE, and RussianSuperGLUE, demonstrate that a vast majority of NLP benchmarks could be at least 40% smaller in terms of the tasks included. Most importantly, Vygotsky distance could also be used for the validation of new tasks, thus increasing the generalization potential of future NLP models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2402_14890
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Vygotsky Distance: Measure for Benchmark Task Similarity Surkov, Maxim K. Yamshchikov, Ivan P. Computation and Language Artificial Intelligence Machine Learning 68T01, 97P80, 97C30, 68Q32 H.1.1; I.2.4; I.2.6; F.2.0 Evaluation plays a significant role in modern natural language processing. Most modern NLP benchmarks consist of arbitrary sets of tasks that neither guarantee any generalization potential for the model once applied outside the test set nor try to minimize the resource consumption needed for model evaluation. This paper presents a theoretical instrument and a practical algorithm to calculate similarity between benchmark tasks; we call this similarity measure "Vygotsky distance". The core idea of this similarity measure is that it is based on the relative performance of the "students" on a given task, rather than on the properties of the task itself. If two tasks are close to each other in terms of Vygotsky distance, the models tend to have similar relative performance on them. Thus, knowing the Vygotsky distance between tasks, one can significantly reduce the number of evaluation tasks while maintaining a high validation quality. Experiments on various benchmarks, including GLUE, SuperGLUE, CLUE, and RussianSuperGLUE, demonstrate that a vast majority of NLP benchmarks could be at least 40% smaller in terms of the tasks included. Most importantly, Vygotsky distance could also be used for the validation of new tasks, thus increasing the generalization potential of future NLP models.
title	Vygotsky Distance: Measure for Benchmark Task Similarity
topic	Computation and Language Artificial Intelligence Machine Learning 68T01, 97P80, 97C30, 68Q32 H.1.1; I.2.4; I.2.6; F.2.0
url	https://arxiv.org/abs/2402.14890
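
Illustrative note: the abstract in this record describes a task similarity based on the relative performance of models ("students") rather than on task properties. The record does not give the paper's formal definition, so the following is only a minimal sketch of that idea, under the assumption that "relative performance" can be approximated by how a task orders pairs of models. The function name pairwise_order_distance and the example scores are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pairwise_order_distance(scores_a, scores_b):
    """Fraction of model pairs that two tasks rank in opposite order.

    scores_a[i] and scores_b[i] are the scores of the same model i on
    task A and task B. This is a rank-disagreement proxy (an assumption),
    not the definition used in the paper.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both tasks must be scored on the same models")
    pairs = list(combinations(range(len(scores_a)), 2))
    if not pairs:
        raise ValueError("need at least two models")
    discordant = 0
    for i, j in pairs:
        diff_a = scores_a[i] - scores_a[j]
        diff_b = scores_b[i] - scores_b[j]
        # Opposite signs mean the two tasks disagree on which model is better.
        # Ties (a zero difference) are not counted as disagreements.
        if diff_a * diff_b < 0:
            discordant += 1
    return discordant / len(pairs)


# Hypothetical scores of four models on three tasks.
task_a = [0.90, 0.80, 0.70, 0.60]
task_b = [0.85, 0.80, 0.60, 0.55]  # same model ordering as task_a
task_c = [0.50, 0.60, 0.70, 0.80]  # reversed model ordering

print(pairwise_order_distance(task_a, task_b))
print(pairwise_order_distance(task_a, task_c))
```

Under this proxy, two tasks that order the models identically are at distance 0 and are candidates for pruning one of them, while tasks that order the models in reverse are at distance 1. Whether this matches the paper's measure should be checked against the preprint at the URL above.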

Similar Items