Table of Contents: :: Library Catalog

Saved in:

Bibliographic Details
Main Author:	Lin, Hender
Format:	Preprint
Published:	2025
Subjects:	Computation and Language Artificial Intelligence Machine Learning
Online Access:	https://arxiv.org/abs/2503.06648
Tags:	Add Tag No Tags, Be the first to tag this record!

Table of Contents:

Standard NLP benchmarks often fail to capture vulnerabilities stemming from dataset artifacts and spurious correlations. Contrast sets address this gap by challenging models near decision boundaries but are traditionally labor-intensive to create and limited in diversity. This study leverages large language models to automate the generation of diverse contrast sets. Using the SNLI dataset, we created a 3,000-example contrast set to evaluate and improve model robustness. Fine-tuning on these contrast sets enhanced performance on systematically perturbed examples, maintained standard test accuracy, and modestly improved generalization to novel perturbations. This automated approach offers a scalable solution for evaluating and improving NLP models, addressing systematic generalization challenges, and advancing robustness in real-world applications.

Similar Items