Bibliographic Details
Main Authors: Xiao, Bushi, Bennie, Michael, Bardhan, Jayetri, Wang, Daisy Zhe
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2502.17669
author Xiao, Bushi
Bennie, Michael
Bardhan, Jayetri
Wang, Daisy Zhe
contents Structural priming is a cognitive phenomenon in which exposure to a particular syntactic structure increases the likelihood of producing the same structure in subsequent utterances. While humans consistently demonstrate structural priming effects across various linguistic contexts, it remains unclear whether multimodal large language models (MLLMs) exhibit similar syntactic preservation behaviors. We introduce PRISMATIC, the first multimodal structural priming dataset, which advances computational linguistics by providing a standardized benchmark for investigating syntax-vision interactions. We propose the Syntactic Preservation Index (SPI), a novel reference-free evaluation metric designed specifically to assess structural priming effects at the sentence level. Using this metric, we constructed and tested models with two different multimodal encoding architectures to investigate their structural preservation capabilities. Our experimental results demonstrate that models with both encoding methods show comparable syntactic priming effects. However, only fusion-encoded models exhibit robust positive correlations between priming effects and visual similarity, suggesting a cognitive process more aligned with human psycholinguistic patterns. This work provides new insights into evaluating and understanding how syntactic information is processed in multimodal language models.
format Preprint
id arxiv_https___arxiv_org_abs_2502_17669
institution arXiv
publishDate 2025
record_format arxiv
title Towards Human Cognition: Visual Context Guides Syntactic Priming in Fusion-Encoded Models
topic Computation and Language
url https://arxiv.org/abs/2502.17669