
Bibliographic Details
Main Authors:	Zhang, Zhaoyang; Shao, Run; Wu, Dongyue; Teng, Jiajie; Tao, Chao; Chen, Jingdong; Li, Haifeng
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.09352

_version_	1866917477882003456
author	Zhang, Zhaoyang; Shao, Run; Wu, Dongyue; Teng, Jiajie; Tao, Chao; Chen, Jingdong; Li, Haifeng
author_facet	Zhang, Zhaoyang; Shao, Run; Wu, Dongyue; Teng, Jiajie; Tao, Chao; Chen, Jingdong; Li, Haifeng
contents	Understanding why independently trained neural networks from different modalities converge toward shared representations, and where this convergence leads, remains an open question in representation learning. All existing evidence relies on symmetric similarity measures, which can detect convergence but are structurally blind to its direction. We introduce directional convergence analysis using cycle-kNN, an asymmetric alignment measure, applied across dozens of independently trained unimodal models spanning point clouds, vision, and language. We uncover a consistent directional asymmetry: non-language modalities move toward the neighborhood structure of language significantly more than the reverse, and this pattern holds across all model families and scales--yet is entirely invisible to symmetric measures. Mechanistic analysis traces the directionality to feature density asymmetry, whereby language representations occupy the most compact regions of representational space. The Information Bottleneck framework provides a principled interpretation: optimization under compression drives representations toward discrete, compositional structures characteristic of language. We formalize this as the Wittgensteinian Representation Hypothesis: the semantic structure of language is the asymptotic attractor of multimodal representation convergence.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_09352
institution	arXiv
publishDate	2026
record_format	arxiv
title	The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?
topic	Artificial Intelligence
url	https://arxiv.org/abs/2605.09352
