Table des matières: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Song, Ruizhuo, Yuan, Beiming
Format:	Preprint
Publié:	2025
Sujets:	Computer Vision and Pattern Recognition
Accès en ligne:	https://arxiv.org/abs/2508.15387
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

Table des matières:

Despite deep learning's broad success, its abstract-reasoning bottleneck persists. We tackle Raven's Progressive Matrices (RPM), the benchmark for pattern, reasoning and problem-solving intelligence. We model the full causal chain image $\rightarrow$ attributes $\rightarrow$ progressive patterns $\rightarrow$ consistency $\rightarrow$ answer and build the baseline DIO. Yet DIO's mutual-information lower-bound objective does not embed human logic: the bound is loose and statistic-based, ignoring causal subject-object links. We therefore present three refinements. 1) Brando introduces trainable negative options to tighten the variational bound. 2) WORLD replaces generation with a Gaussian-mixture feature model that supplies infinite, weighted negatives, further tightening the bound. 3) DIEGO adds metadata supervision to rectify the "attributes $\rightarrow$ patterns" semantic gap, aligning representations with human rules. These upgrades substantially boost discriminative RPM accuracy and, for the first time, let DIO generate valid answers in open-ended RPM. The work provides causal-driven design guidelines, objective-refinement strategies and cross-modal insights for abstract-reasoning research.

Documents similaires