Table of Contents: :: Library Catalog

Bibliographic Details
Main Author:	Dalal, Aryaman
Format:	Digital resource
Language:
Published:	Zenodo 2026
Online Access:	https://doi.org/10.5281/zenodo.19602437

Table of Contents:

We propose a methodology for constructing geometric representations of specialized visual domains. The approach extends the EcoLoop synthesis framework beyond generation and evaluation, treating the domain’s embedding space as an object of study in its own right. Three architectural commitments distinguish the methodology from existing synthesis-and-preference-optimization pipelines. First, the primary training signal is expert-produced ranked distributions of image sets rather than pairwise preferences; listwise preference optimization, grounded in the Plackett–Luce model and the learning-to-rank tradition, replaces pairwise reward modeling and recovers the distance, cluster, and magnitude information that pairwise reduction discards. Second, the rank position assigned by experts is reinterpreted as a supervised scalar coordinate along which the embedding space can be introspected, using the concept-direction and linear-representation tradition (TCAV, representation engineering, the linear representation hypothesis). This gives a principled way to ask whether realism is a linear direction, a nonlinear manifold, or a clustered structure in a domain-adapted representation. Third, a general inverse model is trained on the free byproducts of the primary synthesis pipeline (image–caption pairs, prompt–generation pairs, ranked sets), enabling geometric introspection by intervention in the style of causal-abstraction analyses and concept-slider interventions: a representation is perturbed along a discovered direction, inverted back to an input, and the induced change is examined.
Together, these three commitments produce not only a generation system but also a studyable representation of the domain itself, one that supports natural extensions to video generation as latent-space trajectories and that generalizes across specialized visual domains (medical imaging, remote sensing, microscopy, materials science) in which expert perception diverges substantially from lay perception. The methodology is formulated here as a research program; experimental validation is the subject of ongoing work.
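
As an illustration of the listwise signal described in the abstract, the following is a minimal sketch of the standard Plackett–Luce negative log-likelihood for a single expert ranking. It is not taken from the record or from the EcoLoop code; the function name `plackett_luce_nll` and its arguments are hypothetical, and the scores are assumed to be real-valued model outputs, one per image in the ranked set.

```python
import math


def plackett_luce_nll(scores, ranking):
    """Negative log-likelihood of `ranking` under a Plackett-Luce model.

    scores:  one real-valued score per item (higher means more preferred).
    ranking: item indices ordered from best to worst (assumed to be a
             full permutation of the items, e.g. one expert's ordering).
    """
    if sorted(ranking) != list(range(len(scores))):
        raise ValueError("ranking must be a permutation of the item indices")

    nll = 0.0
    for k in range(len(ranking)):
        # Scores of the items that have not yet been placed at step k.
        remaining = [scores[i] for i in ranking[k:]]
        # Log-sum-exp over the remaining items, stabilised by the max.
        m = max(remaining)
        log_norm = m + math.log(sum(math.exp(s - m) for s in remaining))
        # Log-probability that the k-th ranked item is chosen next.
        nll -= scores[ranking[k]] - log_norm
    return nll
```

With equal scores, every ordering of three items has likelihood 1/6, so `plackett_luce_nll([0.0, 0.0, 0.0], [2, 0, 1])` equals `math.log(6)`. Unlike a sum of independent pairwise terms, each step normalizes over all items still unplaced, so the loss depends on the full ordering rather than on isolated comparisons.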

Similar Items