MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Huang, Yucheng; Ji, Luping; Jiang, Xiangwei; Li, Wen; Ye, Mao
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.28178

_version_	1866913046952148992
author	Huang, Yucheng; Ji, Luping; Jiang, Xiangwei; Li, Wen; Ye, Mao
author_facet	Huang, Yucheng; Ji, Luping; Jiang, Xiangwei; Li, Wen; Ye, Mao
contents	3D Scene Graph (3DSG) generation plays a pivotal role in spatial understanding and affordance perception. To mitigate generalization issues caused by data scarcity, joint-embedding and generative proxy tasks have been proposed to pretrain 3DSG representations on datasets without predicate labels. Generative pretraining usually bypasses the semantic corruption caused by the geometric augmentations used in joint-embedding, but it cannot avoid the "Geometric Shortcut" problem: exposing dense object spatial and scale priors induces models to trivially reconstruct scenes by interpolating object positions, rather than learning the underlying topological constraints provided by edges. To address this issue, we propose Topological Layout Learning (ToLL), a pretraining framework for 3DSG generation. Specifically, we design Anchor-Conditioned Topological Geometry Reasoning. It adopts a recurrent GNN to recover the global layout of zero-centered subgraphs (the non-visible spatial features) from a single anchor with a sparse spatial prior. The absence of spatial layout information within the objects creates an information bottleneck, compelling our model to recover the full scene layout by leveraging predicate representation learning. Moreover, we construct a Structural Multi-view Augmentation to avoid semantic corruption, enhancing 3DSG representations via self-distillation. Extensive experiments on a special dataset demonstrate that ToLL could often improve 3DSG pretraining quality, outperforming state-of-the-art baselines.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_28178
institution	arXiv
publishDate	2026
record_format	arxiv
title	ToLL: Topological Layout Learning with Asymmetric Cross-View Structural Distillation for 3D Scene Graph Generation Pretraining
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2603.28178