Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Goel, Rajeev; Ding, Jason; Wajjala, Phani Harish; Turaga, Pavan; Gowda, Tejaswi; Garikipati, Krishna C.
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.18680

_version_	1866913142050652160
author	Goel, Rajeev; Ding, Jason; Wajjala, Phani Harish; Turaga, Pavan; Gowda, Tejaswi; Garikipati, Krishna C.
author_facet	Goel, Rajeev; Ding, Jason; Wajjala, Phani Harish; Turaga, Pavan; Gowda, Tejaswi; Garikipati, Krishna C.
contents	Metaverse platforms rely on creator-driven marketplaces where avatars are assembled from discrete, taxonomy-labeled 3D assets (e.g., tops, bottoms, shoes, accessories) under strict category and topology constraints. While users increasingly expect free-form text control, text-only retrieval is brittle: natural language is ambiguous with respect to platform taxonomies, metadata is often noisy or informal, and independently retrieved components can be stylistically inconsistent or geometrically incompatible. We propose CMAG, a concept-scaffolded retrieval and verified composition framework for marketplace avatar generation. Given a prompt, CMAG first synthesizes an intermediate 3D concept scaffold that disambiguates intent beyond text by providing global spatial and stylistic context. In parallel, a view-aware part discovery module extracts localized visual evidence via prompt decomposition and text-grounded segmentation. A prompt-conditioned taxonomy router enforces category coverage and resolves semantic-to-taxonomic mismatch, after which a hybrid category-wise retriever combines part-based fusion with a concept-residual fallback using feature suppression. Finally, an agentic vision-language model filters and re-ranks candidates across categories and drives an iterative verification loop to assemble prompt-faithful, topologically consistent avatars from catalog assets. We evaluate CMAG on diverse compositional prompts and demonstrate improved retrieval robustness and compositional correctness compared to strong baselines, highlighting the importance of 3D concept scaffolding under prompt ambiguity.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_18680
institution	arXiv
publishDate	2026
record_format	arxiv
title	CMAG: Concept-Scaffolded Retrieval for Marketplace Avatar Generation
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2605.18680

Similar Items