Bibliographic Details
Main Authors: Gallardo, Rodrigo; Fishman, Oz; Kyaw, Alexander Htet
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition; Human-Computer Interaction
Online Access: https://arxiv.org/abs/2511.06201
author Gallardo, Rodrigo
Fishman, Oz
Kyaw, Alexander Htet
contents This paper introduces a human-in-the-loop computer vision framework that uses generative AI to propose micro-scale design interventions in public space and to support more continuous, local participation. Using Grounding DINO and a curated subset of the ADE20K dataset as a proxy for the urban built environment, the system detects urban objects and builds co-occurrence embeddings that reveal common spatial configurations. From this analysis, the user receives five statistically likely complements to a chosen anchor object. A vision-language model then reasons over the scene image and the selected pair to suggest a third object that completes a more complex urban tactic. The workflow keeps people in control of selection and refinement and aims to move beyond top-down master planning by grounding choices in everyday patterns and lived experience.
format Preprint
id arxiv_https___arxiv_org_abs_2511_06201
institution arXiv
publishDate 2025
record_format arxiv
title Scene-Aware Urban Design: A Human-AI Recommendation Framework Using Co-Occurrence Embeddings and Vision-Language Models
topic Computer Vision and Pattern Recognition
Human-Computer Interaction
url https://arxiv.org/abs/2511.06201
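
The abstract describes ranking five statistically likely complements to an anchor object from object co-occurrence. The Python sketch below illustrates that idea only; it is not the authors' implementation. It assumes detections are already available as one list of labels per scene, and it ranks by raw conditional co-occurrence frequency rather than by learned embeddings. The function names and the toy scenes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(scenes):
    """Count scenes per label and scenes per unordered label pair."""
    pair_counts = Counter()
    label_counts = Counter()
    for labels in scenes:
        unique = sorted(set(labels))  # ignore repeated detections in a scene
        label_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return pair_counts, label_counts


def top_complements(anchor, pair_counts, label_counts, k=5):
    """Rank other labels by the share of anchor scenes they also appear in."""
    n_anchor = label_counts[anchor]
    if n_anchor == 0:
        return []
    scores = {}
    for (a, b), n in pair_counts.items():
        if a == anchor:
            scores[b] = n / n_anchor
        elif b == anchor:
            scores[a] = n / n_anchor
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy detections, one list of labels per scene image.
scenes = [
    ["bench", "tree", "streetlight"],
    ["bench", "tree", "trash can"],
    ["bench", "planter"],
    ["bicycle", "streetlight"],
]
pair_counts, label_counts = build_cooccurrence(scenes)
ranked = top_complements("bench", pair_counts, label_counts)
for label, score in ranked:
    print(f"{label}: {score:.2f}")
```

In the workflow the abstract describes, the user would pick one of the ranked complements, and the chosen pair plus the scene image would then go to a vision-language model to suggest a third object; that step is not sketched here.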