Saved in:
| Main Authors: | , , |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.06201 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866912696036753408 |
|---|---|
| author | Gallardo, Rodrigo Fishman, Oz Kyaw, Alexander Htet |
| author_facet | Gallardo, Rodrigo Fishman, Oz Kyaw, Alexander Htet |
| contents | This paper introduces a human-in-the-loop computer vision framework that uses generative AI to propose micro-scale design interventions in public space and support more continuous, local participation. Using Grounding DINO and a curated subset of the ADE20K dataset as a proxy for the urban built environment, the system detects urban objects and builds co-occurrence embeddings that reveal common spatial configurations. From this analysis, the user receives five statistically likely complements to a chosen anchor object. A vision language model then reasons over the scene image and the selected pair to suggest a third object that completes a more complex urban tactic. The workflow keeps people in control of selection and refinement and aims to move beyond top-down master planning by grounding choices in everyday patterns and lived experience. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2511_06201 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | Scene-Aware Urban Design: A Human-AI Recommendation Framework Using Co-Occurrence Embeddings and Vision-Language Models Gallardo, Rodrigo Fishman, Oz Kyaw, Alexander Htet Computer Vision and Pattern Recognition Human-Computer Interaction This paper introduces a human-in-the-loop computer vision framework that uses generative AI to propose micro-scale design interventions in public space and support more continuous, local participation. Using Grounding DINO and a curated subset of the ADE20K dataset as a proxy for the urban built environment, the system detects urban objects and builds co-occurrence embeddings that reveal common spatial configurations. From this analysis, the user receives five statistically likely complements to a chosen anchor object. A vision language model then reasons over the scene image and the selected pair to suggest a third object that completes a more complex urban tactic. The workflow keeps people in control of selection and refinement and aims to move beyond top-down master planning by grounding choices in everyday patterns and lived experience. |
| title | Scene-Aware Urban Design: A Human-AI Recommendation Framework Using Co-Occurrence Embeddings and Vision-Language Models |
| topic | Computer Vision and Pattern Recognition Human-Computer Interaction |
| url | https://arxiv.org/abs/2511.06201 |